Original article can be found here (source): Artificial Intelligence on Medium
Technology: How it works
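The tool is built on Curiosity’s Mosaik knowledge engine and uses four key components: data ingestion and extraction, natural language processing, a knowledge graph, and the user interface.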
Data Ingestion & Extraction
Connector code ingests the original dataset into the knowledge graph. In most applications we also use extractors to pull text out of files (PDF, DOC, etc.), but they were not necessary here because we have direct access to the textual data.
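To make the ingestion step concrete, here is a minimal sketch of what a connector does: it maps raw records onto nodes and edges. It is not Mosaik's API. The `Graph` class, the `ingest_papers` function, the node and edge type names, and the record fields are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """A tiny in-memory graph: nodes keyed by (type, id), plus typed edges."""
    nodes: dict = field(default_factory=dict)
    edges: set = field(default_factory=set)

    def add_node(self, node_type, node_id, **props):
        key = (node_type, node_id)
        # Merge properties so re-ingesting a record updates it instead of duplicating it.
        self.nodes.setdefault(key, {}).update(props)
        return key

    def add_edge(self, source, edge_type, target):
        self.edges.add((source, edge_type, target))


def ingest_papers(graph, records):
    """Hypothetical connector: turn each raw record into Paper and Author nodes."""
    for record in records:
        paper = graph.add_node("Paper", record["id"],
                               title=record["title"], text=record["abstract"])
        for name in record.get("authors", []):
            author = graph.add_node("Author", name)
            graph.add_edge(paper, "HasAuthor", author)


# Made-up sample records; a real connector would read them from the dataset.
records = [
    {"id": "p1", "title": "Paper one", "abstract": "Text of paper one.",
     "authors": ["A. Smith", "B. Lee"]},
    {"id": "p2", "title": "Paper two", "abstract": "Text of paper two.",
     "authors": ["B. Lee"]},
]
graph = Graph()
ingest_papers(graph, records)
print(len(graph.nodes), "nodes,", len(graph.edges), "edges")  # 4 nodes, 3 edges
```

Keying nodes by type and identifier means the shared author becomes a single node linked to both papers, which is what later makes filtering by relationships possible.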
Natural Language Processing
A suite of natural language processing (NLP) models handles tasks such as tokenization, part-of-speech (POS) tagging, named entity recognition (NER), entity linking, and embeddings. The models are combined into a pipeline for each document type (here, only the papers). Curiosity’s Catalyst library, which includes all of our NLP models, is available as open-source software on GitHub.
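The idea of combining models into a per-document-type pipeline can be sketched in a few lines. This is not Catalyst's API (Catalyst is a .NET library). The `Document` and `Pipeline` classes and the lookup-table entity tagger below are hypothetical stand-ins, with a simple gazetteer taking the place of a trained NER model.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    text: str
    tokens: list = field(default_factory=list)
    entities: list = field(default_factory=list)


def tokenize(doc):
    # Split into word tokens and single punctuation marks.
    doc.tokens = re.findall(r"\w+|[^\w\s]", doc.text)
    return doc


def make_gazetteer_ner(gazetteer):
    """Build a step that tags tokens found in a lookup table (stand-in for a trained NER model)."""
    def tag_entities(doc):
        doc.entities = [(tok, gazetteer[tok.lower()])
                        for tok in doc.tokens if tok.lower() in gazetteer]
        return doc
    return tag_entities


class Pipeline:
    """Runs a fixed sequence of steps; each step takes and returns a Document."""
    def __init__(self, *steps):
        self.steps = steps

    def process(self, doc):
        for step in self.steps:
            doc = step(doc)
        return doc


# One pipeline per document type; here only a hypothetical "paper" pipeline.
paper_pipeline = Pipeline(
    tokenize,
    make_gazetteer_ner({"aspirin": "Drug", "ibuprofen": "Drug"}),
)
doc = paper_pipeline.process(Document("The trial compared aspirin and ibuprofen."))
print(doc.tokens)
print(doc.entities)  # [('aspirin', 'Drug'), ('ibuprofen', 'Drug')]
```

Because every step shares the same signature, later stages such as entity linking or embedding could be appended without changing the pipeline class.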
Knowledge Graph
Curiosity systems use a custom knowledge graph technology. Besides natively supporting full-text search, it has been extended to support similarity search, filtering by relationships, inferencing, and more. The graph is tightly integrated with the NLP models, which allows fast model training and enables similarity queries over the embedding vectors using the HNSW (Hierarchical Navigable Small World) algorithm.
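The query that HNSW speeds up is a nearest-neighbor search over embedding vectors. The sketch below runs that query by exact brute force with cosine similarity, as a plain stand-in for HNSW, which returns approximate answers to the same question much faster on large collections. It also shows how a relationship filter can narrow the candidates. The `similar_nodes` function, the node identifiers, the vectors, and the `(node, target)` edge layout are assumptions made for illustration, not Curiosity's implementation.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_nodes(embeddings, edges, query_id, k=2, related_to=None):
    """Rank nodes by cosine similarity to query_id, optionally keeping only
    nodes that have an edge to `related_to`."""
    query = embeddings[query_id]
    candidates = []
    for node_id, vector in embeddings.items():
        if node_id == query_id:
            continue  # never return the query node itself
        if related_to is not None and (node_id, related_to) not in edges:
            continue  # relationship filter from the graph
        candidates.append((cosine(query, vector), node_id))
    candidates.sort(reverse=True)
    return [node_id for _, node_id in candidates[:k]]


# Made-up 2-dimensional embeddings; real ones have hundreds of dimensions.
embeddings = {
    "p1": [1.0, 0.0],
    "p2": [0.9, 0.1],
    "p3": [0.0, 1.0],
    "p4": [0.7, 0.7],
}
edges = {("p3", "topic:a"), ("p4", "topic:a")}

print(similar_nodes(embeddings, edges, "p1"))                        # ['p2', 'p4']
print(similar_nodes(embeddings, edges, "p1", related_to="topic:a"))  # ['p4', 'p3']
```

Brute force compares the query against every vector, so its cost grows linearly with the number of nodes. An index such as HNSW trades a small loss in exactness for a much smaller number of comparisons per query.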
User Interface
UIs are built with Curiosity’s open-source Tesserae component library, which is itself built on top of the Bridge.NET compiler. Besides the open-source components, we provide simple-to-use interfaces for integrating search, graph visualization, and topic mining as part of the tech stack. The UI is tightly integrated with the back end and can send user feedback to it for model retraining (active machine learning).