Production-Ready Nearest Neighbors With Vector AI

Original article was published by Vector AI on Artificial Intelligence on Medium


Open-source vector search (nearest neighbor search) solutions come with many difficulties that prevent them from being used easily in production. We can see an example of a simplified pipeline below:

A standard vector search engine!

These are incredibly complex pipelines. Their deployment is not straightforward and their prototyping cycles are long. The iteration cycle for vector search and recommendations can also be long, as you want to continuously test how the search works.

The main problems with these open-source solutions when used practically are:

1) Over-focusing on the mathematically fastest nearest neighbor algorithms, without software optimisations or an understanding of use cases. In practice, algorithms form a small part of the production system, as many situations are fixed by engineering for speed, robustness and edge-case handling.

2) The open-source nearest neighbor index libraries are usually built by researchers to reproduce nearest neighbor benchmarks and not for practitioners to test their vectors and the documents associated with the vectors. This means there is no functionality pertinent to documents, clusters, dimensionality reduction or iterating on vector quality. Therefore, the libraries are not necessarily built for practicality and often have limited functionality or are difficult to use.

3) Existing libraries have steep learning curves and do not offer easy ways to get started. Users often encounter slow and memory-intensive indexing: the indexes are slow to build and can crash if RAM is exceeded, forcing you to rebuild your index.

4) They do not offer more advanced search options beyond simple nearest neighbor search. Modern production search systems require a combination of search options, for example requiring that results must contain certain terms, weighting words, or matching exact case-sensitive strings.

5) Storage. Storing documents/data associated with vectors is just another challenge to productionising these systems — especially in a state that is easily searchable.

And after this, we still need to productionise it. All of this requires you to handle the infrastructure, scaling, encoding and index-adding whilst managing query speed and uptime. As a result, the developers of the search/recommendation engine often forget about the most crucial part of the process: the vectors that build the indexes themselves. Instead of focusing on features and the vectors themselves, the business has to focus on building the infrastructure to offer that feature, often without the necessary in-house talent to make this process smooth.

Vector AI enables vector-based applications by allowing data scientists and developers to focus on the vectors instead of the infrastructure and scaling when building search, recommendation and personalisation engines and other vector-based applications.

Our API and Python SDK allow data scientists and developers to focus on using high quality vectors to store alongside their documents/data.

Vector AI Features

Designed for simplicity. Built for customisability.

Using Vector AI makes it extremely easy to insert documents with vectors. Documents are simply Python dictionaries and vectors are simply lists of floats. An example document can be seen below.

Example document with nested fields in Vector AI (i.e. Python Dictionary)

This way, if your documents become more complex and you need to organise fields, you can nest them as in the example above. Nested fields can then be accessed intuitively using dot notation, e.g. document_example['car.wheels.number'].
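As a minimal sketch of what such a nested document might look like, the snippet below builds a plain Python dictionary with a vector stored as a list of floats. The field names, the values and the `get_nested_field` helper are all assumptions made for illustration; the helper is not part of the Vector AI SDK and only shows how a dotted path such as 'car.wheels.number' can be resolved against an ordinary dictionary, which does not understand dotted keys by itself.

```python
from functools import reduce

# Hypothetical document: field names and values are illustrative only.
document_example = {
    "car": {
        "wheels": {"number": 4},
        "seats": {"number": 5},
    },
    # Assumed vector field: a vector is simply a list of floats.
    "car_vector_": [0.1, 0.2, 0.3],
}


def get_nested_field(document, dotted_path):
    """Walk a nested dictionary using a dot-separated path.

    Hypothetical helper for illustration, not a Vector AI function.
    Raises KeyError if any part of the path is missing.
    """
    return reduce(lambda node, key: node[key], dotted_path.split("."), document)


wheel_count = get_nested_field(document_example, "car.wheels.number")
```

Keeping documents as ordinary dictionaries means they stay compatible with the usual Python tooling for inspecting and transforming data.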

To add documents to the index, simply insert them with our insert_documents API call/function and access them in real time. Afterwards, you still retain all the functionality your data scientists already have!

Vector AI can be used with the entire data science ecosystem!

Vector AI focuses on trying to help users understand their vectors as easily as possible through our SDK’s built-in functionality and comprehensive guides — encouraging users to explore and compare their vectors.

Building the vector index is hard enough but there is a lot more to vectors than simply search:

In other words — we care a lot about the documents most associated with those clusters.

In fact, Vector AI offers a comprehensive suite of tools to assist you in optimising your vector playground, which you can read about in our documentation: https://vector-ai.github.io/vectorai/.

But even then, search can be a complex product. Prior to searching through documents, you often still need a lot of customisation and functionality. Here are some of the more complex search functionalities:

1) Hybrid Vector/Text search. Sometimes searches need to be filtered prior to obtaining nearest neighbors. This is built into our functionality.

2) Weighted Multi-Model search. Need more than one model for your search? This is done with one line of code. This is appropriate if you want to combine image search with text search, or if you want to search through different fields of text at the same time and vary their importance.
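To make the two ideas above concrete, here is a small, self-contained sketch in plain Python. It is not Vector AI's implementation or API: the function names, the `description` text field, the vector field names and the weights are all assumptions chosen for illustration. The sketch first filters documents by a text condition (the hybrid step) and then ranks the remaining ones by a weighted sum of cosine similarities across several vector fields (the weighted multi-model step), using brute-force scoring rather than an approximate index.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def hybrid_weighted_search(documents, query_vectors, weights,
                           must_contain=None, text_field="description", top_k=3):
    """Illustrative brute-force search (hypothetical, not the Vector AI API).

    1. Keep only documents whose text field contains `must_contain`.
    2. Score each remaining document as a weighted sum of cosine
       similarities, one per vector field in `query_vectors`.
    """
    if must_contain is not None:
        candidates = [d for d in documents
                      if must_contain.lower() in d[text_field].lower()]
    else:
        candidates = list(documents)

    scored = []
    for doc in candidates:
        score = sum(weights[field] * cosine_similarity(doc[field], query)
                    for field, query in query_vectors.items())
        scored.append((score, doc["_id"]))

    # Highest combined score first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy documents with one text-model vector and one image-model vector each.
documents = [
    {"_id": "a", "description": "red cotton dress",
     "text_vector_": [1.0, 0.0], "image_vector_": [0.0, 1.0]},
    {"_id": "b", "description": "red leather jacket",
     "text_vector_": [0.0, 1.0], "image_vector_": [1.0, 0.0]},
    {"_id": "c", "description": "blue cotton dress",
     "text_vector_": [1.0, 0.0], "image_vector_": [1.0, 0.0]},
]

results = hybrid_weighted_search(
    documents,
    query_vectors={"text_vector_": [1.0, 0.0], "image_vector_": [1.0, 0.0]},
    weights={"text_vector_": 0.7, "image_vector_": 0.3},
    must_contain="red",
)
```

In this toy data, document "c" matches both query vectors best but is excluded by the text filter, which is the point of filtering before ranking. Changing the weights shifts the ranking between the text and image models without re-encoding anything.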

Also — It’s Production Ready From Day One.

Our design choices allow Vector AI to be production-ready from day one and accessible via API and Python SDK. There is no need to wait hours for your search results to be ready. We know vector search will change the world and we want to make it magical. The value of vectors should be immediate, without having to rely on entire teams of data/infrastructure engineers to manage the productionisation, because such advanced applications require a rapid prototyping cycle. Using Vector AI, the only bottleneck when building these vector-based applications will be ideation, not tech. This means that once data scientists upload their vectors and their deployed models, they should be able to test, iterate and improve their search solution easily with Vector AI.

So from a production standpoint, some of the difficulties we abstract away are:

  • A deployed database with documents associated with vectors (note: documents must be JSON-serializable. Videos and images should ideally be stored elsewhere and linked from the document, but if storing them is something you are interested in, talk to us at vctr.ai and we can arrange something)
  • A managed solution that lets you get advanced vector search results immediately and with a lot of out-of-the-box practicality and functionality.
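The JSON-serializable requirement mentioned above can be checked locally before uploading. The following is a minimal sketch using only Python's standard `json` module; the helper name, the field names and the example.com URL are assumptions for illustration and are not part of the Vector AI SDK.

```python
import json


def is_json_serializable(document):
    """Return True if the document can be encoded as JSON.

    Hypothetical pre-upload check, not a Vector AI function.
    """
    try:
        json.dumps(document)
        return True
    except (TypeError, ValueError):
        return False


# The image is stored elsewhere and only linked from the document.
good_document = {
    "title": "sneaker",
    "image_url": "https://example.com/sneaker.jpg",  # illustrative URL
    "image_vector_": [0.1, 0.2],
}

# Raw bytes (e.g. an embedded image) cannot be encoded as JSON.
bad_document = {"title": "sneaker", "image": b"\x89PNG"}
```

Here `is_json_serializable(good_document)` succeeds while `is_json_serializable(bad_document)` fails, which is why media files are better stored elsewhere and referenced by link.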

If you feel overwhelmed with all the functionality, here is a small map of our functionalities:

Mind-Map of Vector AI’s functionalities

If you are interested in using Vector AI, feel free to take a look at our GitHub package here: https://gh.vctr.ai. If you have any questions, fill out the form on our website and we will get back to you within 24 hours! Our website: www.vctr.ai

Check out our vector-based fashion application here at fashionfiesta.me!

Look at our Python and API documentation here:

Vector AI Team.