Monday, July 03, 2023

Vector Database - Let AI Have Memory

The widespread adoption of large models has created urgent demand for vector databases. Because they operate on vectorized data, vector databases can efficiently handle query and analysis tasks over complex, high-dimensional data, which suits current artificial intelligence workloads well. Since large models took off, projects such as LangChain and AutoGPT have used vector databases extensively.

Vector databases are not actually new, but earlier application scenarios did not involve vectors with so many dimensions.

Why Do We Need a Vector Database?

Vector databases mainly address two problems that current large models face: answering questions that depend on context, and hallucination. In other words, they give artificial intelligence memory and knowledge.

Technical Difficulties

Vector databases currently face two major difficulties: efficient storage, and fast similarity search (that is, efficient indexing and querying).
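To make the similarity search problem concrete, here is a minimal sketch of the exact, brute-force baseline, assuming cosine similarity as the metric. The `store` contents and the names `cosine_similarity` and `brute_force_search` are hypothetical, chosen only for illustration. Every query compares against every stored vector, which is the cost that indexing schemes try to avoid.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def brute_force_search(query, vectors, k=2):
    """Return the ids of the k stored vectors most similar to the query.

    Cost is proportional to (number of vectors) x (dimensions) per query.
    """
    scored = [(cosine_similarity(query, vec), vec_id)
              for vec_id, vec in vectors.items()]
    scored.sort(reverse=True)
    return [vec_id for _, vec_id in scored[:k]]


# Hypothetical toy embeddings; real ones have hundreds or thousands of dimensions.
store = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
result = brute_force_search([1.0, 0.0, 0.0], store, k=2)
print(result)
```

This baseline is always exact, but its per-query cost grows linearly with both the number of vectors and their dimensionality.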

Solutions

Existing solutions include product quantization (PQ) and inverted-file product quantization (IVF-PQ), locality-sensitive hashing (LSH), Hierarchical Navigable Small World (HNSW) graphs, etc.

Each of these solutions has its own advantages and disadvantages. The shared principle is to reduce the work done per query through (multi-step) filtering. The shared drawback is that exact results cannot be guaranteed: these are approximate nearest neighbor methods.
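The filtering idea can be sketched with one of the methods named above, locality-sensitive hashing. This is a minimal illustration using random hyperplanes, not a production design. The function names, the toy `store`, and the choice of 4 hyperplanes are all assumptions made for the example. Each vector is reduced to a short bit pattern, and a query only inspects vectors that share its pattern.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_planes, dim, seed=0):
    """Draw random hyperplane normals; a fixed seed keeps the sketch repeatable."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)]
            for _ in range(num_planes)]


def lsh_key(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def build_index(vectors, planes):
    """Group vector ids into buckets keyed by their bit pattern."""
    buckets = defaultdict(list)
    for vec_id, vec in vectors.items():
        buckets[lsh_key(vec, planes)].append(vec_id)
    return buckets


def candidates(query, buckets, planes):
    """Filtering step: only vectors in the query's bucket are considered."""
    return list(buckets.get(lsh_key(query, planes), []))


# Hypothetical toy data.
store = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
planes = make_hyperplanes(num_planes=4, dim=3)
index = build_index(store, planes)
print(candidates(store["cat"], index, planes))
```

A true neighbor that happens to fall on the other side of one hyperplane lands in a different bucket and is missed, which is the accuracy cost described above. Practical systems usually reduce that risk by using several hash tables and then ranking the surviving candidates exactly.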

It is difficult to design a general-purpose algorithm for high-dimensional data. However, the human brain is in effect a very good vector database, and it can be simulated with a multi-layer network model. This is why Hierarchical Navigable Small World (HNSW) is currently the most widely used and best-performing algorithm.
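The multi-layer idea can be illustrated with a heavily simplified sketch of layered greedy graph search. It is not a full HNSW implementation: the graph here is hand-built and hypothetical, whereas real HNSW assigns nodes to layers randomly during insertion and keeps a list of several candidates rather than a single current node. All names below are assumptions for illustration.

```python
import math


def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_layer_search(query, entry, graph, points):
    """Walk to whichever neighbor is closer to the query until none is."""
    current = entry
    while True:
        best = min(graph[current],
                   key=lambda n: dist(query, points[n]),
                   default=current)
        if dist(query, points[best]) < dist(query, points[current]):
            current = best
        else:
            return current


def layered_search(query, entry, layers, points):
    """Search sparse upper layers first, then refine in denser lower layers."""
    current = entry
    for graph in layers:  # ordered from top (sparse) to bottom (dense)
        current = greedy_layer_search(query, current, graph, points)
    return current


# Hypothetical points on a line, with a hand-built three-layer graph.
points = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0),
          3: (3.0, 0.0), 4: (4.0, 0.0)}
layers = [
    {0: [4], 4: [0]},                                    # top: long jumps
    {0: [2], 2: [0, 4], 4: [2]},                         # middle
    {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]},   # bottom: all points
]
nearest = layered_search((2.9, 0.0), 0, layers, points)
print(nearest)
```

The upper layers cover long distances in a few hops, and the bottom layer refines the answer locally, so only a small part of the data is visited per query. Greedy search can stop at a local minimum, which is one reason results are approximate.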

Related Projects and Startups

Vespa, Milvus, Qdrant, Weaviate, Pinecone, Zilliz, CozoDB, Twelve Labs, etc.