In recent years, breakthroughs in large language models (LLMs) have led to explosive growth in vector data derived from natural language, and vector databases designed to manage such data are drawing ever more attention. The combination of LLMs and vector databases has broad applications across fields such as semantic search, recommendation systems, and chatbots. This article explores the application trends of vector databases in LLM scenarios and, drawing on user cases, provides a detailed introduction to the architecture and technical implementation of OpenPie's vector database, PieCloudVector.
With the rapid development of AI technology, LLMs have become a key force driving the frontier of AIGC. Leveraging the powerful data-processing capabilities of LLMs, AIGC can automatically generate high-quality, personalized content, enabling functions such as intelligent Q&A, sentiment analysis, and text generation, and it has been widely applied in finance, media, entertainment, e-commerce, and other fields. However, LLMs still face challenges such as data timeliness, privacy, long-term memory, and hallucination, which greatly limit their application in many scenarios.
To address these limitations effectively, the industry now widely adopts Retrieval-Augmented Generation (RAG) to enhance the performance of LLMs.
RAG is a technology that combines an information retrieval system with a generative LLM. Its core idea is to allow the LLM to dynamically retrieve relevant information from an external knowledge base when generating an answer, thereby improving the accuracy and reliability of the content generated by the model.
Transforming the Original Data into Embeddings
The RAG framework consists of three main processes: Retrieval, Augmentation, and Generation:
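The three processes above can be sketched as a minimal loop. This is purely illustrative: the character-frequency "embedder" and the prompt-building stub stand in for a real embedding model and a real LLM call, and none of the names below come from PieCloudVector's actual API.

```python
# Minimal RAG sketch (illustrative only): the embedder, vector store, and
# LLM below are stand-in stubs, not PieCloudVector's actual API.
import math

def embed(text: str) -> list[float]:
    # Stub embedder: normalized character-frequency vector. A real system
    # would call an embedding model here.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

# 1. Retrieval: find the knowledge-base entries closest to the query.
knowledge_base = ["PieCloudVector stores embeddings", "Faiss powers the search engine"]
index = [(doc, embed(doc)) for doc in knowledge_base]

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

# 2. Augmentation: splice retrieved context into the prompt.
# 3. Generation: hand the augmented prompt to the LLM (stubbed out here).
def rag_answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return prompt  # a real system would return llm.generate(prompt)
```

The point of the pattern is that the model's answer is grounded in retrieved, up-to-date context rather than only in its training data.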
A vector database is a database system specifically designed for storing and processing multi-dimensional vector data. Using appropriate models, text, images, and audio can be converted into vectors for storage, and a key task of the vector database is to associate each vector with its corresponding original data.
After the data is stored, another task of the vector database is to build an index for efficient vector search. Searching without an index is possible, for example by comparing records one by one in a full table scan, and this will still find the closest vectors; but once the corpus reaches a certain scale, this approach becomes very inefficient, and an index can improve search performance dramatically.
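The index-free baseline described above can be written in a few lines: every stored vector is compared against the query, so cost grows linearly with the number of rows.

```python
# Brute-force (full-scan) nearest-neighbour search: the no-index baseline.
import math

def l2(a, b):
    # Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_full_scan(query, vectors, k=3):
    # vectors: list of (row_id, vector) pairs; returns the k closest row ids.
    scored = sorted(vectors, key=lambda pair: l2(query, pair[1]))
    return [row_id for row_id, _ in scored[:k]]

# Toy table of 1000 two-dimensional vectors.
rows = [(i, [float(i), float(i)]) for i in range(1000)]
print(knn_full_scan([4.2, 4.2], rows, k=3))  # → [4, 5, 3]
```

Every query touches all 1000 rows; with millions of high-dimensional rows this is exactly the inefficiency an index avoids.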
Finally, a complete vector database also needs supporting call interfaces and an ecosystem of tools to meet user needs in areas such as data-type support, flexibility, and ease of use. This surrounding tool ecosystem directly determines how user-friendly and practical the vector database is.
OpenPie's cloud-native vector database, PieCloudVector, delivers storage and efficient querying of massive vector data, offers multi-dimensional analysis capabilities, works hand in hand with LLM Embeddings, and powers multi-modal AI applications. It represents the dimensional upgrade of analytical databases in the era of LLMs.
The current vector database technology field is mainly divided into two schools:
To combine the strengths of the two directions, OpenPie's PieCloudVector integrates Faiss on top of a relational database, ensuring broad data generality while delivering performance at the forefront of the industry.
Developing the Database Kernel Based on PostgreSQL
PieCloudVector is built on the PostgreSQL kernel. This approach allows PieCloudVector to inherit all the advantages of relational databases, including:
Drawing on its extensive experience and technical prowess in the eMPP (elastic MPP) and distributed systems domain, OpenPie has crafted the distributed architecture for its cloud-native virtual data warehouse, PieCloudDB Database, and applied these learnings to develop PieCloudVector.
The figure below shows the distributed architecture of PieCloudVector, comprising a Coordinator node and multiple Executor nodes. The PieCloudVector execution engine supports CPU-based search and acceleration as well as GPU acceleration. Data can be stored on local disks or in cloud storage (S3). Cloud storage has a cost advantage, but its access performance is generally lower than that of local disks; because PieCloudVector keeps its indexes in memory, however, the impact of storage latency on search performance is largely avoided.
PieCloudVector Distributed Architecture
Developing a Vector Search Engine integrated with Faiss
PieCloudVector builds its vector search engine on the open-source algorithm library Faiss, which is also the choice of many mainstream vector databases on the market. Vector search algorithms are numerous, and each involves complex implementation details, so developing them from scratch would be prohibitively expensive in manpower and time. Faiss covers the mainstream vector search algorithms with tunable parameters and is currently the most popular and efficient algorithm library, which both guarantees performance and saves substantial development cost.
In vector search, K-Nearest Neighbor (KNN) search is one of the most basic problems: by computing the distance between the query vector and every sample in the dataset, it returns the K closest vectors, yielding exact results. However, the cost of this method grows linearly with dataset size (and with vector dimensionality), so searches over large, high-dimensional datasets become very slow.
To mitigate this, PieCloudVector introduces Approximate Nearest Neighbor (ANN) algorithms, which trade off performance, recall, and memory: a small amount of accuracy is sacrificed to accelerate queries and improve overall efficiency.
PieCloudVector supports mainstream ANN algorithms, including the most popular IVF and HNSW algorithms.
PieCloudVector Supports Mainstream ANN Algorithms
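The list-probing idea behind IVF can be shown with a toy sketch: vectors are bucketed by their nearest centroid, and a query scans only the `nprobe` closest buckets instead of the whole table. Faiss's IVF implementation is far more elaborate (centroids are trained with k-means, lists are stored compactly); this only illustrates the principle.

```python
# Toy inverted-file (IVF) index: bucket by nearest centroid, probe few buckets.
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class ToyIVF:
    def __init__(self, centroids, nprobe=1):
        self.centroids = centroids            # fixed here; Faiss trains them with k-means
        self.nprobe = nprobe                  # how many inverted lists a query scans
        self.lists = [[] for _ in centroids]  # one inverted list per centroid

    def add(self, row_id, vec):
        nearest = min(range(len(self.centroids)),
                      key=lambda c: l2(vec, self.centroids[c]))
        self.lists[nearest].append((row_id, vec))

    def search(self, query, k=1):
        # Rank centroids by distance and scan only the top `nprobe` lists.
        order = sorted(range(len(self.centroids)),
                       key=lambda c: l2(query, self.centroids[c]))
        candidates = [item for c in order[:self.nprobe] for item in self.lists[c]]
        candidates.sort(key=lambda pair: l2(query, pair[1]))
        return [row_id for row_id, _ in candidates[:k]]

ivf = ToyIVF(centroids=[[0.0, 0.0], [10.0, 10.0]], nprobe=1)
for i, vec in enumerate([[0.5, 0.5], [9.5, 9.5], [10.5, 10.0]]):
    ivf.add(i, vec)
print(ivf.search([9.8, 9.9], k=2))  # → [1, 2]
```

The approximation is visible here: with `nprobe=1`, a true neighbour sitting in an unprobed bucket would be missed, which is exactly the accuracy-for-speed trade-off ANN makes.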
In practical scenarios, to meet performance requirements, almost all data needs to be loaded into memory. When vector data reaches a certain scale, especially for high-dimensional data, storing it directly puts great pressure on memory. PieCloudVector therefore adopts Product Quantization (PQ), one of the most popular compression algorithms, to compress vector data.
The idea behind Product Quantization is to decompose the original high-dimensional vector into several low-dimensional sub-vectors and quantize each sub-vector independently. Each original vector can then be represented by a combination of quantization codes from multiple low-dimensional codebooks, saving a large amount of memory and accelerating retrieval across the whole system.
Product Quantization
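A toy PQ round-trip makes the decomposition concrete. Here a 4-d vector is split into two 2-d sub-vectors, each encoded as the index of its nearest codebook entry; the hand-written codebooks are an assumption for illustration, whereas real PQ learns them with k-means per subspace.

```python
# Toy Product Quantization: split, encode per-subspace, decode approximately.
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# One small codebook per 2-d subspace (4 entries → 2 bits per code, so a
# 4-d float vector compresses to just two tiny integer codes).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]],  # subspace 0
    [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]],  # subspace 1
]

def pq_encode(vec):
    codes = []
    for s, book in enumerate(codebooks):
        sub = vec[2 * s : 2 * s + 2]  # the s-th 2-d sub-vector
        codes.append(min(range(len(book)), key=lambda c: l2(sub, book[c])))
    return codes

def pq_decode(codes):
    # Reconstruction concatenates the chosen codebook entries; it is only
    # an approximation of the original vector.
    out = []
    for s, code in enumerate(codes):
        out.extend(codebooks[s][code])
    return out

vec = [0.9, 1.1, 0.1, 0.8]
codes = pq_encode(vec)          # two small codes instead of four floats
print(codes, pq_decode(codes))  # → [1, 2] [1.0, 1.0, 0.0, 1.0]
```

Distances can be computed directly against the codes via per-subspace lookup tables, which is where PQ's retrieval speedup comes from.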
In addition, the Faiss algorithm library also has many other features, including:
Integration of Faiss with the PostgreSQL Kernel
After selecting the Faiss algorithm library as the vector search engine, the next step is to implement the integration of Faiss with the PostgreSQL kernel.
As part of this integration, PieCloudVector adds a new vector column type for basic loading and unloading of vector data. The number of dimensions is chosen to match the actual Embedding model, while other data operations (CREATE, INSERT, COPY, etc.) behave exactly as in a traditional relational database and support standard SQL syntax, which greatly reduces the learning cost for users.
In addition to the new data type, PieCloudVector also implements vector distance operators and indexes for approximate vector search. Each operator corresponds to a built-in function, such as Euclidean distance, cosine distance, or inner-product distance; during execution, the operator is converted into the corresponding function call.
Creating a vector search index looks much like creating an index in a relational database; the main difference is that a vector index takes an Operator Family parameter specifying the distance formula the index will use. Different distance formulas produce different index data structures, and the remaining parameters are inherited from the corresponding Faiss algorithm parameters.
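A hedged sketch of what this looks like in SQL. The article does not show PieCloudVector's exact syntax, so the type name, the `<->` operator, the operator class, and the index parameters below are assumptions modeled on the pgvector-style conventions common to PostgreSQL-based vector databases.

```sql
-- Illustrative only: exact type, operator, and operator-class names in
-- PieCloudVector are assumptions here, modeled on pgvector-style syntax.
CREATE TABLE docs (
    id      bigserial PRIMARY KEY,
    content text,
    emb     vector(3)            -- dimension chosen to match the embedding model
);

-- Standard SQL operations work unchanged on the vector column.
INSERT INTO docs (content, emb) VALUES ('hello', '[0.1, 0.2, 0.3]');

-- The operator family/class selects the distance formula the index is built for.
CREATE INDEX ON docs USING ivfflat (emb vector_l2_ops) WITH (lists = 100);

-- '<->' denotes Euclidean distance and maps to a built-in function at execution.
SELECT id, content FROM docs ORDER BY emb <-> '[0.1, 0.2, 0.3]' LIMIT 5;
```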
With the vector data type and index creation in place, the next step is to keep this data consistent with the database's internal transaction visibility. Since Faiss indexes only the vector data itself, PieCloudVector extends Faiss's internal storage format, adding MVCC information and several metadata columns to the Faiss index.
Faiss OpenMP Threading Redevelopment
OpenMP is a multi-threaded programming model for shared-memory parallel systems, with support for C, C++, and Fortran. It provides a high-level, abstract description of parallel algorithms and is particularly well suited to parallel programming on multi-core CPUs: for example, adding a single OpenMP directive above a for loop is enough for the compiled program to execute that loop across multiple threads.
Faiss's OpenMP multi-threading performs well when the CPU core count is low (below 32), but when the count rises to 128, a performance regression appears: adding more cores makes execution slower. The cause is that most CPU time is spent not on the vector-search code path but on handling heavy lock contention, threads waiting on one another, and global thread switching, wasting a great deal of time.
To improve performance, PieCloudVector has redeveloped the entire multi-threading mechanism of Faiss:
After this series of modifications, PieCloudVector has eliminated the OpenMP threading bottleneck, avoiding large numbers of ineffective threads, greatly improving QPS, and significantly reducing memory usage. The freed memory can be used to hold vector data, reducing system latency.
Performance Improvement After Threading Redevelopment
In addition, although Faiss supports GPU acceleration, its GPU search code is not thread-safe when invoked from multiple threads concurrently. PieCloudVector therefore adds a dedicated code path that avoids concurrent GPU calls: even when handling queries from multiple connections, requests are batched and submitted to the GPU by a single thread.
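The serialization pattern described above can be sketched with a single dispatcher thread. This is a generic sketch, not PieCloudVector's code: `gpu_search` is a stub standing in for the thread-unsafe GPU call, and many connection threads may call `submit`, but only the dispatcher ever touches the "GPU".

```python
# Serialize access to a thread-unsafe resource via one dispatcher thread.
import queue
import threading

requests: "queue.Queue[tuple]" = queue.Queue()
results = {}

def gpu_search(query):
    # Stub for the thread-unsafe GPU search; only the dispatcher runs it.
    return f"result-for-{query}"

def dispatcher():
    # The single consumer: drains requests one at a time (a real system
    # could also gather several pending requests into one GPU batch here).
    while True:
        query, done = requests.get()
        if query is None:          # shutdown sentinel
            break
        results[query] = gpu_search(query)
        done.set()

worker = threading.Thread(target=dispatcher, daemon=True)
worker.start()

def submit(query):
    # Called from any connection thread; blocks until the dispatcher answers.
    done = threading.Event()
    requests.put((query, done))
    done.wait()
    return results[query]

print(submit("q1"))          # → result-for-q1
requests.put((None, None))   # stop the dispatcher
worker.join()
```

Because every GPU call funnels through one thread, the non-thread-safe path is never entered concurrently, no matter how many connections issue queries.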
With its outstanding performance and wide applicability, PieCloudVector has been successfully applied in the field of LLMs in various industries, especially showing significant advantages in the field of finance.
In the Soochow Securities case, OpenPie built an AIGC application solution on top of the vector database PieCloudVector for the customer: Soochow Securities' own LLM, Xiucai GPT, was combined with the LangChain development framework and PieCloudVector to build an AIGC application platform. For details, please refer to the related articles.
RAG Practice Cases Based on PieCloudVector
The entire application process can be divided into the following main steps:
By adding a PieCloudVector-backed knowledge base, OpenPie helps Soochow Securities' Xiucai GPT handle new questions better and improves the efficiency and accuracy of retrieval, meeting the customer's needs for AI applications across scenarios such as investment research, quantitative trading, intelligent advisory, and sentiment analysis.
At present, PieCloudVector continues to be optimized and upgraded. The next step is to explore the following directions, combining them with different application scenarios to provide customers with more powerful, comprehensive, and flexible solutions: