In recent years, breakthroughs in large language models (LLMs) have led to explosive growth in vector data derived from natural language, and vector databases designed to manage such data are drawing ever more attention. The combination of LLMs and vector databases has broad applications across fields such as semantic search, recommendation systems, and chatbots. This article explores the application trends of vector databases in LLM scenarios and, drawing on user cases, provides a detailed introduction to the architecture and technical implementation of OpenPie's vector database, PieCloudVector.
With the rapid development of AI technology, LLMs have become a key force driving the frontier of AIGC. Leveraging the powerful data-processing capabilities of LLMs, AIGC can automatically generate high-quality, personalized content, enabling functions such as intelligent Q&A, sentiment analysis, and text generation, and it has been widely applied in finance, media, entertainment, e-commerce, and other fields. However, LLMs still face challenges such as data timeliness, privacy, long-term memory, and hallucination, which greatly limit their application in many scenarios.
To address these limitations effectively, the industry now widely adopts Retrieval-Augmented Generation (RAG) to enhance the performance of LLMs.
RAG is a technology that combines an information retrieval system with a generative LLM. Its core idea is to allow the LLM to dynamically retrieve relevant information from an external knowledge base when generating an answer, thereby improving the accuracy and reliability of the content generated by the model.
Transforming the Original Data into Embeddings
The RAG framework consists of three main processes: Retrieval, Augmentation, and Generation:
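The three processes above can be sketched as a minimal loop. This is purely illustrative: the character-frequency "embedder" and the prompt-building stub stand in for a real embedding model and a real LLM call, and none of the names below come from PieCloudVector's actual API.

```python
# Minimal RAG sketch (illustrative only): the embedder, vector store, and
# LLM below are stand-in stubs, not PieCloudVector's actual API.
import math

def embed(text: str) -> list[float]:
    # Stub embedder: normalized character-frequency vector. A real system
    # would call an embedding model here.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

# 1. Retrieval: find the knowledge-base entries closest to the query.
knowledge_base = ["PieCloudVector stores embeddings", "Faiss powers the search engine"]
index = [(doc, embed(doc)) for doc in knowledge_base]

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

# 2. Augmentation: splice retrieved context into the prompt.
# 3. Generation: hand the augmented prompt to the LLM (stubbed out here).
def rag_answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return prompt  # a real system would return llm.generate(prompt)
```

The point of the pattern is that the model's answer is grounded in retrieved, up-to-date context rather than only in its training data.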
A vector database is a database system specifically designed for storing and processing multi-dimensional vector data. Using appropriate models, text, images, and audio can be converted into vectors for storage, and a key task of the vector database is to associate each vector with its corresponding original data.
After the data is stored, another task of the vector database is to build an index for efficient vector search. Searching without an index is possible, for example by comparing records one by one in a full table scan, and this will still find the closest vectors; but once the corpus reaches a certain scale, this approach becomes very inefficient, and an index can improve search performance dramatically.
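The index-free baseline described above can be written in a few lines: every stored vector is compared against the query, so cost grows linearly with the number of rows.

```python
# Brute-force (full-scan) nearest-neighbour search: the no-index baseline.
import math

def l2(a, b):
    # Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_full_scan(query, vectors, k=3):
    # vectors: list of (row_id, vector) pairs; returns the k closest row ids.
    scored = sorted(vectors, key=lambda pair: l2(query, pair[1]))
    return [row_id for row_id, _ in scored[:k]]

# Toy table of 1000 two-dimensional vectors.
rows = [(i, [float(i), float(i)]) for i in range(1000)]
print(knn_full_scan([4.2, 4.2], rows, k=3))  # → [4, 5, 3]
```

Every query touches all 1000 rows; with millions of high-dimensional rows this is exactly the inefficiency an index avoids.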
Finally, a complete vector database also needs supporting call interfaces and an ecosystem of tools to meet user needs in areas such as data-type support, flexibility, and ease of use. This surrounding tool ecosystem directly determines how user-friendly and practical the vector database is.
OpenPie's cloud-native vector database, PieCloudVector, delivers storage and efficient querying of massive vector data, offers multi-dimensional analysis capabilities, works hand in hand with LLM Embeddings, and powers multi-modal AI applications. It represents the dimensional upgrade of analytical databases in the era of LLMs.
The current vector database technology field is mainly divided into two schools:
To combine the strengths of the two directions, OpenPie's PieCloudVector integrates Faiss on top of a relational database, ensuring broad data generality while delivering performance at the forefront of the industry.
Developing the Database Kernel Based on PostgreSQL
PieCloudVector is built on the PostgreSQL kernel. This approach allows PieCloudVector to inherit all the advantages of relational databases, including:
Drawing on its extensive experience and technical prowess in the eMPP (elastic MPP) and distributed systems domain, OpenPie has crafted the distributed architecture for its cloud-native virtual data warehouse, PieCloudDB Database, and applied these learnings to develop PieCloudVector.
The figure below shows the distributed architecture of PieCloudVector, comprising a Coordinator node and multiple Executor nodes. The PieCloudVector execution engine supports CPU-based search and acceleration as well as GPU acceleration. Data can be stored on local disks or in cloud storage (S3). Cloud storage has a cost advantage, but its access performance is generally lower than that of local disks; because PieCloudVector keeps its indexes in memory, however, the impact of storage latency on search performance is largely avoided.
PieCloudVector Distributed Architecture
Developing a Vector Search Engine integrated with Faiss
PieCloudVector builds its vector search engine on the open-source algorithm library Faiss, which is also the choice of many mainstream vector databases on the market. Vector search algorithms are numerous, and each involves complex implementation details, so developing them from scratch would be prohibitively expensive in manpower and time. Faiss covers the mainstream vector search algorithms with tunable parameters and is currently the most popular and efficient algorithm library, which both guarantees performance and saves substantial development cost.
In vector search, K-Nearest Neighbor (KNN) search is one of the most basic problems: by computing the distance between the query vector and every sample in the dataset, it returns the K closest vectors, yielding exact results. However, the cost of this method grows linearly with dataset size (and with vector dimensionality), so searches over large, high-dimensional datasets become very slow.
To mitigate this, PieCloudVector introduces Approximate Nearest Neighbor (ANN) algorithms, which trade off performance, recall, and memory: a small amount of accuracy is sacrificed to accelerate queries and improve overall efficiency.
PieCloudVector supports mainstream ANN algorithms, including the most popular IVF and HNSW algorithms.
PieCloudVector Supports Mainstream ANN Algorithms
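The list-probing idea behind IVF can be shown with a toy sketch: vectors are bucketed by their nearest centroid, and a query scans only the `nprobe` closest buckets instead of the whole table. Faiss's IVF implementation is far more elaborate (centroids are trained with k-means, lists are stored compactly); this only illustrates the principle.

```python
# Toy inverted-file (IVF) index: bucket by nearest centroid, probe few buckets.
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class ToyIVF:
    def __init__(self, centroids, nprobe=1):
        self.centroids = centroids            # fixed here; Faiss trains them with k-means
        self.nprobe = nprobe                  # how many inverted lists a query scans
        self.lists = [[] for _ in centroids]  # one inverted list per centroid

    def add(self, row_id, vec):
        nearest = min(range(len(self.centroids)),
                      key=lambda c: l2(vec, self.centroids[c]))
        self.lists[nearest].append((row_id, vec))

    def search(self, query, k=1):
        # Rank centroids by distance and scan only the top `nprobe` lists.
        order = sorted(range(len(self.centroids)),
                       key=lambda c: l2(query, self.centroids[c]))
        candidates = [item for c in order[:self.nprobe] for item in self.lists[c]]
        candidates.sort(key=lambda pair: l2(query, pair[1]))
        return [row_id for row_id, _ in candidates[:k]]

ivf = ToyIVF(centroids=[[0.0, 0.0], [10.0, 10.0]], nprobe=1)
for i, vec in enumerate([[0.5, 0.5], [9.5, 9.5], [10.5, 10.0]]):
    ivf.add(i, vec)
print(ivf.search([9.8, 9.9], k=2))  # → [1, 2]
```

The approximation is visible here: with `nprobe=1`, a true neighbour sitting in an unprobed bucket would be missed, which is exactly the accuracy-for-speed trade-off ANN makes.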
In practical scenarios, to meet performance requirements, almost all data needs to be loaded into memory. When vector data reaches a certain scale, especially for high-dimensional data, storing it directly puts great pressure on memory. PieCloudVector therefore adopts Product Quantization (PQ), one of the most popular compression algorithms, to compress vector data.
The idea behind Product Quantization is to decompose the original high-dimensional vector into several low-dimensional sub-vectors and quantize each sub-vector independently. Each original vector can then be represented by a combination of quantization codes from multiple low-dimensional codebooks, saving a large amount of memory and accelerating retrieval across the whole system.
Product Quantization
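A toy PQ round-trip makes the decomposition concrete. Here a 4-d vector is split into two 2-d sub-vectors, each encoded as the index of its nearest codebook entry; the hand-written codebooks are an assumption for illustration, whereas real PQ learns them with k-means per subspace.

```python
# Toy Product Quantization: split, encode per-subspace, decode approximately.
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# One small codebook per 2-d subspace (4 entries → 2 bits per code, so a
# 4-d float vector compresses to just two tiny integer codes).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]],  # subspace 0
    [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]],  # subspace 1
]

def pq_encode(vec):
    codes = []
    for s, book in enumerate(codebooks):
        sub = vec[2 * s : 2 * s + 2]  # the s-th 2-d sub-vector
        codes.append(min(range(len(book)), key=lambda c: l2(sub, book[c])))
    return codes

def pq_decode(codes):
    # Reconstruction concatenates the chosen codebook entries; it is only
    # an approximation of the original vector.
    out = []
    for s, code in enumerate(codes):
        out.extend(codebooks[s][code])
    return out

vec = [0.9, 1.1, 0.1, 0.8]
codes = pq_encode(vec)          # two small codes instead of four floats
print(codes, pq_decode(codes))  # → [1, 2] [1.0, 1.0, 0.0, 1.0]
```

Distances can be computed directly against the codes via per-subspace lookup tables, which is where PQ's retrieval speedup comes from.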
In addition, the Faiss algorithm library also has many other features, including:
Integration of Faiss with the PostgreSQL Kernel
After selecting the Faiss algorithm library as the vector search engine, the next step is to implement the integration of Faiss with the PostgreSQL kernel.
As part of this integration, PieCloudVector adds a new vector column type for basic loading and unloading of vector data. The number of dimensions is chosen to match the actual Embedding model, while other data operations (CREATE, INSERT, COPY, etc.) behave exactly as in a traditional relational database and support standard SQL syntax, which greatly reduces the learning cost for users.
In addition to the new data type, PieCloudVector also implements vector distance operators and indexes for approximate vector search. Each operator corresponds to a built-in function, such as Euclidean distance, cosine distance, or inner-product distance; during execution, the operator is converted into the corresponding function call.
Creating a vector search index looks much like creating an index in a relational database; the main difference is that a vector index takes an Operator Family parameter specifying the distance formula the index will use. Different distance formulas produce different index data structures, and the remaining parameters are inherited from the corresponding Faiss algorithm parameters.
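A hedged sketch of what this looks like in SQL. The article does not show PieCloudVector's exact syntax, so the type name, the `<->` operator, the operator class, and the index parameters below are assumptions modeled on the pgvector-style conventions common to PostgreSQL-based vector databases.

```sql
-- Illustrative only: exact type, operator, and operator-class names in
-- PieCloudVector are assumptions here, modeled on pgvector-style syntax.
CREATE TABLE docs (
    id      bigserial PRIMARY KEY,
    content text,
    emb     vector(3)            -- dimension chosen to match the embedding model
);

-- Standard SQL operations work unchanged on the vector column.
INSERT INTO docs (content, emb) VALUES ('hello', '[0.1, 0.2, 0.3]');

-- The operator family/class selects the distance formula the index is built for.
CREATE INDEX ON docs USING ivfflat (emb vector_l2_ops) WITH (lists = 100);

-- '<->' denotes Euclidean distance and maps to a built-in function at execution.
SELECT id, content FROM docs ORDER BY emb <-> '[0.1, 0.2, 0.3]' LIMIT 5;
```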
With the vector data type and index creation in place, the next step is to keep this data consistent with the database's internal transaction visibility. Since Faiss indexes only the vector data itself, PieCloudVector extends Faiss's internal storage format, adding MVCC information and several metadata columns to the Faiss index.
Faiss OpenMP Threading Redevelopment
OpenMP is a multi-threaded programming model for shared-memory parallel systems, with support for C, C++, and Fortran. It provides a high-level, abstract description of parallel algorithms and is particularly well suited to parallel programming on multi-core CPUs: for example, adding a single OpenMP directive above a for loop is enough for the compiled program to execute that loop across multiple threads.
Faiss's OpenMP multi-threading performs well when the CPU core count is low (below 32), but when the count rises to 128, a performance regression appears: adding more cores makes execution slower. The cause is that most CPU time is spent not on the vector-search code path but on handling heavy lock contention, threads waiting on one another, and global thread switching, wasting a great deal of time.
To improve performance, PieCloudVector has redeveloped the entire multi-threading mechanism of Faiss:
After this series of modifications, PieCloudVector has eliminated the OpenMP threading bottleneck, avoiding large numbers of ineffective threads, greatly improving QPS, and significantly reducing memory usage. The freed memory can be used to hold vector data, reducing system latency.
Performance Improvement After Threading Redevelopment
In addition, although Faiss supports GPU acceleration, its GPU search code is not thread-safe when invoked from multiple threads concurrently. PieCloudVector therefore adds a dedicated code path that avoids concurrent GPU calls: even when handling queries from multiple connections, requests are batched and submitted to the GPU by a single thread.
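The serialization pattern described above can be sketched with a single dispatcher thread. This is a generic sketch, not PieCloudVector's code: `gpu_search` is a stub standing in for the thread-unsafe GPU call, and many connection threads may call `submit`, but only the dispatcher ever touches the "GPU".

```python
# Serialize access to a thread-unsafe resource via one dispatcher thread.
import queue
import threading

requests: "queue.Queue[tuple]" = queue.Queue()
results = {}

def gpu_search(query):
    # Stub for the thread-unsafe GPU search; only the dispatcher runs it.
    return f"result-for-{query}"

def dispatcher():
    # The single consumer: drains requests one at a time (a real system
    # could also gather several pending requests into one GPU batch here).
    while True:
        query, done = requests.get()
        if query is None:          # shutdown sentinel
            break
        results[query] = gpu_search(query)
        done.set()

worker = threading.Thread(target=dispatcher, daemon=True)
worker.start()

def submit(query):
    # Called from any connection thread; blocks until the dispatcher answers.
    done = threading.Event()
    requests.put((query, done))
    done.wait()
    return results[query]

print(submit("q1"))          # → result-for-q1
requests.put((None, None))   # stop the dispatcher
worker.join()
```

Because every GPU call funnels through one thread, the non-thread-safe path is never entered concurrently, no matter how many connections issue queries.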
With its outstanding performance and wide applicability, PieCloudVector has been successfully applied in the field of LLMs in various industries, especially showing significant advantages in the field of finance.
In the Soochow Securities case, OpenPie built an AIGC application solution on top of the vector database PieCloudVector for the customer: Soochow Securities' own LLM, Xiucai GPT, was combined with the LangChain development framework and PieCloudVector to build an AIGC application platform. For details, please refer to the related articles.
RAG Practice Cases Based on PieCloudVector
The entire application process can be divided into the following main steps:
By adding a PieCloudVector-backed knowledge base, OpenPie helps Soochow Securities' Xiucai GPT handle new questions better and improves the efficiency and accuracy of retrieval, meeting the customer's needs for AI applications across scenarios such as investment research, quantitative trading, intelligent advisory, and sentiment analysis.
At present, PieCloudVector continues to be optimized and upgraded. The next step is to explore the following directions, combining them with different application scenarios to provide customers with more powerful, comprehensive, and flexible solutions: