Today, the application of Large Language Models (LLMs) is transforming industries at an unprecedented rate. From natural language processing and computer vision to multimodal tasks, AI technology has become the core force driving business innovation.
However, training and inference with LLMs involve handling vast amounts of high-dimensional vector data, which traditional databases often struggle to manage effectively. Vector databases have emerged to address this challenge. This article introduces how PieCloudVector leverages its cloud-native architecture and powerful vector processing capabilities to help AI applications reach their full potential.
Hierarchical Structure of China's AGI Market
China's AGI market is divided into four layers: infrastructure, model, middleware, and application. Together, these layers form the technical framework of China's AGI market.
Hierarchical Structure of China's AGI Market
AI Agents Driving Rapid AI Development
A recent hot topic in the AI field is the "AI Agent," which is gradually becoming a core direction of exploration. A single LLM can only generate text or images, so its range of applications is relatively limited. The purpose of an Agent is to enable an LLM to independently complete tasks that would normally require human intervention, by interacting with the surrounding environment, using accessible data, calling interfaces, and employing various auxiliary tools. The development direction of Agents is to go deep into vertical industries, improving real-world results through clear and precise task definitions.
Agent applications are already quite common; voice assistants on smartphones, for example, are one concrete manifestation of an Agent.
PieCloudVector, OpenPie's cloud-native vector database, represents the dimensional upgrade of analytical databases in the era of LLMs. It enables the storage and efficient querying of massive vector data, provides multi-dimensional analysis capabilities, works with the embeddings produced by LLMs, and supports multimodal AI applications.
Vector Databases in the Era of LLMs
First, LLMs are trained on vast corpora, but these corpora have a cutoff date, so the models cannot answer questions about current events. For instance, if you ask a trained LLM, "How many gold medals has the Chinese team won at the Paris Olympics so far?" it will inevitably be unable to answer. In such cases, the LLM must be given external context so that it can understand real-time information and answer such questions.
Secondly, the corpora used to train LLMs are generally obtained from public channels and do not include private-domain data. A trained LLM therefore lacks specialized knowledge in a particular field and cannot answer related questions. For enterprises building knowledge base systems, data security is an issue that cannot be ignored: data should not be exposed indiscriminately on the public internet, so a platform that keeps data within a private domain is needed.
Lastly, trained LLMs are static and lack long-term memory. In chatbot scenarios, for example, the model does not remember previous conversation history when a user starts a new conversation.
Limitations of LLMs
Vector databases help LLMs better meet enterprise needs by addressing the limitations above. With Retrieval-Augmented Generation (RAG), the latest information is stored in a vector database, which supplies external knowledge to the model through persistent storage and improves the model's accuracy and usability in specific scenarios. Fine-tuning the model directly can also equip an LLM with the latest information, but at a higher cost than using a vector database.
Core Competencies Required for Vector Computing Engines
Vector data is typically derived from text, voice, and images through embeddings. For vector databases, the challenge lies in how to quickly and accurately retrieve massive amounts of vector data. Therefore, vector databases must employ advanced technologies and algorithms.
Core Capabilities of Vector Computing Engine
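To make the starting point concrete, here is a minimal sketch of turning text into vectors with an off-the-shelf Embedding model. The sentence-transformers library and the all-MiniLM-L6-v2 model are illustrative choices, not components of PieCloudVector.

```python
# Minimal text-to-vector sketch; the library and model are illustrative
# choices, not PieCloudVector components.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # maps text to 384-dim vectors
sentences = [
    "Vector databases store and search embeddings.",
    "LLMs answer questions from retrieved context.",
]
vectors = model.encode(sentences)  # numpy array of shape (2, 384)
print(vectors.shape)
```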
In a vast sea of vector data, quickly and accurately finding the k vectors closest to a query vector, known as k-nearest-neighbor (KNN) search, is a key problem. To this end, vector databases often use improved data structures (such as R-trees or M-trees) that organize and store high-dimensional data more effectively, thereby improving retrieval efficiency.
Additionally, vector engines can employ approximate nearest neighbor (ANN) search algorithms, which trade some accuracy for significantly higher search efficiency. Common algorithms include IVF and HNSW. No universal algorithm currently achieves optimal performance (recall/QPS/memory) on every dataset; trade-offs are usually necessary to strike an overall balance, as the sketch below illustrates.
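As a rough illustration of these trade-offs, the sketch below contrasts exact KNN with IVF- and HNSW-based ANN search using FAISS, a widely used open-source library. It is meant to show the algorithm families, not PieCloudVector's internal implementation.

```python
# Exact KNN vs. approximate (IVF, HNSW) search on random data, using FAISS.
import numpy as np
import faiss

d = 128
xb = np.random.rand(100_000, d).astype("float32")  # database vectors
xq = np.random.rand(5, d).astype("float32")        # query vectors

# Exact baseline: brute-force L2 search over all vectors.
flat = faiss.IndexFlatL2(d)
flat.add(xb)
D_exact, I_exact = flat.search(xq, 10)

# IVF: cluster the data, then probe only a few clusters per query.
quantizer = faiss.IndexFlatL2(d)
ivf = faiss.IndexIVFFlat(quantizer, d, 1024)  # 1024 clusters
ivf.train(xb)
ivf.add(xb)
ivf.nprobe = 16  # probing more clusters raises recall but lowers QPS
D_ann, I_ann = ivf.search(xq, 10)

# HNSW: a graph-based index; no training step, but higher memory use.
hnsw = faiss.IndexHNSWFlat(d, 32)  # 32 neighbors per graph node
hnsw.add(xb)
D_hnsw, I_hnsw = hnsw.search(xq, 10)
```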
Lastly, calculating distances between vectors is the core operation of vector retrieval. Much like LLM training and inference, it consists of the same calculation repeated at massive scale. Such operations are well suited to acceleration with GPUs or FPGAs, since relying on CPUs alone is less efficient. Hardware acceleration is therefore an indispensable capability for vector computing engines.
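The following sketch shows why this workload parallelizes so well: a batched, vectorized formulation of squared L2 distance replaces an explicit Python loop, and the same data-parallel pattern is what GPU kernels exploit.

```python
# Squared L2 distance from one query to every database vector, computed
# as a single batched expression rather than a Python loop.
import numpy as np

xb = np.random.rand(100_000, 128).astype("float32")  # database vectors
q = np.random.rand(128).astype("float32")            # query vector

dists = ((xb - q) ** 2).sum(axis=1)     # one vectorized pass over all rows
topk = np.argpartition(dists, 10)[:10]  # indices of the 10 nearest vectors
```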
Cloud-Native Vector Database PieCloudVector
PieCloudVector is built on the PostgreSQL kernel and supports both standalone and distributed deployments with full ACID compliance. It supports mixed vector-scalar queries, is compatible with mainstream LLM application frameworks such as LangChain and LlamaIndex, and provides SQL, REST, and Python interfaces.
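A mixed vector-scalar query might look like the following sketch, issued through a standard PostgreSQL driver. The table, column names, and the pgvector-style `<->` distance operator are assumptions for illustration; consult PieCloudVector's documentation for the exact SQL dialect.

```python
# Hypothetical mixed vector-scalar query over a PostgreSQL-compatible
# interface; the schema and the `<->` operator are assumed for illustration.
import psycopg2

conn = psycopg2.connect("host=localhost dbname=demo user=demo")  # placeholder DSN
cur = conn.cursor()
cur.execute(
    """
    SELECT id, title
    FROM documents
    WHERE category = %s                  -- scalar filter
    ORDER BY embedding <-> %s::vector    -- vector distance (assumed pgvector-style syntax)
    LIMIT 5
    """,
    ("finance", "[0.1, 0.2, 0.3]"),
)
print(cur.fetchall())
```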
Architecture of PieCloudVector
The overall architecture of PieCloudVector consists of four main components: control services, metadata services, computing nodes, and storage foundations. The specific structure is shown in the figure below:
Architecture of PieCloudVector
PieCloudVector's computing nodes can be elastically scaled and are backed by a dedicated metadata storage service. Through the storage foundation, they can access local file systems or other storage such as S3 or data lakes. By default, the computing nodes are divided into coordinators and executors: coordinators receive SQL queries, parse and optimize them, and assign tasks to the executors, which carry out the actual vector search operations.
PieCloudVector supports mainstream ANN algorithms, including IVF and HNSW as well as ScaNN and DiskANN, and supports GPU-accelerated computing, which significantly improves performance. The entire set of computing node services can be deployed directly on bare metal, on platforms with a management layer, or in public cloud environments.
PieCloudVector's control services are the interface through which users interact with the system directly, offering control panels and various APIs. Within the control services, users can manage all vector indexes, monitor the entire cluster, perform backups across clusters, manage user permissions, and gain data insights.
PieCloudVector RAG Workflow
PieCloudVector supports RAG-related application capabilities. RAG is a technique that combines a retrieval model (usually a vector database) with a generative model; its core idea is to use information from private data sources to help the model generate more accurate content.
The following outlines the process of building a knowledge base application based on RAG technology with PieCloudVector:
First, documents in the knowledge base are split into chunks, converted into vectors by an Embedding model, and stored in PieCloudVector. When a user asks a question, the application converts the question into a vector using the same Embedding model and searches the vector database for the knowledge chunks most relevant to it. The information in these chunks is then combined with the original question to form a new prompt for the LLM, ultimately yielding a higher-quality answer.
RAG Workflow
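In code, the query side of this workflow reduces to a few steps, as in the minimal sketch below. Here `search_piecloudvector` and `call_llm` are hypothetical helpers standing in for the database query and the LLM endpoint.

```python
# Minimal RAG query flow; search_piecloudvector and call_llm are
# hypothetical placeholders for the database search and the LLM API.
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def answer(question: str) -> str:
    # 1. Embed the question with the same model used at ingestion time.
    q_vec = embedder.encode(question)
    # 2. Retrieve the most relevant knowledge chunks (hypothetical helper).
    chunks = search_piecloudvector(q_vec, top_k=4)
    # 3. Combine the chunks and the question into a new prompt.
    context = "\n".join(chunks)
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    # 4. Hand the augmented prompt to the LLM (hypothetical call).
    return call_llm(prompt)
```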
Although vector search is very useful in RAG, it has its limitations. Knowledge base content retrieved through vector search can still lead to "hallucinations," producing output that is irrelevant to the question. This may be related to factors such as the choice of Embedding model and the chunk size: if important information is split across two chunks, if the actual semantics are lost when the context is assembled, or if the retrieved results cannot be combined well with the prompt, the quality of the final answer suffers. This is why PieCloudVector is beginning to explore a next-generation GraphRAG architecture.
PieCloudVector Development Trends: Next-Generation GraphRAG Architecture
To overcome the limitations of vector-only RAG, PieCloudVector plans to adopt a GraphRAG architecture in the near future, combining the strengths of vector search and graph search.
Users increasingly expect enterprises to build one-stop AIGC applications on the private-domain knowledge and corpora they provide. To meet these needs, OpenPie has proposed PieDataCS, a large model data computing system that supports three computing engines: the vector database PieCloudVector, the cloud-native virtual data warehouse PieCloudDB Database, and the large model machine learning engine PieCloudML.
PieDataCS integrates the entire model lifecycle, covering operations such as model creation, training, quality testing, fine-tuning, deployment, and inference, while also integrating various computing engines and frameworks to provide users with a complete solution.
PieDataCS Integrates the Entire Model Lifecycle Management
In AIGC applications, the model and the framework each account for half of the overall capability, and building an AIGC application with PieDataCS can accordingly be divided into two steps, one addressing the model and one the application framework.
For more information about PieDataCS, please refer to related articles.
OpenPie's cloud-native vector database PieCloudVector, with its outstanding performance and broad applicability, has been successfully deployed across a range of industries.
Financial AIGC Application Practice
In one financial customer's AIGC application, the entire application is built on PieCloudVector and the LangChain framework. Collected laws, regulations, policy documents, reports, and various research materials are converted into vectors and stored in PieCloudVector, forming a powerful RAG framework. The front-end application can ultimately support complex tasks such as investment research analysis, quantitative analysis, and sentiment analysis, and also provides a Q&A chatbot.
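Below is a condensed sketch of the ingestion side of such a pipeline, written against classic LangChain interfaces (class and method names vary across LangChain versions). The connection string is a placeholder, and the pgvector-compatible PGVector store stands in for PieCloudVector's LangChain integration.

```python
# Ingestion side of a LangChain-based RAG pipeline; the PGVector store and
# connection string are stand-ins for the PieCloudVector integration.
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores.pgvector import PGVector

# Split regulations, policy documents, and reports into chunks.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(open("policy_report.txt").read())  # placeholder file

# Embed the chunks and store them in the vector database.
store = PGVector.from_texts(
    texts=chunks,
    embedding=HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2"),
    connection_string="postgresql+psycopg2://demo@localhost/demo",  # placeholder
)

# Expose the store as a retriever for the Q&A front end.
retriever = store.as_retriever(search_kwargs={"k": 4})
docs = retriever.invoke("What are the latest disclosure requirements?")
```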
For a specific case introduction, please refer to related articles.
Financial Industry Cases
Multimodal Data Analysis Course
In another case, OpenPie collaborated with a university to create a multimodal data analysis course. The data analysis workflow first uses an Embedding model to convert text, voice, images, and other data into vectors. These vectors are then loaded into PieCloudVector through its Python SDK. Leveraging PieCloudVector's powerful vector data processing and analysis capabilities, the course ultimately implements functions such as intelligent recommendation and document retrieval.
Multimodal Data Analysis
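As an illustration of the course workflow, the sketch below embeds text and images into a shared vector space with a CLIP model and hands the vectors to the database. `PieCloudVectorClient` and its methods are hypothetical placeholders; the real Python SDK's names may differ.

```python
# Multimodal embedding sketch; PieCloudVectorClient is a hypothetical
# placeholder for the real Python SDK.
from PIL import Image
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("clip-ViT-B-32")  # CLIP embeds text and images into one space

text_vecs = model.encode(["a lecture slide about embeddings"])
image_vec = model.encode(Image.open("slide.png"))  # placeholder image file

client = PieCloudVectorClient(host="localhost")  # hypothetical SDK client
client.upsert("course_material", vectors=text_vecs, payload=["slide text"])  # hypothetical
hits = client.search("course_material", query=image_vec, top_k=5)            # hypothetical
```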