Practice of RAG Based on Cloud-Native Vector Database PieCloudVector
SEPTEMBER 14TH, 2024

In recent years, AI-Generated Content (AIGC) has become one of the hottest topics. The industry has seen a variety of generation tools that can produce diverse content across multiple modalities. These mainstream models have achieved outstanding performance, attributable to innovative algorithms, the significant expansion of model scale, and massive high-quality datasets. However, AIGC still faces a series of challenges, and Retrieval-Augmented Generation (RAG) has been proposed as an important supplement to Large Language Models (LLMs). This article demonstrates RAG in practice with hands-on examples based on PieCloudVector. 


AIGC is built on the idea that content is generated by advanced generative models rather than by humans or rule-based methods. In recent years, AIGC technology has developed rapidly: tasks that once depended on Long Short-Term Memory networks (LSTMs) have shifted to Transformer-based models, and image generation has evolved from Generative Adversarial Networks (GANs) to Latent Diffusion Models (LDMs). 


The architecture of foundation models, initially composed of millions of parameters, has now expanded to billions or even trillions of parameters. These advances are underpinned by rich, high-quality datasets that provide ample training samples for the comprehensive optimization of model parameters. 

 

Information retrieval is another core application of computer science. Distinct from content generation, its goal is to find relevant existing items within vast amounts of data. Modern, efficient retrieval systems can handle document collections reaching into the billions, and retrieval techniques are applied across a wide range of scenarios. 


Despite the significant progress in AIGC, challenges remain, such as keeping knowledge up to date, integrating long-tail knowledge, and preventing the leakage of private training data. To address these challenges, Retrieval-Augmented Generation (RAG) has been introduced. With its flexible data repository, RAG can serve as an easily modified non-parametric memory, broadly integrate long-tail knowledge, and securely encode sensitive data. Moreover, RAG can reduce generation costs, for example by enabling smaller generative models, supporting long-text generation, and simplifying generation processes. 


What is RAG? 


RAG is an important supplement to LLMs. It allows LLMs to access authoritative knowledge bases beyond the scope of their training data before generating responses, thereby optimizing LLM outputs. This process does not require retraining the model, and thus provides a cost-effective and flexible way to enhance LLM performance, "customizing" general-purpose LLMs to better fit specific business needs and use cases. 



Without RAG, user inputs are directly passed to LLMs, which generate outputs based on their training data or known information. The introduction of RAG adds a crucial information retrieval component to this process. When receiving user inputs, RAG first uses the information retrieval component to extract relevant information based on the input content. This information is then provided to the LLM as contextual information, along with the user inputs. 


The LLM then combines the provided context with its training data to jointly shape the generation process. This not only improves the relevance and accuracy of the output but also enhances the model's use of domain-specific knowledge. In other words, compared to model retraining and fine-tuning, RAG offers the following significant advantages: 

 

 

  • Cost-effectiveness: Compared to traditional model retraining, RAG provides a more economical way to introduce new data. It avoids high hardware costs and computational resource consumption. 


  • Real-time updates: RAG enables LLMs to connect with real-time data sources such as social media and news portals, ensuring that the model can provide users with results based on the latest information. This capability significantly enhances the timeliness and relevance of the output. 


  • Enhanced credibility: With RAG, the outputs can include references to authoritative data sources, which not only improves the credibility of the results but also allows users to trace back to the original documents for verification. This transparency helps to build trust in generative AI. 


  • Input control: RAG allows for precise control of model input information based on task requirements. This flexibility ensures the security of sensitive data and allows models to process data of varying sensitivities while protecting privacy. 
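
Conceptually, the retrieve-then-generate flow described above can be summarized in a few lines. The following is a minimal sketch assuming hypothetical embed, vector_store, and llm interfaces, not any specific library: 

def rag_answer(question, embed, vector_store, llm, k=3): 
    # 1. Convert the user question into an embedding vector 
    query_vector = embed(question) 
    # 2. Retrieve the k most similar documents from the external data source 
    context_docs = vector_store.search(query_vector, top_k=k) 
    # 3. Build a prompt combining the retrieved context with the question 
    context = "\n".join(doc.text for doc in context_docs) 
    prompt = ( 
        "Use the following pieces of context to answer the question.\n" 
        f"{context}\n" 
        f"Question: {question}" 
    ) 
    # 4. The LLM generates an answer conditioned on the retrieved context 
    return llm(prompt) 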


What is PieCloudVector? 

 

PieCloudVector, one of the core computing engines of OpenPie's large model data computing system PieDataCS, represents the dimensional upgrade of analytical databases in the era of LLMs, specifically designed for multimodal AI applications. In addition to PieCloudVector, PieDataCS also supports two other computing engines, PieCloudDB Database and PieCloudML. 


PieCloudVector's technical approach combines mature open-source algorithm implementations with a relational database built on the PostgreSQL kernel, offering full ACID compliance and supporting mixed queries over scalar and vector data. PieCloudVector supports mainstream Approximate Nearest Neighbor (ANN) algorithms and vector compression algorithms, supports SIMD/GPU acceleration, and is compatible with LangChain. Compared to traditional databases, PieCloudVector achieves vectorized storage and elastic scaling of computing resources, improving usability and performance, enhancing metadata management, solving data consistency issues, and overcoming technical challenges in security, reliability, and availability. 
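
Because PieCloudVector is built on the PostgreSQL kernel, scalar predicates and vector similarity can be combined in a single SQL query. The sketch below is a hypothetical illustration using psycopg2 with pgvector-style syntax (the <-> distance operator, and a made-up docs table with a category column); PieCloudVector's actual operator set and schema may differ: 

import psycopg2 

# Hypothetical mixed scalar + vector query (pgvector-style syntax assumed) 
conn = psycopg2.connect("postgresql://openpie@xx.xx.xx.xx:5432/openpie") 
cur = conn.cursor() 
cur.execute( 
    """ 
    SELECT id, document 
    FROM docs 
    WHERE category = %s                -- scalar filter 
    ORDER BY embedding <-> %s::vector  -- vector similarity ordering 
    LIMIT 5 
    """, 
    ("blog", "[0.01, -0.02, 0.03]"), 
) 
rows = cur.fetchall() 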

 


In terms of architectural design, each Executor in PieCloudVector corresponds to an instance, delivering high performance, scalability, and reliability for vector storage and similarity-search services. With PieCloudVector, users can not only store and manage vectors but also use the accompanying tools for approximate (fuzzy) searches. Compared to exhaustive global searches, this trades a small amount of accuracy for millisecond-level search latency, further improving query efficiency. 


Architectural Design of PieCloudVector 


In the practice of RAG, PieCloudVector demonstrates an efficient way of combining user queries with relevant data to generate precise and authoritative responses. The detailed steps of the RAG workflow are as follows: 


RAG Workflow


  • Creation of External Data Sources: First, identify and integrate new data outside the LLM's original training dataset, known as external data. It may come from APIs, databases, or documents, and in various formats such as files, database records, plain text, or vectors. This external data is stored in PieCloudVector, retaining both the original content and the corresponding embedding information. 


  • Processing of User Input: For a user query, preprocess it before querying the external data source. This may include extracting the embedding of the user query to retrieve relevant contextual data through vector similarity search in the external data source. 


  • Relevance Search: Once the user input has been converted into an embedding vector, use it to perform a relevance search in the external data source. PieCloudVector supports a variety of efficient vector indexes, such as HNSW, IVFFLAT, and IVFQD, to accelerate this process. 


  • Building Input Context: Use the data retrieved from the external data source that is most similar to the user query to build the model's input context. For example, you can select the top-k most similar pieces of original content to construct the model's input context. 


  • Model Input: Provide the original query and the retrieved relevant contextual information as input to the model. 


  • Model Output: The model combines the user query and the retrieved contextual information to generate a response. 


Next, we will demonstrate a complete RAG workflow, implemented with LangChain, that uses PieCloudVector to store external data and Llama 2 as the model. 

 

Demonstration of RAG Based on PieCloudVector

 

Preparing External Data Sources and Models


The external data used in this example comes from a series of OpenPie blog articles, which have been organized into an internal dataset. Each record in the dataset contains a single standalone English text, as shown below: 


Openpie is dedicated to "Data Computing for New Discoveries" and has successfully completed three rounds of strategic financing .... 
 
OpenPie's flagship product, PieCloudDB realizes cutting-edge data warehouse virtualization technology .... 
 
With continuous innovation of artificial intelligence (AI) technology, we can observe its increasingly widespread applications ... 


We have encapsulated PieCloudVector in an implementation class of the VectorStore interface provided by LangChain, to facilitate interaction with PieCloudVector. Using LangChain's API, we preprocessed the external data, including text segmentation and embedding extraction. The processed data, comprising the original text and the corresponding embeddings, is stored in PieCloudVector. To improve the efficiency of vector similarity search, we also created an HNSW index. The core implementation code is as follows: 


# Imports assume a recent LangChain; PieCloudVector is the custom VectorStore 
# wrapper described above (its module path is project-specific) 
from langchain.document_loaders import DirectoryLoader 
from langchain.text_splitter import RecursiveCharacterTextSplitter 
from langchain.embeddings import HuggingFaceBgeEmbeddings 

raw_doc_path = "./RAG-data/context-text" 
loader = DirectoryLoader(raw_doc_path) 
docs = loader.load() 

# Split documents into overlapping 500-character chunks 
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50) 
doc_splits = text_splitter.split_documents(docs) 

model_name = "BAAI/bge-base-en" 
encode_kwargs = {'normalize_embeddings': True}  # set True to compute cosine similarity 
embedding_function = HuggingFaceBgeEmbeddings( 
    model_name=model_name, 
    model_kwargs={'device': 'cuda'}, 
    encode_kwargs=encode_kwargs 
) 

CONNECTION_STRING = "postgresql+psycopg2://openpie@xx.xx.xx.xx:5432/openpie" 
vectordb = PieCloudVector.from_documents( 
    documents=doc_splits,  # text data that you want to embed and store 
    embedding=embedding_function,  # used to convert the documents into embeddings 
    connection_string=CONNECTION_STRING, 
    collection_name="docs_v1" 
) 

# Build an HNSW index on the 768-dim BGE embeddings; ef_construction and 
# ef_search trade index build/query time against recall 
vectordb.create_hnsw_index(dims=768, index_key="HNSW32", ef_construction=40, ef_search=16) 
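
Once the documents are written, the store can be queried through LangChain's standard VectorStore interface, which our wrapper implements. For example, as a quick sanity check (similarity_search and page_content are standard LangChain APIs): 

# Retrieve the 3 chunks most similar to a test query 
results = vectordb.similarity_search("What is PieCloudDB?", k=3) 
for doc in results: 
    print(doc.page_content[:80]) 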

 

After the external data is written to PieCloudVector, each record consists of two important fields: embedding and document, as follows: 


{
"embedding": [-0.0087991655,-0.027009273,0.0033726105,0.018299054,0549,0.045432627,-0.038479857,...], 
"document": "Openpie is dedicated to 'Data Computing for New Discoveries' and ... ", 
} 


Use the transformers library from Hugging Face to load Llama 2 and construct a text-generation pipeline: 


from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline 
from langchain.llms import HuggingFacePipeline 

MODEL_NAME = "NousResearch/Llama-2-7b-hf" 
tokenizer = AutoTokenizer.from_pretrained( 
    MODEL_NAME, 
    trust_remote_code=True, 
    use_fast=True, 
    add_eos_token=True, 
) 
model = AutoModelForCausalLM.from_pretrained( 
    MODEL_NAME, 
    use_safetensors=True, 
    trust_remote_code=True, 
    device_map='auto', 
    load_in_8bit=True,  # 8-bit quantization to fit the 7B model in less GPU memory 
) 
pipe = pipeline( 
    "text-generation", 
    model=model, 
    tokenizer=tokenizer, 
    max_new_tokens=512, 
    temperature=0.7, 
    top_p=0.95, 
    repetition_penalty=1.15, 
) 

# Wrap the pipeline so it can be used as a LangChain LLM 
llm = HuggingFacePipeline(pipeline=pipe) 
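
For comparison, the llm object can also be invoked directly, without any retrieval step; this is how the no-RAG baseline shown in the results section can be produced (using LangChain's classic callable-LLM interface): 

# Direct generation with no retrieved context (no-RAG baseline) 
baseline = llm("What is PieCloudVector? and any advantages of PieCloudVector?") 
print(baseline) 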


Inference 


LangChain defines a Retriever interface that encapsulates the logic for retrieving documents similar to a given query. During the inference phase, the vectordb instance backed by PieCloudVector is first converted into a Retriever object. For each query, this Retriever searches PieCloudVector and returns the three most similar records. A question-answering chain is then constructed over this retriever, and finally the inference task is executed with the input question. 

 

from langchain.chains import RetrievalQA 

retriever = vectordb.as_retriever(search_kwargs={"k": 3}) 
retrieval_qa_chain = RetrievalQA.from_chain_type( 
    llm=llm, 
    chain_type="stuff",  # "stuff" packs all retrieved chunks into a single prompt 
    retriever=retriever, 
    return_source_documents=True 
) 

query = "What is PieCloudVector? and any advantages of PieCloudVector? please describe in short words" 
response = retrieval_qa_chain(query) 
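
The chain returns a dictionary; because return_source_documents=True was set, it contains both the generated answer and the retrieved context chunks: 

# Inspect the generated answer and the retrieved source chunks 
print(response["result"]) 
for doc in response["source_documents"]: 
    print(doc.page_content[:80]) 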

 

With RAG enabled, for the question: 

 

"What is PieCloudVector? and any advantages of PieCloudVector? please describe in short words" 


the input includes not only the question and the necessary prompt template but also the contextual information retrieved from the external data source, as shown below: 

 

{ 
"Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, 
 don't try to make up an answer.", 
 'PieCloudVector vector database has the capability to perform fast queries on trillion-scale vector databases.  
  It supports single-node multi-threaded index creation, effectively utilizing all available hardware computational resources. 
  This results in a five-fold improvement in index creation performance,  
  a six-fold improvement in retrieval performance, and a three-fold improvement in interactive response speed. 
  PieCloudVector, in conjunction with Soochow Securities Xiucai GPT,  
  forms the overall RAG architecture. PieCloudVector primarily stores the embedded vector data  
  while also supporting storage of scalar data for applications. Additionally,  ....',  
  'Question: What is PieCloudVector? and any advantages of PieCloudVector?  please describe in short words', 
} 


Results 


After using RAG, for the question: 


"What is PieCloudVector? and any advantages of PieCloudVector? please describe in short words" 


The output is as follows. As can be seen, Llama 2 is able to produce an essentially correct answer based on the provided contextual information. 

 

'Helpful Answer: 
PieCloudVector is a distributed vector database developed by OpenPie.  
It offers high scalability, low latency, and efficient query processing,  
making it suitable for large-scale vector data analysis tasks such as  
recommendation systems, image recognition, and natural language processing. 
Some key features include support for multiple indexing methods (e.g., B+ tree, hash table),  
parallelized query execution, and fault tolerance through replication and redundancy techniques.  
Overall, PieCloudVector can help organizations process massive amounts of  
unstructured data quickly and efficiently, leading to  
improved decision-making and better customer experiences.' 

 

Without RAG, directly feeding the question to Llama 2 yields the following output:

 

Question: What is PieCloudVector? and any advantages of PieCloudVector?  please describe in short words. 
Answer: Comment: @user1095108 I've added a link to the documentation, which should answer your questions. 


Since Llama 2's training data contains no knowledge of PieCloudVector, it cannot answer the question accurately. This highlights the importance and strength of RAG: by supplementing knowledge beyond the model's training data, RAG significantly enhances the model's ability to handle specific queries and improves its accuracy. 

 

PieCloudVector, with its excellent performance and broad applicability, has been successfully deployed across various industries, and has shown particularly significant advantages in the field of financial large models. In the future, OpenPie will continue to track technological trends, keep exploring and innovating, and pursue more application scenarios for databases in multimodal AI systems. 
