RAG with Vector Store

Retrieval-Augmented Generation (RAG) is a useful technique when you want an AI model to work with your own documents. In RAG, the model draws on external knowledge, typically stored in a vector store, rather than relying only on what it learned during training: related documents are retrieved and added to the prompt at the time you ask a question. This helps the model give more accurate, up-to-date, and relevant answers.

In this example, we'll walk you through a complete RAG workflow: building a vector store (VectorStore) and integrating it with the Agent.

Initializing a Vector Store

Ailoy simplifies the construction of RAG pipelines through its built-in VectorStore component, which works alongside the Agent.

To initialize a vector store:

from ailoy import Runtime, VectorStore

rt = Runtime()
with VectorStore(rt, "BAAI/bge-m3", "faiss") as vs:
    ...

Ailoy currently supports both FAISS and ChromaDB as vector store backends. Refer to the official configuration guide for backend-specific options.
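
For instance, switching to ChromaDB is a matter of passing a different backend name. The snippet below is a sketch: the "chromadb" backend name follows the pattern above, but the url keyword for a ChromaDB server address is an assumption, so consult the configuration guide for the actual connection options:

rt = Runtime()

# Sketch: "chromadb" as the backend name; the `url` keyword argument
# for the server address is an assumption (see the configuration guide)
with VectorStore(rt, "BAAI/bge-m3", "chromadb", url="http://localhost:8000") as vs:
    ...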

💡 Note: At this time, the only supported embedding model is BAAI/bge-m3. Additional embedding models will be supported in future releases.

Inserting Documents into the Vector Store

You can insert text along with optional metadata into the vector store:

vs.insert(
"Ailoy is a lightweight library for building AI applications",
metadata={"topic": "Ailoy"}
)

In practice, you should split large documents into smaller chunks before inserting them. This improves retrieval quality. You may use any text-splitting tool (e.g., LangChain), or utilize Ailoy’s low-level runtime API for text splitting. (See Calling Low-Level APIs for more details.)
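
As a minimal sketch (plain Python, not an Ailoy API), you could split a document into paragraph-sized chunks and insert each one separately; the file name here is a placeholder:

# Read a source document (placeholder path)
with open("my_document.txt") as f:
    document = f.read()

# Naive paragraph-based chunking; production pipelines typically use
# overlap-aware splitters (e.g., LangChain's text splitters)
chunks = [p.strip() for p in document.split("\n\n") if p.strip()]

for i, chunk in enumerate(chunks):
    vs.insert(chunk, metadata={"source": "my_document.txt", "chunk": i})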

Retrieving Relevant Documents

To retrieve documents similar to a given query:

query = "What is Ailoy?"
items = vs.retrieve(query, top_k=5)

This returns a list of VectorStoreRetrieveItem instances representing the most relevant chunks, ranked by similarity. The number of results is controlled via the top_k parameter (default is 5).
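
For example, you can inspect what came back. The document attribute is used throughout this guide; the metadata and similarity attributes shown below are assumptions inferred from the insert API and the similarity ranking, so check the VectorStoreRetrieveItem reference for the exact field names:

for item in items:
    # item.document holds the stored chunk text; the metadata and
    # similarity attribute names are assumptions, not confirmed API
    print(item.document)
    print(item.metadata)
    print(item.similarity)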

Constructing an Augmented Prompt

Once documents are retrieved, you can construct a context-enriched prompt as follows:

prompt = f"""
Based on the provided contexts, try to answer the user's question.
Context: {[item.document for item in items]}
Question: {query}
"""

You can then pass this prompt to the agent for inference:

for resp in agent.query(prompt):
    agent.print(resp)
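
If you run this pattern repeatedly, it can help to wrap retrieval, prompt construction, and inference in one place. The rag_query function below is a hypothetical convenience wrapper, not part of Ailoy's API:

def rag_query(agent, vs, question, top_k=5):
    # Hypothetical helper: retrieve relevant chunks, build an
    # augmented prompt, and stream the agent's response
    items = vs.retrieve(question, top_k=top_k)
    prompt = f"""
Based on the provided contexts, try to answer the user's question.
Context: {[item.document for item in items]}
Question: {question}
"""
    for resp in agent.query(prompt):
        agent.print(resp)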

Complete Example

from ailoy import Runtime, Agent, VectorStore

# Initialize the runtime
rt = Runtime()

# Initialize the Agent and VectorStore
with Agent(rt, "Qwen/Qwen3-8B") as agent, VectorStore(rt, "BAAI/bge-m3", "faiss") as vs:
    # Insert items
    vs.insert(
        "Ailoy is a lightweight library for building AI applications",
        metadata={"topic": "Ailoy"}
    )

    # Search for the most relevant items
    query = "What is Ailoy?"
    items = vs.retrieve(query, top_k=5)

    # Augment the user query with the retrieved contexts
    prompt = f"""
Based on the provided contexts, try to answer the user's question.
Context: {[item.document for item in items]}
Question: {query}
"""

    # Invoke the agent
    for resp in agent.query(prompt):
        agent.print(resp)

💡 Note: For best results, ensure your documents are chunked semantically (e.g., by paragraphs or sections).