Architecture

Ailoy is built on a modular architecture that separates AI inference, external tools, and knowledge retrieval, yet composes them seamlessly through the Agent.

At a high level, Ailoy consists of the following core components:

| Component | Description |
| --- | --- |
| Agent | Orchestrates high-level agent behavior |
| LangModel | Serves as the reasoning and text generation engine |
| Tool | Provides an extension interface for invoking external capabilities |
| Knowledge | Manages retrieval and augmentation with external information |

Agent

Agent is the top-level component. It is responsible for answering the user's query by orchestrating the execution of its sub-components:

  • Generate text using the LangModel
  • Invoke external functions through Tool
  • Retrieve context from Knowledge

Agent exposes the run() function, which streams intermediate results and reasoning traces back to the user.
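The orchestration described above can be sketched as a simple loop. This is a hypothetical illustration, not Ailoy's actual API: the class shapes, field names, and call format are all invented for the sketch.

```python
# Hedged sketch of the loop an Agent-like run() could perform.
# All names and message shapes here are illustrative assumptions.

def run(query, lang_model, tools, knowledge=None):
    """Yield intermediate model outputs while answering a query."""
    messages = [{"role": "user", "content": query}]

    # 1. Optionally retrieve context from Knowledge.
    if knowledge is not None:
        docs = knowledge.retrieve(query)
        messages.insert(0, {"role": "system", "content": "\n".join(docs)})

    while True:
        # 2. Generate text using the LangModel.
        output = lang_model.infer(messages)
        yield output

        # 3. If the model emitted no function call, we are done.
        if output.get("tool_call") is None:
            return

        # Otherwise route the call to the matching Tool and loop.
        call = output["tool_call"]
        result = tools[call["name"]](**call["arguments"])
        messages.append({"role": "tool", "content": str(result)})
```

The loop captures the key design point: the Agent owns control flow, while generation, tool execution, and retrieval stay in their own components.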

LangModel

The LangModel is the AI inference engine of Ailoy. It performs the actual reasoning and text generation that power the agent’s intelligence. Given an input sequence of messages, it interprets the context, predicts the next tokens, and generates coherent responses.

Ailoy supports two types of models:

  • API Models — cloud-based models
  • Local Models — models that run entirely on your device

The LangModel defines how messages are tokenized, formatted, and executed. It operates independently from the Agent, so you can use it directly when tools or knowledge modules are not required.
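To make "formatted" concrete, here is a toy chat-template function in the ChatML style. Real templates are model-specific and Ailoy's internal formatting may differ; the special tokens below are one common convention, used here purely for illustration.

```python
def apply_chat_template(messages):
    """Render a role-tagged message list into a single prompt string.

    This mimics the ChatML convention; actual chat templates vary
    per model and are not Ailoy's exact implementation.
    """
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    # Leave an open assistant turn for the model to complete.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)
```

A LangModel would tokenize the resulting string and generate the assistant turn; the Agent never needs to see this low-level representation.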

Tool

The Tool module enables the agent to perform operations beyond what LangModel can do. It allows the model to interact with external systems, execute logic, or query real-world data.

A tool consists of two main parts: tool description and tool behavior.

A tool description defines how the tool is exposed to the model. It includes the tool’s name, description, parameters, and an optional return schema, conventionally expressed in JSON Schema, which Ailoy follows as well.

Schema Example

{
  "name": "temperature",
  "description": "Get current temperature",
  "parameters": {
    "type": "object",
    "properties": {
      "location": {
        "type": "string",
        "description": "The city name"
      },
      "unit": {
        "type": "string",
        "enum": ["Celsius", "Fahrenheit"]
      }
    },
    "required": ["location", "unit"]
  },
  "returns": {
    "type": "number",
    "description": "Null if the given city name is unavailable.",
    "nullable": true
  }
}

In contrast, a tool behavior defines what the tool actually does when invoked. For example, it can be a custom function in Python, JavaScript, or Rust. When the model outputs a structured function call matching the tool’s schema, the Agent automatically routes it to the corresponding behavior.
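A minimal sketch of that routing step, assuming the schema example above: the registry and the call format are illustrative, not Ailoy's actual internals, and the toy behavior stands in for a real weather lookup.

```python
def get_temperature(location, unit):
    """Toy behavior matching the "temperature" schema above.

    A real tool would query a weather service; here a tiny lookup
    table stands in, returning None for unknown cities (the schema's
    nullable return).
    """
    table = {("Seoul", "Celsius"): 21.5}
    return table.get((location, unit))

# Hypothetical registry mapping tool names to behaviors.
TOOLS = {"temperature": get_temperature}

def dispatch(call):
    """Route a model-emitted call {"name": ..., "arguments": {...}}
    to the registered behavior."""
    return TOOLS[call["name"]](**call["arguments"])
```

The key property is that the model only ever sees the description; the Agent bridges the structured call to the behavior at runtime.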

You can also use the Model Context Protocol (MCP) to register and expose tools dynamically. This provides a seamless way to define, describe, and connect tool behaviors across different environments.

Ailoy supports three types of tools:

| Type | Description |
| --- | --- |
| Function | A native function in the host language, with an explicit ToolDesc |
| MCP | A remote tool exposed via the Model Context Protocol (MCP) |
| Builtin | Predefined tools provided by the Ailoy runtime itself |

Knowledge

The Knowledge module enhances the agent’s reasoning ability by providing factual or contextual data retrieved from external sources such as vector stores, databases, or document collections.

When a query is made, the Knowledge module performs retrieval and returns an array of documents relevant to the user’s input. These documents are then integrated into the reasoning flow, allowing the model to produce more grounded and accurate responses. This process is known as Retrieval-Augmented Generation (RAG).

Knowledge combines two core components:

  • EmbeddingModel: Converts texts into numerical embeddings that represent semantic meaning.
  • VectorStore: Stores, searches, and manages embeddings for efficient similarity retrieval.

Together, these components allow the agent to search semantically similar documents rather than relying on keyword matches, enabling more intelligent context enrichment.
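The EmbeddingModel/VectorStore pairing can be illustrated with a toy in-memory store using cosine similarity. The embeddings below are hand-made vectors; a real EmbeddingModel would produce them from text, and Ailoy's actual store interface may differ.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class ToyVectorStore:
    """Minimal illustrative store: keeps (embedding, document) pairs
    and ranks them by similarity to a query embedding."""

    def __init__(self):
        self.items = []

    def add(self, embedding, document):
        self.items.append((embedding, document))

    def search(self, query_embedding, top_k=1):
        ranked = sorted(self.items,
                        key=lambda it: cosine(it[0], query_embedding),
                        reverse=True)
        return [doc for _, doc in ranked[:top_k]]
```

This is why semantic retrieval beats keyword matching: documents are ranked by vector proximity in embedding space, so paraphrases of a query can still surface the right document.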

Integration with LangModel

Ailoy allows the Knowledge module to be attached to a LangModel in two integration modes:

1. Native

In native mode, the LangModel itself is expected to support external document input. The retrieved documents are directly attached to the model’s input, allowing the model to internally process them as part of its reasoning.

However, many models currently do not support native document input formats. To address this, Ailoy provides a polyfill mechanism. It is an internal augmentation layer that modifies the model’s chat template so it can recognize and interpret documents correctly. This polyfill enables RAG behavior even on models lacking explicit document-handling capabilities.
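The polyfill idea can be sketched as folding retrieved documents into the prompt when the model's template has no native document slot. The wording and message shape below are invented for illustration; the actual template Ailoy rewrites is model-specific.

```python
def polyfill_documents(messages, documents):
    """Prepend retrieved documents as a system message.

    A stand-in for the polyfill concept: models without a native
    document input still see the retrieved context as ordinary
    prompt text.
    """
    context = "\n\n".join(f"[Document {i + 1}]\n{doc}"
                          for i, doc in enumerate(documents))
    system = {"role": "system",
              "content": "Answer using the documents below.\n\n" + context}
    return [system] + messages
```

The original messages are left untouched, so the same retrieval result can be re-rendered for models that do support native document input.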

2. Tool

In tool mode, the Knowledge module is registered as a callable tool (knowledge tool). When the LangModel determines that external context is needed, it issues a function call to the knowledge tool. The tool executes the retrieval process and returns relevant documents, which the agent then feeds back into the reasoning flow.

This design allows the same retrieval process to be invoked dynamically through the model’s reasoning, even for models that cannot handle direct document input.
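Tool mode can be sketched by wrapping retrieval as an ordinary callable. The scoring below is a toy word-overlap standing in for embedding similarity, and the factory name is invented; in Ailoy the registration would go through its Tool interface.

```python
def make_knowledge_tool(documents):
    """Wrap a document list as a callable "knowledge tool".

    Toy ranking by word overlap with the query; a real Knowledge
    module would use embedding similarity instead.
    """
    def knowledge_tool(query, top_k=2):
        q = set(query.lower().split())
        ranked = sorted(documents,
                        key=lambda d: len(q & set(d.lower().split())),
                        reverse=True)
        return ranked[:top_k]
    return knowledge_tool
```

Because the tool is just a function with a schema, the model decides when to call it, which is exactly what distinguishes tool mode from native mode.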