Agent
The Agent is the core building block in Ailoy for developing agentic AI applications. It serves as the primary interface for interacting with large language models (LLMs), allowing users to send queries, receive streamed responses, and integrate multi-modal inputs and external tools seamlessly.
Defining Agents
Before you can use an Agent, you must first initialize a Runtime, which is responsible for managing low-level execution, model inference, and resource coordination. This setup step is required regardless of whether you're using local or API-based models.
- Python
- JavaScript(Node)
from ailoy import Runtime
rt = Runtime()
import { startRuntime } from "ailoy-node";
const rt = await startRuntime();
Once the runtime is initialized, you can create an Agent instance by specifying the model you wish to use. In the example below, we instantiate an agent using the local model Qwen/Qwen3-0.6B. For a comprehensive list of models and usage examples, see the Agent Models section.
- Python
- JavaScript(Node)
from ailoy import Agent, LocalModel
agent = Agent(rt, LocalModel("Qwen/Qwen3-0.6B"))
import { defineAgent, LocalModel } from "ailoy-node";
const agent = await defineAgent(rt, LocalModel({ id: "Qwen/Qwen3-0.6B" }));
System Messages
Agents can be initialized with a system message, which acts as an initial instruction to guide the assistant's behavior throughout the session. This message sets the tone, persona, or rules for the AI's responses.
- Python
- JavaScript(Node)
agent = Agent(
    rt,
    LocalModel("Qwen/Qwen3-0.6B"),
    system_message="You are a friendly chatbot who always responds in the style of a pirate.",
)
const agent = await defineAgent(rt, LocalModel({ id: "Qwen/Qwen3-0.6B" }), {
  systemMessage:
    "You are a friendly chatbot who always responds in the style of a pirate.",
});
Cleaning Up Agents
To release the resources used by an agent, you should call .delete() when the agent is no longer needed.
- Python
- JavaScript(Node)
agent.delete()
await agent.delete();
In Python, a more robust and idiomatic approach is to use the agent as a context manager. This ensures automatic cleanup when the context exits.
- Python
with Agent(rt, LocalModel("Qwen/Qwen3-0.6B")) as agent:
    ...
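JavaScript has no direct equivalent of a context manager, but you can get the same guarantee with a try/finally block. A minimal sketch using only the calls shown above:
- JavaScript(Node)
const agent = await defineAgent(rt, LocalModel({ id: "Qwen/Qwen3-0.6B" }));
try {
  // ... use the agent
} finally {
  // Runs even if an error was thrown, so the agent's resources are always released
  await agent.delete();
}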
Agent Models
The models used by agents can be either local models (running on your machine) or API-based models (hosted by external providers like OpenAI, Google, Anthropic or xAI). The setup differs slightly depending on the model type.
Local Models
Local models run entirely on your local machine, offering greater control over performance and privacy. You can define an agent with a local model as shown below:
- Python
- JavaScript(Node)
from ailoy import Runtime, Agent, LocalModel
rt = Runtime()
agent = Agent(rt, LocalModel("Qwen/Qwen3-0.6B"))
import { startRuntime, defineAgent, LocalModel } from "ailoy-node";
const rt = await startRuntime();
const agent = await defineAgent(rt, LocalModel({ id: "Qwen/Qwen3-0.6B" }));
Supported local models include:
- Qwen/Qwen3-0.6B
- Qwen/Qwen3-1.7B
- Qwen/Qwen3-4B
- Qwen/Qwen3-8B
- Qwen/Qwen3-14B
- Qwen/Qwen3-32B
- Qwen/Qwen3-30B-A3B
Ensure that the model is compatible with your hardware. For system requirements and setup instructions, refer to the Devices & Environments page.
API Models
API models are accessed via third-party services. You’ll need an API key from the respective provider. Here’s how to define an agent using API models:
- Python
- JavaScript(Node)
from ailoy import Runtime, Agent, APIModel
rt = Runtime()
# Use OpenAI
agent = Agent(rt, APIModel("gpt-4o", api_key="<OPENAI_API_KEY>"))
# Use Gemini
agent = Agent(rt, APIModel("gemini-2.5-flash", api_key="<GEMINI_API_KEY>"))
# Use Claude
agent = Agent(rt, APIModel("claude-sonnet-4-20250514", api_key="<CLAUDE_API_KEY>"))
# Use Grok
agent = Agent(rt, APIModel("grok-4", api_key="<XAI_API_KEY>"))
import { startRuntime, defineAgent, APIModel } from "ailoy-node";
const rt = await startRuntime();
// Use OpenAI
const agent = await defineAgent(
  rt,
  APIModel({ id: "gpt-4o", apiKey: "<OPENAI_API_KEY>" })
);
// Use Gemini
const agent = await defineAgent(
  rt,
  APIModel({ id: "gemini-2.5-flash", apiKey: "<GEMINI_API_KEY>" })
);
// Use Claude
const agent = await defineAgent(
  rt,
  APIModel({ id: "claude-sonnet-4-20250514", apiKey: "<CLAUDE_API_KEY>" })
);
// Use Grok
const agent = await defineAgent(
  rt,
  APIModel({ id: "grok-4", apiKey: "<XAI_API_KEY>" })
);
Supported API model providers and model IDs:
- OpenAI: o4-mini, o3, o3-pro, o3-mini, gpt-4o, gpt-4o-mini, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano
- Gemini: gemini-2.5-flash, gemini-2.5-pro, gemini-2.0-flash, gemini-1.5-flash, gemini-1.5-pro
- Claude: claude-sonnet-4-20250514, claude-3-7-sonnet-20250219, claude-3-5-sonnet-20241022, claude-3-5-sonnet-20240620, claude-opus-4-20250514, claude-3-opus-20240229, claude-3-5-haiku-20241022, claude-3-haiku-20240307
- Grok: grok-4, grok-4-0709, grok-3, grok-3-fast, grok-3-mini, grok-3-mini-fast, grok-2, grok-2-1212, grok-2-vision-1212, grok-2-image-1212
To use a model not listed in these presets, you must explicitly set the provider:
- Python
- JavaScript(Node)
agent = Agent(
    rt,
    APIModel(
        "gpt-4o-audio-preview",
        provider="openai",
        api_key="<OPENAI_API_KEY>"
    )
)
const agent = await defineAgent(
  rt,
  APIModel({
    id: "gpt-4o-audio-preview",
    provider: "openai",
    apiKey: "<OPENAI_API_KEY>",
  })
);
Agent Queries
Agents can be queried with natural language prompts and optionally with multimodal inputs like images and audio. Responses are streamed in real-time for an interactive experience.
Single Prompt
The simplest form is sending a single text string:
- Python
- JavaScript(Node)
for resp in agent.query("Please give me a short poem about AI"):
    agent.print(resp)
for await (const resp of agent.query("Please give me a short poem about AI")) {
  agent.print(resp);
}
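Because the output arrives incrementally, you can also accumulate it yourself instead of printing each chunk. A minimal sketch, assuming each response exposes the type and content fields described under Agent Responses below:
- Python
answer = ""
for resp in agent.query("Please give me a short poem about AI"):
    # Keep only the assistant's textual output, ignoring other response types
    if resp.type == "output_text":
        answer += resp.content
print(answer)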
Multi-Modal Inputs
Multi-modal queries combine text with images or audio files for richer input.
Image Inputs
Images can be passed via URLs or loaded directly using libraries like Pillow (Python) or Sharp (Node.js).
- Python
- JavaScript(Node)
from ailoy import ImageContent
# Image from public URL
for resp in agent.query([
    "What is in this image?",
    ImageContent.from_url("https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"),
]):
    agent.print(resp)

# Image from Pillow
from PIL import Image

image = Image.open("path/to/image.png")
for resp in agent.query([
    "What is in this image?",
    # You can provide the image as-is, or via ImageContent.from_pillow()
    image,
    # ImageContent.from_pillow(image),
]):
    agent.print(resp)
import { ImageContent } from "ailoy-node";
import sharp from "sharp";
// Image from public URL
for await (const resp of agent.query([
  "What is in this image?",
  ImageContent.fromUrl(
    "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
  ),
])) {
  agent.print(resp);
}

// Image from Sharp
const image = sharp("path/to/image.png");
for await (const resp of agent.query([
  "What is in this image?",
  // You can provide the image as-is, or via ImageContent.fromSharp()
  image,
  // await ImageContent.fromSharp(image),
])) {
  agent.print(resp);
}
Audio Inputs
Audio files can be passed using byte streams with format metadata:
- Python
- JavaScript(Node)
from ailoy import AudioContent
with open("path/to/audio.wav", "rb") as f:
    data = f.read()

for resp in agent.query([
    "What's in this recording?",
    AudioContent.from_bytes(data=data, format="wav"),
]):
    agent.print(resp)
import { AudioContent } from "ailoy-node";
import fs from "node:fs";
const buffer = fs.readFileSync("path/to/audio.wav");
for await (const resp of agent.query([
  "What's in this recording?",
  await AudioContent.fromBytes(buffer, "wav"),
])) {
  agent.print(resp);
}
Multi-modal support is available only for API models. Capabilities vary by provider:
| Model provider | Image (base64) | Image (public URL) | Audio (base64) |
| --- | --- | --- | --- |
| OpenAI | ✅ | ✅ | ✅ |
| Gemini | ✅ | ❌ | ✅ |
| Claude | ✅ | ❌ | ❌ |
| Grok | ✅ | ✅ | ❌ |
Reasoning
Some models support step-by-step reasoning for complex tasks. Enable this with the reasoning flag:
- Python
- JavaScript(Node)
for resp in agent.query(
    "Please solve this system of equations: x+y=3, 4x+3y=12",
    reasoning=True
):
    agent.print(resp)
for await (const resp of agent.query(
  "Please solve this system of equations: x+y=3, 4x+3y=12",
  { reasoning: true }
)) {
  agent.print(resp);
}
Agent Responses
Agent responses are the streamed output of an agent run. Since Ailoy streams output on the fly, each part of the response can be treated as real-time output.
Basically, an agent response has the following structure:
{
  type: one of "output_text" | "tool_call" | "tool_call_result" | "reasoning" | "error";
  role: one of "assistant" | "tool";
  is_type_switched: boolean;
  content: depends on type;
}
- The type field indicates what kind of output the agent is currently producing. Depending on the type, the structure of the response may vary.
- The role field specifies who is speaking: either the Assistant (the LLM) or a Tool.
- The is_type_switched flag indicates whether this response is the first message of a new type. You can use this flag to detect when a new type of message has arrived and trigger actions in your application, such as creating a new message box (see the handling sketch after the type descriptions below). See our Gradio chatbot example for a detailed use case.
Here are the descriptions of each response type:
- output_text: The main textual output from the assistant. The content field contains a string with the generated text.
- tool_call: A message indicating that the assistant is requesting a tool to be invoked. Within the agent system, tools automatically receive this call and are expected to return a corresponding tool_call_result. The content field contains a JSON-compatible dictionary describing the tool call.
- tool_call_result: The result returned by the tool in response to a tool_call. The assistant receives this result and uses it to produce a final response to the user. The content field contains a JSON-compatible dictionary with the tool's output.
- reasoning: Intermediate reasoning steps produced by a reasoning-enabled model. The content field contains a string with the generated reasoning. If the ignore_reasoning_messages flag is enabled, these messages are omitted from the output.
- error: Indicates that an error has occurred. The content field contains the reason for the error. After an error is raised, no more responses will be generated.
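Putting this together, a typical consumer branches on the response type. Here is a minimal Python sketch, assuming the fields above are exposed as attributes on each streamed response:
- Python
for resp in agent.query("What is 17 * 24?", reasoning=True):
    # A new block starts whenever the stream switches to a different type
    if resp.is_type_switched:
        print(f"\n[{resp.type}]")
    if resp.type in ("reasoning", "output_text"):
        print(resp.content, end="")  # both types carry plain text in content
    elif resp.type == "error":
        print(f"Error: {resp.content}")  # the stream ends after an error
        break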
Handling Messages
Agents automatically track conversation history. You can access or clear this internal message list as needed.
- Python
- JavaScript(Node)
# Get the list of messages
messages = agent.get_messages()
print(messages)
# Clear messages
agent.clear_messages()
// Get the list of messages
const messages = agent.getMessages();
console.log(messages);
// Clear messages
agent.clearMessages();
Using Tools
Agents can be extended with custom tools, allowing them to perform tasks like database access, API requests, or file manipulation. For more details on tool creation and usage, see the Tools page.
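To give a feel for the flow, here is a rough Python sketch. The add_tool registration call and the tool function below are hypothetical; the actual method names and signatures are documented on the Tools page.
- Python
def get_weather(city: str) -> str:
    """Return a weather report for the given city (stubbed for illustration)."""
    return f"It is sunny in {city}."

# Hypothetical registration call; see the Tools page for the real API.
agent.add_tool(get_weather)

# During the query, the agent may emit tool_call responses that invoke
# get_weather, followed by tool_call_result responses with its return value.
for resp in agent.query("What's the weather in Paris?"):
    agent.print(resp)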