
Agent

The Agent is the core building block in Ailoy for developing agentic AI applications. It serves as the primary interface for interacting with large language models (LLMs), allowing users to send queries, receive streamed responses, and integrate multi-modal inputs and external tools seamlessly.

Defining Agents

Before you can use an Agent, you must first initialize a Runtime, which is responsible for managing low-level execution, model inference, and resource coordination. This setup step is required regardless of whether you're using local or API-based models.

from ailoy import Runtime

rt = Runtime()

Once the runtime is initialized, you can create an Agent instance by specifying the model you wish to use. In the example below, we instantiate an agent using a local model Qwen/Qwen3-0.6B. For a comprehensive list of models and usage examples, see the Agent Models section.

from ailoy import Agent, LocalModel

agent = Agent(rt, LocalModel("Qwen/Qwen3-0.6B"))

System Messages

Agents can be initialized with a system message, which acts as an initial instruction to guide the assistant's behavior throughout the session. This message sets the tone, persona, or rules for the AI's responses.

agent = Agent(
    rt,
    LocalModel("Qwen/Qwen3-0.6B"),
    system_message="You are a friendly chatbot who always responds in the style of a pirate.",
)

Cleaning Up Agents

To release the resources used by an agent, you should call .delete() when the agent is no longer needed.

agent.delete()

In Python, a more robust and idiomatic approach is to use the agent as a context manager. This ensures automatic cleanup when the context exits.

with Agent(rt, LocalModel("Qwen/Qwen3-0.6B")) as agent:
    ...

Agent Models

The models used by agents can be either local models (running on your machine) or API-based models (hosted by external providers like OpenAI, Google, Anthropic or xAI). The setup differs slightly depending on the model type.

Local Models

Local models run entirely on your local machine, offering greater control over performance and privacy. You can define an agent with a local model as shown below:

from ailoy import Runtime, Agent, LocalModel

rt = Runtime()
agent = Agent(rt, LocalModel("Qwen/Qwen3-0.6B"))

Supported local models include:

note

Ensure that the model is compatible with your hardware. For system requirements and setup instructions, refer to the Devices & Environments page.

API Models

API models are accessed via third-party services. You’ll need an API key from the respective provider. Here’s how to define an agent using API models:

from ailoy import Runtime, Agent, APIModel

rt = Runtime()
# Use OpenAI
agent = Agent(rt, APIModel("gpt-4o", api_key="<OPENAI_API_KEY>"))
# Use Gemini
agent = Agent(rt, APIModel("gemini-2.5-flash", api_key="<GEMINI_API_KEY>"))
# Use Claude
agent = Agent(rt, APIModel("claude-sonnet-4-20250514", api_key="<CLAUDE_API_KEY>"))
# Use Grok
agent = Agent(rt, APIModel("grok-4", api_key="<XAI_API_KEY>"))

Supported API model providers and model IDs:

OpenAI
  • o4-mini
  • o3
  • o3-pro
  • o3-mini
  • gpt-4o
  • gpt-4o-mini
  • gpt-4.1
  • gpt-4.1-mini
  • gpt-4.1-nano
Gemini
  • gemini-2.5-flash
  • gemini-2.5-pro
  • gemini-2.0-flash
  • gemini-1.5-flash
  • gemini-1.5-pro
Claude
  • claude-sonnet-4-20250514
  • claude-3-7-sonnet-20250219
  • claude-3-5-sonnet-20241022
  • claude-3-5-sonnet-20240620
  • claude-opus-4-20250514
  • claude-3-opus-20240229
  • claude-3-5-haiku-20241022
  • claude-3-haiku-20240307
Grok
  • grok-4
  • grok-4-0709
  • grok-3
  • grok-3-fast
  • grok-3-mini
  • grok-3-mini-fast
  • grok-2
  • grok-2-1212
  • grok-2-vision-1212
  • grok-2-image-1212

To use a model not listed in these presets, you must explicitly set the provider:

agent = Agent(
    rt,
    APIModel(
        "gpt-4o-audio-preview",
        provider="openai",
        api_key="<OPENAI_API_KEY>",
    ),
)

Agent Queries

Agents can be queried with natural language prompts and, optionally, multi-modal inputs like images and audio. Responses are streamed in real time for an interactive experience.

Single Prompt

The simplest form is sending a single text string:

for resp in agent.query("Please give me a short poem about AI"):
    agent.print(resp)

Multi-Modal Inputs

Multi-modal queries combine text with images or audio files for richer input.

Image Inputs

Images can be passed via URLs or loaded directly using libraries like Pillow (Python) or Sharp (Node.js).

from ailoy import ImageContent

# Image from public URL
for resp in agent.query([
    "What is in this image?",
    ImageContent.from_url("https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"),
]):
    agent.print(resp)

# Image from Pillow
from PIL import Image

image = Image.open("path/to/image.png")
for resp in agent.query([
    "What is in this image?",
    # You can provide the image as-is, or via ImageContent.from_pillow()
    image,
    # ImageContent.from_pillow(image),
]):
    agent.print(resp)

Audio Inputs

Audio files can be passed using byte streams with format metadata:

from ailoy import AudioContent

with open("path/to/audio.wav", "rb") as f:
    data = f.read()

for resp in agent.query([
    "What's in this recording?",
    AudioContent.from_bytes(data=data, format="wav"),
]):
    agent.print(resp)

info

Multi-modal support is available only for API models, and capabilities vary by provider: OpenAI, Gemini, Claude, and Grok each differ in their support for base64 images, public-URL images, and base64 audio.

Reasoning

Some models support step-by-step reasoning for complex tasks. Enable this with the reasoning flag:

for resp in agent.query(
    "Please solve this system of equations: x+y=3, 4x+3y=12",
    reasoning=True,
):
    agent.print(resp)

Agent Responses

Agent responses are the streamed output of an agent run. Because Ailoy streams output on the fly, each part of the response can be handled in real time as it arrives.

An agent response has the following structure:

{
    type: "output_text" | "tool_call" | "tool_call_result" | "reasoning" | "error",
    role: "assistant" | "tool",
    is_type_switched: boolean,
    content: (depends on type),
}
  • The type field indicates what kind of output the agent is currently producing. Depending on the type, the structure of the response may vary.
  • The role field specifies who is speaking: either the Assistant (the LLM) or a Tool.
  • The is_type_switched flag indicates whether this response is the first message of a new type. You can use this flag to detect when a new type of message has arrived and trigger actions in your application, such as creating a new message box. See our Gradio chatbot example for a detailed use case.

Here are the descriptions of each response type, followed by a sketch of how you might handle them:

  • output_text: This is the main textual output from the assistant. The content field contains a string with the generated text.
  • tool_call: A message indicating that the assistant is requesting a tool to be invoked. Within the agent system, tools automatically receive this call and are expected to return a corresponding tool_call_result. The content contains a JSON-compatible dictionary describing the tool call.
  • tool_call_result: The result returned by the tool in response to a tool_call. The assistant receives this result and uses it to produce a final response to the user. The content contains a JSON-compatible dictionary with the tool's output.
  • reasoning: Intermediate reasoning steps produced by a reasoning-enabled model. The content field contains a string with the generated reasoning. If the ignore_reasoning_messages flag is enabled, these messages are omitted from the output.
  • error: Indicates that an error has occurred. The content field contains the reason for the error. After an error is raised, no further responses are generated.
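
The sketch below shows one way to consume these responses while streaming. It is a minimal illustration under a few assumptions: attribute-style field access (resp.type, resp.content) mirrors the structure above but is not guaranteed by this page, and agent.print(resp) remains the built-in way to render responses.

for resp in agent.query("Summarize our conversation so far", reasoning=True):
    # A new type of message has started (e.g., reasoning -> output_text)
    if resp.is_type_switched:
        print()  # open a new output block
    if resp.type in ("output_text", "reasoning"):
        print(resp.content, end="", flush=True)  # stream text as it arrives
    elif resp.type in ("tool_call", "tool_call_result"):
        print(resp.content)  # JSON-compatible dict for the call or its result
    elif resp.type == "error":
        print(f"Error: {resp.content}")
        break  # no further responses are generated after an error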

Handling Messages

Agents automatically track conversation history. You can access or clear this internal message list as needed.

# Get the list of messages
messages = agent.get_messages()
print(messages)

# Clear messages
agent.clear_messages()

Using Tools

Agents can be extended with custom tools, allowing them to perform tasks like database access, API requests, or file manipulation. For more details on tool creation and usage, see the Tools page. A minimal sketch of the idea follows.
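
The example below sketches how a plain Python function might be exposed to an agent as a tool. The registration method shown (add_py_function_tool) is an assumption on this page; refer to the Tools page for the actual API.

from ailoy import Runtime, Agent, LocalModel

def get_weather(city: str) -> str:
    """Return a short weather report for the given city."""
    return f"It is sunny in {city} today."  # placeholder implementation

rt = Runtime()
with Agent(rt, LocalModel("Qwen/Qwen3-0.6B")) as agent:
    # NOTE: method name is an assumption; see the Tools page for details
    agent.add_py_function_tool(get_weather)
    for resp in agent.query("What's the weather in Paris?"):
        agent.print(resp)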