
Agent

The Agent is the core building block in Ailoy for developing agentic AI applications. It serves as the primary interface for interacting with large language models (LLMs), allowing users to send queries, receive streamed responses, and integrate multi-modal inputs and external tools seamlessly.

Defining Agents

Before you can use an Agent, you must first initialize a Runtime, which is responsible for managing low-level execution, model inference, and resource coordination. This setup step is required regardless of whether you're using local or API-based models.

from ailoy import Runtime

rt = Runtime()

Once the runtime is initialized, you can create an Agent instance by specifying the model you wish to use. In the example below, we instantiate an agent using a local model Qwen/Qwen3-0.6B. For a comprehensive list of models and usage examples, see the Agent Models section.

from ailoy import Agent, LocalModel

agent = Agent(rt, LocalModel("Qwen/Qwen3-0.6B"))

System Messages

Agents can be initialized with a system message, which acts as an initial instruction to guide the assistant's behavior throughout the session. This message sets the tone, persona, or rules for the AI's responses.

agent = Agent(
    rt,
    LocalModel("Qwen/Qwen3-0.6B"),
    system_message="You are a friendly chatbot who always responds in the style of a pirate.",
)

Cleaning Up Agents

To release the resources used by an agent, you should call .delete() when the agent is no longer needed.

agent.delete()

In Python, a more robust and idiomatic approach is to use the agent as a context manager. This ensures automatic cleanup when the context exits.

with Agent(rt, LocalModel("Qwen/Qwen3-0.6B")) as agent:
    ...

Agent Models

The models used by agents can be either local models (running on your machine) or API-based models (hosted by external providers like OpenAI, Google, Anthropic or xAI). The setup differs slightly depending on the model type.

Local Models

Local models run entirely on your local machine, offering greater control over performance and privacy. You can define an agent with a local model as shown below:

from ailoy import Runtime, Agent, LocalModel

rt = Runtime()
agent = Agent(rt, LocalModel("Qwen/Qwen3-0.6B"))

Supported local models include:

note

Ensure that the model is compatible with your hardware. For system requirements and setup instructions, refer to the Devices & Environments page.

API Models

API models are accessed via third-party services. You’ll need an API key from the respective provider. Here’s how to define an agent using API models:

from ailoy import Runtime, Agent, APIModel

rt = Runtime()
# Use OpenAI
agent = Agent(rt, APIModel("gpt-4o", api_key="<OPENAI_API_KEY>"))
# Use Gemini
agent = Agent(rt, APIModel("gemini-2.5-flash", api_key="<GEMINI_API_KEY>"))
# Use Claude
agent = Agent(rt, APIModel("claude-sonnet-4-20250514", api_key="<CLAUDE_API_KEY>"))
# Use Grok
agent = Agent(rt, APIModel("grok-4", api_key="<XAI_API_KEY>"))

Supported API model providers and model IDs:

OpenAI
  • o4-mini
  • o3
  • o3-pro
  • o3-mini
  • gpt-4o
  • gpt-4o-mini
  • gpt-4.1
  • gpt-4.1-mini
  • gpt-4.1-nano
Gemini
  • gemini-2.5-flash
  • gemini-2.5-pro
  • gemini-2.0-flash
  • gemini-1.5-flash
  • gemini-1.5-pro
Claude
  • claude-sonnet-4-20250514
  • claude-3-7-sonnet-20250219
  • claude-3-5-sonnet-20241022
  • claude-3-5-sonnet-20240620
  • claude-opus-4-20250514
  • claude-3-opus-20240229
  • claude-3-5-haiku-20241022
  • claude-3-haiku-20240307
Grok
  • grok-4
  • grok-4-0709
  • grok-3
  • grok-3-fast
  • grok-3-mini
  • grok-3-mini-fast
  • grok-2
  • grok-2-1212
  • grok-2-vision-1212
  • grok-2-image-1212

To use a model not listed in these presets, you must explicitly set the provider:

agent = Agent(
    rt,
    APIModel(
        "gpt-4o-audio-preview",
        provider="openai",
        api_key="<OPENAI_API_KEY>",
    ),
)

Agent Queries

Agents can be queried with natural language prompts and, optionally, multi-modal inputs like images and audio. Responses are streamed in real time for an interactive experience.

Single Prompt

The simplest form is sending a single text string:

for resp in agent.query("Please give me a short poem about AI"):
    agent.print(resp)

Multi-Modal Inputs

Multi-modal queries combine text with images or audio files for richer input.

Image Inputs

Images can be passed via URLs or loaded directly using libraries like Pillow (Python) or Sharp (Node.js).

from ailoy import ImageContent

# Image from public URL
for resp in agent.query([
    "What is in this image?",
    ImageContent.from_url("https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"),
]):
    agent.print(resp)

# Image from Pillow
from PIL import Image

image = Image.open("path/to/image.png")
for resp in agent.query([
    "What is in this image?",
    # You can provide the image as-is, or via ImageContent.from_pillow()
    image,
    # ImageContent.from_pillow(image),
]):
    agent.print(resp)

Audio Inputs

Audio files can be passed using byte streams with format metadata:

from ailoy import AudioContent

with open("path/to/audio.wav", "rb") as f:
    data = f.read()

for resp in agent.query([
    "What's in this recording?",
    AudioContent.from_bytes(data=data, format="wav"),
]):
    agent.print(resp)

info

Multi-modal support is available only for API models, and capabilities vary by provider: OpenAI, Gemini, Claude, and Grok each differ in their support for base64 images, public-URL images, and base64 audio.

Reasoning

Some models support step-by-step reasoning for complex tasks. Enable this with the reasoning flag:

for resp in agent.query(
    "Please solve this system of equations: x+y=3, 4x+3y=12",
    reasoning=True,
):
    agent.print(resp)

Agent Responses

Agent responses are the streamed output of an agent run. Because Ailoy streams output on the fly, each part of the response can be handled in real time as it arrives.

An agent response has the following structure:

{
    type: "output_text" | "tool_call" | "tool_call_result" | "reasoning" | "error",
    role: "assistant" | "tool",
    is_type_switched: boolean,
    content: (depends on type),
}
  • The type field indicates what kind of output the agent is currently producing. Depending on the type, the structure of the response may vary.
  • The role field specifies who is speaking: either the Assistant (the LLM) or a Tool.
  • The is_type_switched flag indicates whether this response is the first message of a new type. You can use this flag to detect when a new type of message has arrived and trigger actions in your application, such as creating a new message box. See our Gradio chatbot example for a detailed use case.

Here are the descriptions of each response type, followed by a sketch of how you might handle them:

  • output_text: This is the main textual output from the assistant. The content field contains a string with the generated text.
  • tool_call: A message indicating that the assistant is requesting a tool to be invoked. Within the agent system, tools automatically receive this call and are expected to return a corresponding tool_call_result. The content contains a JSON-compatible dictionary describing the tool call.
  • tool_call_result: The result returned by the tool in response to a tool_call. The assistant receives this result and uses it to produce a final response to the user. The content contains a JSON-compatible dictionary with the tool's output.
  • reasoning: Intermediate reasoning steps produced by a reasoning-enabled model. The content field contains a string with the generated reasoning. If the ignore_reasoning_messages flag is enabled, these messages are omitted from the output.
  • error: Indicates that an error has occurred. The content field contains the reason for the error. After an error is raised, no further responses are generated.
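
The sketch below shows one way to consume these responses while streaming. It is a minimal illustration under a few assumptions: attribute-style field access (resp.type, resp.content) mirrors the structure above but is not guaranteed by this page, and agent.print(resp) remains the built-in way to render responses.

for resp in agent.query("Summarize our conversation so far", reasoning=True):
    # A new type of message has started (e.g., reasoning -> output_text)
    if resp.is_type_switched:
        print()  # open a new output block
    if resp.type in ("output_text", "reasoning"):
        print(resp.content, end="", flush=True)  # stream text as it arrives
    elif resp.type in ("tool_call", "tool_call_result"):
        print(resp.content)  # JSON-compatible dict for the call or its result
    elif resp.type == "error":
        print(f"Error: {resp.content}")
        break  # no further responses are generated after an error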

Handling Messages

Agents automatically track conversation history. You can access or clear this internal message list as needed.

# Get the list of messages
messages = agent.get_messages()
print(messages)

# Clear messages
agent.clear_messages()

Using Tools

Agents can be extended with custom tools, allowing them to perform tasks like database access, API requests, or file manipulation. For more details on tool creation and usage, see the Tools page. A minimal sketch of the idea follows.
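
The example below sketches how a plain Python function might be exposed to an agent as a tool. The registration method shown (add_py_function_tool) is an assumption on this page; refer to the Tools page for the actual API.

from ailoy import Runtime, Agent, LocalModel

def get_weather(city: str) -> str:
    """Return a short weather report for the given city."""
    return f"It is sunny in {city} today."  # placeholder implementation

rt = Runtime()
with Agent(rt, LocalModel("Qwen/Qwen3-0.6B")) as agent:
    # NOTE: method name is an assumption; see the Tools page for details
    agent.add_py_function_tool(get_weather)
    for resp in agent.query("What's the weather in Paris?"):
        agent.print(resp)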