Ailoy Python API Reference
Agent
The Agent is the central orchestrator that connects the language model, tools, and knowledge components. It manages the entire reasoning and action loop, coordinating how each subsystem contributes to the final response.
In essence, the Agent:
- Understands user input
- Interprets structured responses from the language model (such as tool calls)
- Executes tools as needed
- Retrieves and integrates contextual knowledge before or during inference
Public APIs
- run_delta: Runs a user query and streams incremental deltas (partial outputs)
- run: Runs a user query and returns a complete message once all deltas are accumulated
Delta vs. Complete Message
A delta represents a partial piece of model output, such as a text fragment or intermediate reasoning step. Deltas can be accumulated into a full message using the provided accumulation utilities. This allows real-time streaming while preserving the ability to reconstruct the final structured result.
See MessageDelta.
Components
- Language Model: Generates natural language and structured outputs. It interprets the conversation context and predicts the assistant’s next action.
- Tool: Represents external functions or APIs that the model can dynamically invoke. The Agent detects tool calls and automatically executes them during the reasoning loop.
- Knowledge: Provides retrieval-augmented reasoning by fetching relevant information from stored documents or databases. When available, the Agent enriches model input with these results before generating an answer.
remove_knowledge
remove_tool
remove_tools
run
run(messages: str | list[Message], config: Optional[AgentConfig] = None) -> MessageOutputIterator
run_delta
run_delta(messages: str | list[Message], config: Optional[AgentConfig] = None) -> MessageDeltaOutputIterator
run_delta_sync
run_delta_sync(messages: str | list[Message], config: Optional[AgentConfig] = None) -> MessageDeltaOutputSyncIterator
run_sync
run_sync(messages: str | list[Message], config: Optional[AgentConfig] = None) -> MessageOutputSyncIterator
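A minimal usage sketch, assuming an already-constructed Agent instance named agent (construction is not covered in this section) and assuming MessageOutput carries a message field analogous to MessageDeltaOutput's delta:
Python
# Non-streaming: iterate over complete messages.
for out in agent.run_sync("What is the capital of France?"):
    print(out.message)  # assumption: MessageOutput exposes `message`

# Streaming: iterate over incremental deltas as they arrive.
for out in agent.run_delta_sync("What is the capital of France?"):
    print(out.delta)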
AgentConfig
Configuration for running the agent.
See InferenceConfig and KnowledgeConfig for more details.
from_dict
classmethod
from_dict(config: dict) -> AgentConfig
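A hedged sketch of from_dict. This section only states that AgentConfig composes InferenceConfig and KnowledgeConfig, so the nesting and key names below are assumptions, not confirmed API:
Python
# Assumed dict shape; the "inference"/"knowledge" keys are illustrative only.
config = AgentConfig.from_dict({
    "inference": {"temperature": 0.7, "max_tokens": 1024},
    "knowledge": {},
})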
CacheProgress
Document
DocumentPolyfill
Provides a polyfill for LLMs that do not natively support the Document feature.
get
classmethod
get(kind: Literal['Qwen3']) -> DocumentPolyfill
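For example, with the only documented kind; the resulting polyfill can be supplied through InferenceConfig's document_polyfill field (described below):
Python
# "Qwen3" is the only kind listed for this classmethod.
polyfill = DocumentPolyfill.get("Qwen3")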
EmbeddingModel
infer
async
infer_sync
new_local
classmethod
new_local(model_name: str, device_id: Optional[int] = None, progress_callback: Callable[[CacheProgress], None] = None) -> Awaitable[EmbeddingModel]
new_local_sync
classmethod
new_local_sync(model_name: str, device_id: Optional[int] = None, progress_callback: Callable[[CacheProgress], None] = None) -> EmbeddingModel
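A construction-and-inference sketch. The model name is a placeholder, and since infer_sync's signature is not shown above, its text-in/vector-out usage here is an assumption:
Python
# Placeholder model name; progress_callback receives CacheProgress updates.
model = EmbeddingModel.new_local_sync(
    "BAAI/bge-m3",
    progress_callback=lambda progress: print(progress),
)
vector = model.infer_sync("hello world")  # assumption: embeds a text string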
FinishReason
Bases: Enum
Explains why a language model's streamed generation finished.
Refusal
class-attribute
instance-attribute
Content was refused/filtered; string provides reason.
Stop
class-attribute
instance-attribute
The model stopped naturally (e.g., EOS token or stop sequence).
Grammar
CFG
JSON
JSONSchema
Plain
InferenceConfig
Configuration parameters that control the behavior of model inference.
InferenceConfig encapsulates all of the configuration controlling the behavior of `LangModel` inference.
Fields
document_polyfill
Configuration describing how retrieved documents are embedded into the model input.
If None, no polyfill is applied and documents are ignored.
think_effort
Controls the model’s reasoning intensity.
For local models, the distinction between low, medium, and high is ignored.
For API models, the interpretation is up to the provider's API. See the provider's API parameters.
Possible values: disable, enable, low, medium, high.
temperature
Sampling temperature controlling randomness of output. Lower values make output more deterministic; higher values increase diversity.
top_p
Nucleus sampling parameter (probability mass cutoff).
Limits token sampling to a cumulative probability ≤ top_p.
max_tokens
Maximum number of tokens to generate for a single inference.
grammar
Optional grammar constraint that restricts valid output forms.
Supported types include:
- Plain: unconstrained text
- JSON: ensures valid JSON output
- JSONSchema { schema }: validates JSON against the given schema
- Regex { regex }: constrains generation by a regular expression
- CFG { cfg }: uses a context-free grammar definition
think_effort
property
writable
from_dict
classmethod
from_dict(config: dict) -> InferenceConfig
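A from_dict sketch using the field names documented above; mapping those names directly to dict keys is an assumption:
Python
# Keys mirror the documented field names (assumed mapping).
config = InferenceConfig.from_dict({
    "think_effort": "disable",
    "temperature": 0.7,
    "top_p": 0.9,
    "max_tokens": 1024,
})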
Knowledge
new_vector_store
classmethod
new_vector_store(store: VectorStore, embedding_model: EmbeddingModel) -> Knowledge
retrieve
async
retrieve(query: str, config: KnowledgeConfig) -> list[Document]
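A wiring sketch combining the documented constructors; the embedding dimension and model name are placeholders:
Python
store = VectorStore.new_faiss(dim=1024)                  # placeholder dimension
embedder = EmbeddingModel.new_local_sync("BAAI/bge-m3")  # placeholder name
knowledge = Knowledge.new_vector_store(store, embedder)

# retrieve is async, so call it from an async context:
# docs = await knowledge.retrieve("my query", KnowledgeConfig.from_dict({}))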
KnowledgeConfig
from_dict
classmethod
from_dict(config: dict) -> KnowledgeConfig
LangModel
infer
infer(messages: str | list[Message], tools: Optional[Sequence[ToolDesc]] = None, documents: Optional[Sequence[Document]] = None, config: Optional[InferenceConfig] = None) -> Awaitable[MessageOutput]
infer_delta
infer_delta(messages: str | list[Message], tools: Optional[Sequence[ToolDesc]] = None, documents: Optional[Sequence[Document]] = None, config: Optional[InferenceConfig] = None) -> MessageDeltaOutputIterator
infer_delta_sync
infer_delta_sync(messages: str | list[Message], tools: Optional[Sequence[ToolDesc]] = None, documents: Optional[Sequence[Document]] = None, config: Optional[InferenceConfig] = None) -> MessageDeltaOutputSyncIterator
infer_sync
infer_sync(messages: str | list[Message], tools: Optional[Sequence[ToolDesc]] = None, documents: Optional[Sequence[Document]] = None, config: Optional[InferenceConfig] = None) -> MessageOutput
new_local
classmethod
new_local(model_name: str, device_id: Optional[int] = None, progress_callback: Callable[[CacheProgress], None] = None) -> Awaitable[LangModel]
new_local_sync
classmethod
new_local_sync(model_name: str, device_id: Optional[int] = None, progress_callback: Callable[[CacheProgress], None] = None) -> LangModel
new_stream_api
classmethod
new_stream_api(spec: Literal['ChatCompletion', 'OpenAI', 'Gemini', 'Claude', 'Responses', 'Grok'], model_name: str, api_key: str) -> LangModel
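A sketch showing both construction paths followed by a blocking inference; model names and the API key are placeholders:
Python
# Local model; weights are cached, with progress reported via CacheProgress.
lm = LangModel.new_local_sync("Qwen/Qwen3-0.6B", progress_callback=print)

# Alternatively, a hosted API model:
# lm = LangModel.new_stream_api("OpenAI", "gpt-4o-mini", api_key="sk-...")

out = lm.infer_sync("Say hello in one sentence.")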
MCPClient
Message
A chat message generated by a user, model, or tool.
Message is the concrete, non-streaming container used by the application to store, transmit, or feed structured content into models or tools.
It can represent various kinds of messages, including user input, assistant responses, tool-call outputs, or signed thinking metadata.
Note that many different kinds of messages can be produced.
For example, a language model may internally generate a thinking trace before emitting its final output, in order to improve reasoning accuracy.
In other cases, a model may produce function calls — structured outputs that instruct external tools to perform specific actions.
This struct is designed to handle all of these situations in a unified way.
Example
Rust
let msg = Message::new(Role::User).with_contents([Part::text("hello")]);
assert_eq!(msg.role, Role::User);
assert_eq!(msg.contents.len(), 1);
contents
property
writable
contents: list[Part]
Primary parts of the message (e.g., text, image, value, or function).
signature
property
writable
Optional signature for the thinking field.
This is only applicable to certain LLM APIs that require a signature as part of the thinking payload.
thinking
property
writable
Internal “thinking” text used by some models before producing final output.
tool_calls
property
writable
tool_calls: Optional[list[Part]]
Tool-call parts emitted alongside the main contents.
MessageDelta
A streaming, incremental update to a [Message].
MessageDelta accumulates partial outputs (text chunks, tool-call fragments, IDs, signatures, etc.) until they can be materialized as a full [Message].
It implements [Delta] to support accumulation.
Accumulation Rules
- role: merging two distinct roles fails.
- thinking: concatenated in arrival order.
- contents / tool_calls: the last element is accumulated with the incoming delta when both are compatible (e.g., Text+Text, or Function+Function with a matching ID policy); otherwise the delta is appended as a new fragment.
- id / signature: last-writer-wins.
Finalization
finish() converts the accumulated deltas into a fully-formed [Message]. Fails if required fields (e.g., role) are missing or inner deltas cannot be finalized.
Examples
let d1 = MessageDelta::new().with_role(Role::Assistant).with_contents([PartDelta::Text { text: "Hel".into() }]);
let d2 = MessageDelta::new().with_contents([PartDelta::Text { text: "lo".into() }]);
let merged = d1.accumulate(d2).unwrap();
let msg = merged.finish().unwrap();
assert_eq!(msg.contents[0].as_text().unwrap(), "Hello");
MessageDeltaOutput
A container for a streamed message delta and its termination signal.
During streaming, delta carries the incremental payload; once a terminal
condition is reached, finish_reason may be populated to explain why.
Examples
let mut out = MessageDeltaOutput::new();
out.delta = MessageDelta::new().with_role(Role::Assistant).with_contents([PartDelta::Text { text: "Hi".into() }]);
assert!(out.finish_reason.is_none());
Lifecycle
- While streaming: finish_reason is typically None.
- On completion: finish_reason is set; callers can then finish() the delta to obtain a concrete [Message].
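A Python sketch of this lifecycle, assuming a LangModel lm as in the earlier example and assuming MessageDelta exposes accumulate() and finish() mirroring the Rust API shown above (the Python method names are assumptions):
Python
# Accumulate streamed deltas, then finalize once finish_reason arrives.
acc = None
for out in lm.infer_delta_sync("hello"):
    acc = out.delta if acc is None else acc.accumulate(out.delta)
    if out.finish_reason is not None:
        message = acc.finish()  # a concrete Message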
MessageDeltaOutputIterator
MessageDeltaOutputSyncIterator
MessageOutput
MessageOutputIterator
MessageOutputSyncIterator
Part
Represents a semantically meaningful content unit exchanged between the model and the user.
Conceptually, each Part encapsulates a piece of data that contributes
to a chat message — such as text, a function invocation, or an image.
For example, a single message consisting of a sequence like
(text..., image, text...) is represented as a Message containing
an array of three Part elements.
Note that a Part does not carry "intent", such as "reasoning" or "tool call".
These higher-level semantics are determined by the context of a [Message].
Function
Bases: Part
Represents a structured function call to an external tool.
Many language models (LLMs) use a function calling mechanism to extend their capabilities.
When an LLM decides to use external tools, it produces a structured output called a function.
A function conventionally consists of two fields: a name, and an arguments field formatted as JSON.
This is conceptually similar to making an HTTP POST request, where the request body carries a single JSON object.
This struct models that convention, representing a function invocation request from an LLM to an external tool or API.
Examples
let f = PartFunction {
name: "translate".to_string(),
arguments: Value::from_json(r#"{"source": "hello", "lang": "cn"}"#).unwrap(),
};
Image
Bases: Part
Contains an image payload or reference used within a message part. The image may be provided as raw binary data or an encoded format (e.g., PNG, JPEG), or as a reference via a URL. Optional metadata can be included alongside the image.
Text
Value
Bases: Part
Holds a structured data value, typically a JSON-like structure.
image_from_base64
classmethod
image_from_base64(data: str) -> Part
image_from_bytes
classmethod
image_from_bytes(data: bytes) -> Part
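For example, using the two documented constructors (the file path is a placeholder):
Python
import base64

raw = open("photo.png", "rb").read()  # placeholder path
p1 = Part.image_from_bytes(raw)
p2 = Part.image_from_base64(base64.b64encode(raw).decode())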
PartDelta
Represents a partial or incremental update (delta) of a [Part].
This type enables composable, streaming updates to message parts. For example, text may be produced token-by-token, or a function call may be emitted gradually as its arguments stream in.
Example
Rust
let d1 = PartDelta::Text { text: "Hel".into() };
let d2 = PartDelta::Text { text: "lo".into() };
let merged = d1.accumulate(d2).unwrap();
assert_eq!(merged.to_text().unwrap(), "Hello");
Error Handling
Accumulation or finalization may return an error if incompatible deltas (e.g. mismatched function IDs) are combined or invalid JSON arguments are given.
Function
Bases: PartDelta
Incremental function call fragment.
Null
Text
Value
PartDeltaFunction
Represents an incremental update (delta) of a function part.
This type is used during streaming or partial message generation, when function calls are being streamed as text chunks or partial JSON fragments.
Variants
- Verbatim(String): Raw text content, typically a partial JSON fragment.
- WithStringArgs { name, arguments }: Function name and its serialized arguments as strings.
- WithParsedArgs { name, arguments }: Function name and parsed arguments as a Value.
Use Case
When the model streams out a function call response (e.g., "function_call":{"name":...}),
the incremental deltas can be accumulated until the full function payload is formed.
Example
let delta = PartDeltaFunction::WithStringArgs {
name: "translate".into(),
arguments: r#"{"text":"hi"}"#.into(),
};
Verbatim
WithParsedArgs
Bases: PartDeltaFunction
WithStringArgs
Bases: PartDeltaFunction
PartFunction
Represents a function call contained within a message part.
PartImage
Represents the image data contained in a [Part].
PartImage provides structured access to image data.
Currently, it only implements "binary" types.
Example
let part = Part::image_binary(640, 480, "rgb", (0..640*480*3).map(|i| (i % 255) as u8)).unwrap();
if let Some(img) = part.as_image() {
assert_eq!(img.height(), 640);
assert_eq!(img.width(), 480);
}
Binary
Bases: PartImage
Tool
__call__
call
call_sync
new_builtin
classmethod
new_builtin(kind: Literal['terminal', 'web_search_duckduckgo', 'web_fetch'], **kwargs: Any) -> Tool
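For example, instantiating the documented builtin kinds (no kwargs are passed here, since the accepted options are not listed in this section):
Python
terminal = Tool.new_builtin("terminal")
web_search = Tool.new_builtin("web_search_duckduckgo")
web_fetch = Tool.new_builtin("web_fetch")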
ToolDesc
Describes a tool (or function) that a language model can invoke.
ToolDesc defines the schema, behavior, and input/output specification of a callable
external function, allowing an LLM to understand how to use it.
The primary role of this struct is to describe to the LLM what a tool does,
how it can be invoked, and what input (parameters) and output (returns) schemas it expects.
The format follows the same schema conventions used by Hugging Face’s
transformers library, as well as APIs such as OpenAI and Anthropic.
The parameters and returns fields are typically defined using JSON Schema.
A builder helper, [ToolDescBuilder], is provided for convenient and fluent construction.
Example
use crate::value::{ToolDescBuilder, to_value};
let desc = ToolDescBuilder::new("temperature")
.description("Get the current temperature for a given city")
.parameters(to_value!({
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name"
},
"unit": {
"type": "string",
"description": "Temperature unit (default: Celsius)",
"enum": ["Celsius", "Fahrenheit"]
}
},
"required": ["location"]
}))
.returns(to_value!({
"type": "number"
}))
.build();
assert_eq!(desc.name, "temperature");
VectorStore
add_vectors
add_vectors(inputs: Sequence[VectorStoreAddInput]) -> list[str]
batch_retrieve
batch_retrieve(query_embeddings: Sequence[list[float]], top_k: int) -> list[list[VectorStoreRetrieveResult]]
get_by_ids
get_by_ids(ids: Sequence[str]) -> list[VectorStoreGetResult]
new_chroma
classmethod
new_chroma(url: str, collection_name: Optional[str]) -> VectorStore
new_faiss
classmethod
new_faiss(dim: int) -> VectorStore
remove_vector
remove_vectors
retrieve
retrieve(query_embedding: list[float], top_k: int) -> list[VectorStoreRetrieveResult]
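A retrieval sketch using the documented constructors and methods; the embedding dimension and query vectors are placeholders:
Python
store = VectorStore.new_faiss(dim=4)  # placeholder dimension

# Single-query retrieval:
results = store.retrieve([0.1, 0.2, 0.3, 0.4], top_k=3)

# Batched retrieval over multiple query embeddings:
batched = store.batch_retrieve([[0.1, 0.2, 0.3, 0.4]], top_k=3)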