
Chat Completion Format

In a conventional chat completion setup, both the input and the output follow a structured format.
Messages are typically represented as follows:

[
  {
    "role": "system",
    "contents": [
      { "type": "text", "text": "You are a friendly and knowledgeable assistant." }
    ]
  },
  {
    "role": "user",
    "contents": [
      { "type": "text", "text": "Can you explain how photosynthesis works?" }
    ]
  }
]

When this request is executed, the output might look like:

{
  "role": "assistant",
  "contents": [
    {
      "type": "text",
      "text": "Photosynthesis is the process by which plants convert sunlight, water, and carbon dioxide into energy. They use sunlight to produce glucose (a form of sugar) and release oxygen as a byproduct."
    }
  ]
}
info

Please refer to the following resources for more about this schema.

Message

A Message represents one conversational turn — what one participant (system, user, or assistant) says or does.

Each message contains:

  • a role, indicating who is speaking,
  • a set of contents, describing what was said or sent.
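
As a rough sketch (the type definitions below are illustrative assumptions based on the examples on this page, not Ailoy's exact declarations), a message can be thought of as:

// Illustrative only; field names follow the examples on this page.
type Role = "system" | "user" | "assistant" | "tool";

interface Message {
  role: Role;        // who is speaking
  contents: Part[];  // what was said or sent
}

// The simplest kind of Part; the full set of part types is described later on this page.
type Part = { type: "text"; text: string };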

Role

  • System: System instructions and constraints provided to the assistant. It usually defines the model's behavior or persona.
  • User: Contents authored by the user.
  • Assistant: Contents automatically generated by the assistant / AI model.
  • Tool: Execution results produced by external tools / functions.

Contents

What each role says is generally referred to as content.
However, a model can operate in different modes, producing outputs that serve different purposes. These outputs reflect the intention behind what the model says or does.

  • Content: General output.
  • Thinking (Reasoning): Some models generate intermediate thoughts or thinking traces before producing the final answer. These are stored in the thinking field (often hidden from the user) and can help trace or visualize the model's internal decision process.
  • Tool call: When the model decides to invoke an external function or API instead of generating plain text. These calls are represented as structured objects that describe which function to call (name) and with what arguments.

These types of outputs are stored in separate fields within a message, making it possible to distinguish them from general conversation.

Example

[
  Message {
    role: "assistant",
    thinking: "Let's reason step by step: photosynthesis converts light energy into chemical energy...",
    contents: [
      { type: "text", text: "Photosynthesis is the process by which plants convert sunlight, water, and carbon dioxide into energy." }
    ],
    tool_calls: [
      {
        type: "function",
        function: {
          id: "func_call_1234abcd",
          name: "get_current_location",
          arguments: ...
        }
      }
    ]
  }
]

Part

While Contents describe the intention of a message, a Part defines the data type of each piece of content.

A Part can be considered the smallest semantic unit within a conversation. Each Message contains a list of Part objects. A Part can represent text, images, function calls, or arbitrary structured values, enabling rich multimodal communication.

  • Text: Natural-language text content
  • Image: Visual data or reference (e.g., binary, URL, or metadata)
  • Function: Structured tool or function invocation
  • Value: Arbitrary data (numbers, objects, JSON values, etc.)
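
As a rough sketch (a simplified union based on the list above; the exact field shapes are assumptions, not Ailoy's type definitions), these variants could be modeled as:

// Illustrative part variants; field shapes follow the examples on this page.
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image"; image: { data?: string; url?: string } };
type FunctionPart = { type: "function"; function: { name: string; arguments: Record<string, unknown> } };
type ValuePart = { type: "value"; value: unknown };

type Part = TextPart | ImagePart | FunctionPart | ValuePart;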

For example, if a user asks about an image, the message could look like this:

contents: [
  { type: "image", image: { data: "..." } },
  { type: "text", text: "What can you see in this image?" }
]

Together, these parts express Ailoy’s multimodal conversation.

Delta

The inference of a language model can take a significant amount of time. To improve real-time responsiveness, many AI systems stream tokens as they are generated. These streamed outputs are typically delivered in the form of deltas.

A delta (MessageDelta or PartDelta) represents an incremental piece of output produced during one step of a streaming response. As the model generates text token by token, each incremental addition is emitted as a delta, which is later merged into a complete Message.

Ailoy provides a simple way to retrieve and aggregate deltas: the accumulate and finish functions.
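
As a rough illustration of the idea (the delta shapes and helpers below are assumptions for the sketch, not Ailoy's exact accumulate/finish signatures), streamed deltas can be folded into a complete message like this:

// Illustrative only; Ailoy's own accumulate/finish helpers play this role.
interface PartDelta { type: "text"; text: string }
interface MessageDelta { contents?: PartDelta[] }
interface Message { role: "assistant"; contents: { type: "text"; text: string }[] }

// Merge one streamed delta into the partially built message.
function accumulateDelta(partial: Message, delta: MessageDelta): Message {
  for (const part of delta.contents ?? []) {
    const last = partial.contents[partial.contents.length - 1];
    if (last !== undefined) {
      last.text += part.text;              // extend the current text part
    } else {
      partial.contents.push({ ...part });  // start the first part
    }
  }
  return partial;
}

// Fold every delta from a stream, then return the finished message.
async function collectMessage(stream: AsyncIterable<MessageDelta>): Promise<Message> {
  let message: Message = { role: "assistant", contents: [] };
  for await (const delta of stream) {
    message = accumulateDelta(message, delta);
  }
  return message;
}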

Tool

A Tool enables the model to act — to execute external functions, access APIs, or perform any operation beyond plain text generation.

Each tool has two halves:

  • Description (declarative) — defines what the tool is and how it can be called.
  • Behavior (imperative) — defines what your code actually does when the tool is invoked.

Together, they allow the model to dynamically invoke external capabilities while keeping reasoning and execution logically separated.

info

Please refer to the following resources for more about the tool schema convention.

Tool Description

Ailoy follows a JSON-Schema-like convention to describe tool arguments and optional return schemas. This ensures that language models can reliably construct valid function calls — the schema precisely defines the allowed parameters, required fields, and expected return structure.

{
  "name": "get_temperature",
  "description": "Retrieve current temperature for a specific city.",
  "parameters": {
    "type": "object",
    "required": ["city"],
    "properties": {
      "city": { "type": "string", "description": "City name" },
      "unit": { "type": "string", "enum": ["celsius", "fahrenheit"], "default": "celsius" }
    },
    "additionalProperties": false
  },
  "returns": {
    "type": "number"
  }
}
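
The behavior half is ordinary code in your runtime. A minimal sketch of a matching implementation (the helper below and the way arguments are passed are assumptions for illustration, not Ailoy's actual API) could look like this:

// Hypothetical behavior for the get_temperature description above.
type TemperatureArgs = { city: string; unit?: "celsius" | "fahrenheit" };

// Stub standing in for a real weather lookup.
async function fetchCelsius(city: string): Promise<number> {
  return 12.3; // a real implementation would query a weather service for `city`
}

async function getTemperature(args: TemperatureArgs): Promise<number> {
  const unit = args.unit ?? "celsius";           // mirror the schema default
  const celsius = await fetchCelsius(args.city);
  return unit === "celsius" ? celsius : celsius * 9 / 5 + 32;
}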

Tool Call

When a model decides to use a tool instead of generating plain text, it emits a tool call. This occurs inside an assistant message — since the assistant is the one deciding to call a tool — but the call is placed inside the tool_calls field (not contents).

A typical tool call message looks like this:

{
  "role": "assistant",
  "contents": [
    { "type": "text", "text": "Let me check the current weather for you." }
  ],
  "tool_calls": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "arguments": { "city": "Seoul", "unit": "celsius" }
      },
      "id": "call_01HZX2..."
    }
  ]
}

Explanation:

  • The assistant outputs a structured tool call in the tool_calls field.
  • The runtime then executes the corresponding function based on its name and arguments.
  • The optional id field uniquely identifies this tool call, so that the tool's response can be correctly linked back to it.
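
As a rough sketch of this flow (the registry shape and helper names are assumptions, not Ailoy's actual API), a runtime could execute the call and prepare the tool-role message described in the next section like this:

// Hypothetical registry mapping tool names to their behaviors.
const tools: Record<string, (args: any) => Promise<unknown>> = {
  get_weather: async (args) => 12.3, // stub behavior for the example above
};

interface ToolCall {
  type: "function";
  id: string;
  function: { name: string; arguments: Record<string, unknown> };
}

interface ToolMessage {
  role: "tool";
  tool_call_id: string;
  name: string;
  contents: { type: "text"; text: string }[];
}

// Execute one tool call and wrap the result as a tool-role message.
async function runToolCall(call: ToolCall): Promise<ToolMessage> {
  const base = { role: "tool" as const, tool_call_id: call.id, name: call.function.name };
  try {
    const result = await tools[call.function.name](call.function.arguments);
    return { ...base, contents: [{ type: "text", text: String(result) }] };
  } catch (err) {
    // Errors are passed back as well so the model can recognize and handle them.
    return {
      ...base,
      contents: [{ type: "text", text: JSON.stringify({ code: "ERROR", message: String(err) }) }],
    };
  }
}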

Tool Response

Once the runtime executes the tool's behavior, it must append a new message to the conversation with the role set to "tool". The result is stored inside the contents field.

For example:

{
  "role": "tool",
  "tool_call_id": "call_01HZX2...",
  "name": "get_weather",
  "contents": [
    { "type": "text", "text": "12.3" }
  ]
}

When an error occurs in a tool, it should still be passed along so that the model can recognize and handle it.

{
  "role": "tool",
  "tool_call_id": "call_01HZX2...",
  "name": "get_weather",
  "contents": [
    { "type": "text", "text": "{ \"code\": \"NOT_FOUND\" }" }
  ]
}

Document

A Document is the normalized representation of any retrievable knowledge item. It consists of two components: a title and a text body.

{
  "title": "...",
  "text": "..."
}

  • title: A short, descriptive label that helps identify the document. The title is not used as part of model inference; it is primarily for indexing, display, and retrieval ranking. You can think of it as metadata or a summary, similar to a filename or headline.

  • text: The actual content of the document. This field is fed directly into the language model during retrieval-augmented inference. It contains the meaningful information that the model can read, reason about, and use to generate responses.

All retrieved knowledge sources (e.g., from vector databases, APIs, or local files) are normalized into this unified "document" format before being passed to the model. This ensures that, regardless of the original source or schema, the model always receives consistent input.

For example:

  • Web article → { "title": "Article Title", "text": "Full article content..." }
  • PDF extract → { "title": "File: research.pdf", "text": "Extracted paragraph..." }
  • Knowledge base entry → { "title": "FAQ: Model Loading", "text": "To load a model, use..." }
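
Such normalization can be sketched as follows (the raw source shapes and field names below are hypothetical, for illustration only):

// Unified document format used by the knowledge pipeline.
interface Document { title: string; text: string }

// Hypothetical raw source shapes; real connectors will differ.
type WebArticle = { kind: "web"; headline: string; body: string };
type PdfExtract = { kind: "pdf"; filename: string; paragraph: string };

// Normalize any supported source into the unified document format.
function toDocument(source: WebArticle | PdfExtract): Document {
  switch (source.kind) {
    case "web":
      return { title: source.headline, text: source.body };
    case "pdf":
      return { title: `File: ${source.filename}`, text: source.paragraph };
  }
}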

This design simplifies retrieval and unifies downstream processing within Ailoy’s knowledge pipeline.