Calling Low-Level APIs
As mentioned earlier, the Agent is essentially a wrapper around the Runtime. In a nutshell, a component called the VM is responsible for handling compute-intensive tasks, such as LLM inference. The Runtime serves as a bridge between the user and the VM, delegating user requests to the VM and forwarding results back from the VM to the user.
By using the Runtime directly, you can send requests to the VM yourself. This is useful when you need more advanced configuration or want to call APIs that are not available through the high-level interfaces. See Internal Runtime APIs for the full list of available APIs.
Operator
Let’s take a look at how the low-level API works, starting with the simplest example: echo. The echo operator is a basic operator that returns the user's input as-is.
Here’s how to call the echo operator:
- Python
- JavaScript(Node)
from ailoy import Runtime
rt = Runtime()
result = rt.call("echo", {"text": "Hello world"})
print(result)
rt.stop()
import { startRuntime } from "ailoy-node";

(async () => {
  const rt = await startRuntime();
  const result = await rt.call("echo", { text: "Hello world" });
  console.log(result);
  await rt.stop();
})();
Text chunking
Next, let’s try a more practical operator. The split_text operator transforms a long document into a set of smaller chunks. This operation is essential for tasks like Retrieval-Augmented Generation (RAG).
We’ll use Leo Tolstoy’s "What Men Live By" to break a long passage into multiple chunks.
- Python
- JavaScript(Node)
from ailoy import Runtime
rt = Runtime()
with open("what_men_live_by.txt") as f:
    text = f.read()
result = rt.call("split_text", {"text": text})
chunks = result["chunks"]
print(len(chunks)) # == 12
# Print chunks
print("********************** First chunk **********************")
print(chunks[0])
print("*********************************************************")
rt.stop()
import { startRuntime } from "ailoy-node";
import { readFile } from "fs/promises";

(async () => {
  const rt = await startRuntime();
  const text = await readFile("what_men_live_by.txt", "utf-8");
  const result = await rt.call("split_text", { text });
  const chunks = result.chunks;
  console.log(chunks.length); // == 12
  // Print chunks
  console.log("********************** First chunk **********************");
  console.log(chunks[0]);
  console.log("*********************************************************");
  await rt.stop();
})();
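To give an intuition for what chunking does, the sketch below implements a naive character-window chunker with overlap, so that adjacent chunks share some context. This is an illustration only: it is not the actual algorithm behind split_text, and the chunk_size and overlap parameters are made up for this example.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split `text` into windows of `chunk_size` characters,
    each overlapping the previous one by `overlap` characters."""
    step = chunk_size - overlap
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]

# A 1200-character stand-in document yields three overlapping chunks.
chunks = chunk_text("a" * 1200, chunk_size=500, overlap=50)
print(len(chunks))      # 3
print(len(chunks[0]))   # 500
```

Real chunkers typically split on sentence or paragraph boundaries rather than raw character offsets, but the sliding-window-with-overlap idea is the same.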
Component
Some jobs may require state, like member variables in a class. Ailoy supports this kind of stateful structure through a type called a Component.
Direct language model inference
In this example, we’ll directly invoke an LLM using the low-level Runtime API.
To begin, a Component must first be initialized using the define call. Here, we define a Component of type tvm_language_model with the name lm0. It can later be removed using the delete call.
The tvm_language_model Component provides a method infer, which can be used to perform LLM inference.
- Python
- JavaScript(Node)
from ailoy import Runtime
rt = Runtime()
rt.define("tvm_language_model", "lm0", {"model": "Qwen/Qwen3-0.6B"})
for resp in rt.call_iter_method(
    "lm0", "infer", {"messages": [{"role": "user", "content": [{"type": "text", "text": "What's your name?"}]}]}
):
    print(resp)
rt.delete("lm0")
rt.stop()
import { startRuntime } from "ailoy-node";

(async () => {
  const rt = await startRuntime();
  await rt.define("tvm_language_model", "lm0", {
    model: "Qwen/Qwen3-0.6B",
  });
  for await (const resp of rt.callIterMethod("lm0", "infer", {
    messages: [
      { role: "user", content: [{ type: "text", text: "What's your name?" }] },
    ],
  })) {
    console.log(resp);
  }
  await rt.delete("lm0");
  await rt.stop();
})();
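Each resp yielded by infer is a streamed partial result carrying a piece of the generated text (the examples on this page read it from resp["message"]["content"][0]["text"]). If you want the full reply as a single string rather than printing it as it streams, you can accumulate the pieces. The sketch below uses stand-in response dicts in place of a live infer call; the response shape is assumed from the examples on this page.

```python
def collect_reply(responses) -> str:
    """Join the text pieces of streamed infer responses into one string."""
    parts = []
    for resp in responses:
        parts.append(resp["message"]["content"][0]["text"])
    return "".join(parts)

# Stand-in stream, mimicking the response shape used in the infer examples.
fake_stream = [
    {"message": {"content": [{"type": "text", "text": piece}]}}
    for piece in ["My name ", "is ", "Qwen."]
]
print(collect_reply(fake_stream))  # My name is Qwen.
```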
For more advanced multi-turn conversations, you can pass a freely structured conversation context to Ailoy's low-level Runtime API calls, for example for Chain-of-Thought prompting.
In theory, an LLM generates responses based only on the context provided at the moment; it doesn't retain memory of past interactions by itself. Therefore, to receive an appropriate response in an ongoing conversation with an LLM, you must provide the entire conversation history up to that point each time you make a request.
- Python
- JavaScript(Node)
from ailoy import Runtime

with Runtime() as rt:
    rt.define("tvm_language_model", "lm0", {"model": "Qwen/Qwen3-8B"})
    messages = [
        {"role": "user", "content": [{"type": "text", "text": "if a>1, what is the sum of the real solutions of 'sqrt(a-sqrt(a+x))=x'?"}]},
        {"role": "assistant", "content": [{"type": "text", "text": "Let's think step by step."}]},  # Chain-of-Thought prompt
    ]
    for resp in rt.call_iter_method("lm0", "infer", {"messages": messages}):
        print(resp["message"]["content"][0]["text"], end="")
    rt.delete("lm0")
import { startRuntime } from "ailoy-node";

(async () => {
  const rt = await startRuntime();
  await rt.define("tvm_language_model", "lm0", {
    model: "Qwen/Qwen3-8B",
  });
  const messages = [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "if a>1, what is the sum of the real solutions of 'sqrt(a-sqrt(a+x))=x'?",
        },
      ],
    },
    {
      role: "assistant",
      content: [{ type: "text", text: "Let's think step by step." }],
    }, // Chain-of-Thought prompt
  ];
  for await (const resp of rt.callIterMethod("lm0", "infer", {
    messages: messages,
  })) {
    process.stdout.write(resp.message.content[0].text);
  }
  await rt.delete("lm0");
  await rt.stop();
})();
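To continue a conversation for another turn, append the assistant's full reply to messages, then append the next user message, and call infer again with the whole list. Below is a minimal sketch of this history bookkeeping; the reply string is a placeholder standing in for text accumulated from the streamed infer responses.

```python
def append_turn(messages: list, assistant_text: str, next_user_text: str) -> list:
    """Grow the conversation history by one assistant reply and one user follow-up."""
    messages.append(
        {"role": "assistant", "content": [{"type": "text", "text": assistant_text}]}
    )
    messages.append(
        {"role": "user", "content": [{"type": "text", "text": next_user_text}]}
    )
    return messages

messages = [
    {"role": "user", "content": [{"type": "text", "text": "What's your name?"}]}
]
# "I'm Qwen." is a placeholder for the model's previous streamed reply.
messages = append_turn(messages, "I'm Qwen.", "And who made you?")
print(len(messages))         # 3
print(messages[-1]["role"])  # user
```

Passing this grown messages list to the next infer call is what gives the model its "memory" of earlier turns.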
To override the system message, include a message with "role": "system" as the first element of the messages array.
- Python
- JavaScript(Node)
from ailoy import Runtime

with Runtime() as rt:
    rt.define("tvm_language_model", "lm0", {"model": "Qwen/Qwen3-8B"})
    messages = [
        {"role": "system", "content": [{"type": "text", "text": "You are a friendly chatbot who always responds in the style of a pirate."}]},
        {"role": "user", "content": [{"type": "text", "text": "Who are you?"}]},
    ]
    for resp in rt.call_iter_method("lm0", "infer", {"messages": messages}):
        print(resp["message"]["content"][0]["text"], end="")
    rt.delete("lm0")
import { startRuntime } from "ailoy-node";

(async () => {
  const rt = await startRuntime();
  await rt.define("tvm_language_model", "lm0", {
    model: "Qwen/Qwen3-8B",
  });
  const messages = [
    {
      role: "system",
      content: [
        {
          type: "text",
          text: "You are a friendly chatbot who always responds in the style of a pirate.",
        },
      ],
    },
    { role: "user", content: [{ type: "text", text: "Who are you?" }] },
  ];
  for await (const resp of rt.callIterMethod("lm0", "infer", {
    messages: messages,
  })) {
    process.stdout.write(resp.message.content[0].text);
  }
  await rt.delete("lm0");
  await rt.stop();
})();