
Calling Low-Level APIs

As mentioned earlier, the Agent is essentially a wrapper around the Runtime. Under the hood, a component called the VM handles compute-intensive tasks such as LLM inference. The Runtime serves as a bridge between the user and the VM: it delegates user requests to the VM and forwards the results back to the user.

[Diagram: My App sends a Request to the Runtime; the Runtime delegates it to the VM (which performs e.g. LLM inference) and forwards the Response back.]

By using the Runtime directly, you can send requests to the VM yourself. This is useful when you need more advanced configuration or want to call APIs that are not available through the high-level interfaces.

info

See Internal Runtime APIs for the full list of available APIs.

Operator

Let’s take a look at how the lower-level API works, starting with the simplest example: echo. The echo operator is a basic operator that returns the user's input as-is.

Here’s how to call the echo operator:

from ailoy import Runtime

rt = Runtime()

result = rt.call("echo", {"text": "Hello world"})

print(result)

rt.stop()

Hello world
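
Runtime also works as a context manager, which calls stop() automatically when the block exits; later examples on this page use this form:

from ailoy import Runtime

# Equivalent to the example above; the Runtime is stopped automatically on exit
with Runtime() as rt:
    print(rt.call("echo", {"text": "Hello world"}))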

Text chunking

Next, let’s try a more practical operator. The split_text operator transforms a long document into a set of smaller chunks. This operation is essential for tasks like Retrieval-Augmented Generation (RAG).

We’ll use Leo Tolstoy’s What Men Live By to break a long passage into multiple chunks.

from ailoy import Runtime

rt = Runtime()

with open("what_men_live_by.txt") as f:
    text = f.read()

result = rt.call("split_text", {"text": text})
chunks = result["chunks"]

print(len(chunks)) # == 12

# Print chunks
print("********************** First chunk **********************")
print(chunks[0])
print("*********************************************************")

rt.stop()

12
********************** First chunk **********************
WHAT MEN LIVE BY
A shoemaker named Simon, who had neither house nor land of his own, lived with his wife and children in a peasant’s hut, and earned his living by his work. Work was cheap, but bread was dear, and what he earned he spent for food. The man and his wife had but one sheepskin coat between them for winter wear, and even that was torn to tatters, and this was the second year he had been wanting to buy sheep-skins for a new coat. Before winter Simon saved up a little money: a three-rouble note lay hidden in his wife’s box, and five roubles and twenty kopeks were owed him by customers in the village.

So one morning he prepared to go to the village to buy the sheep-skins. He put
on over his shirt his wife’s wadded nankeen jacket, and over that he put his own
cloth coat. He took the three-rouble note in his pocket, cut himself a stick to
serve as a staff, and started off after breakfast. “I’ll collect the five
roubles that are due to me,” thought he, “add the three I have got, and that
will be enough to buy sheep-skins for the winter coat.”

He came to the village and called at a peasant’s hut, but the man was not at
home. The peasant’s wife promised that the money should be paid next week, but
she would not pay it herself. Then Simon called on another peasant, but this one
swore he had no money, and would only pay twenty kopeks which he owed for a pair
of boots Simon had mended. Simon then tried to buy the sheep-skins on credit,
but the dealer would not trust him.

“Bring your money,” said he, “then you may have your pick of the skins. We know
what debt-collecting is like.” So all the business the shoemaker did was to get
the twenty kopeks for boots he had mended, and to take a pair of felt boots a
peasant gave him to sole with leather.

Simon felt downhearted. He spent the twenty kopeks on vodka, and started
homewards without having bought any skins. In the morning he had felt the frost;
but now, after drinking the vodka, he felt warm, even without a sheep-skin coat.
He trudged along, striking his stick on the frozen earth with one hand, swinging
the felt boots with the other, and talking to himself.

“I’m quite warm,” said he, “though I have no sheep-skin coat. I’ve had a drop,
and it runs through all my veins. I need no sheep-skins. I go along and don’t
worry about anything. That’s the sort of man I am! What do I care? I can live
without sheep-skins. I don’t need them. My wife will fret, to be sure. And, true
enough, it is a shame; one works all day long, and then does not get paid. Stop
a bit! If you don’t bring that money along, sure enough I’ll skin you, blessed
if I don’t. How’s that? He pays twenty kopeks at a time! What can I do with
twenty kopeks? Drink it-that’s all one can do! Hard up, he says he is! So he may
be—but what about me? You have a house, and cattle, and everything; I’ve only
what I stand up in! You have corn of your own growing; I have to buy every
grain. Do what I will, I must spend three roubles every week for bread alone. I
come home and find the bread all used up, and I have to fork out another rouble
and a half. So just pay up what you owe, and no nonsense about it!”

By this time he had nearly reached the shrine at the bend of the road. Looking
up, he saw something whitish behind the shrine. The daylight was fading, and the
shoemaker peered at the thing without being able to make out what it was. “There
was no white stone here before. Can it be an ox? It’s not like an ox. It has a
head like a man, but it’s too white; and what could a man be doing there?”

He came closer, so that it was clearly visible. To his surprise it really was a
man, alive or dead, sitting naked, leaning motionless against the shrine. Terror
seized the shoemaker, and he thought, “Some one has killed him, stripped him,
and left him there. If I meddle I shall surely get into trouble.”

---
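
To see how this feeds into RAG, here is a minimal retrieval sketch in plain Python, continuing from the chunks list above. It picks the chunk most relevant to a question by naive keyword overlap; a real pipeline would use embeddings instead, but the shape of the retrieval step is the same. The query text here is purely illustrative.

def keyword_score(chunk: str, query: str) -> int:
    # Count how many of the query's words appear in the chunk (naive lexical matching)
    words = set(query.lower().split())
    return sum(1 for word in words if word in chunk.lower())

query = "Why did Simon feel downhearted?"
best_chunk = max(chunks, key=lambda chunk: keyword_score(chunk, query))
print(best_chunk[:200])  # beginning of the most relevant chunk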

Component

Some jobs may require state, like member variables in a class. Ailoy provides support for this kind of stateful structure through a type called a Component.

Direct language model inference

In this example, we’ll directly invoke an LLM using the low-level Runtime API. To begin, a Component must be initialized using the define call. Here, we define a Component of type tvm_language_model named lm0; it can later be removed using the delete call.

The tvm_language_model Component provides a method infer, which can be used to perform LLM inference.

from ailoy import Runtime

rt = Runtime()

rt.define("tvm_language_model", "lm0", {"model": "Qwen/Qwen3-0.6B"})

for resp in rt.call_iter_method(
    "lm0", "infer", {"messages": [{"role": "user", "content": [{"type": "text", "text": "What's your name?"}]}]}
):
    print(resp)

rt.delete("lm0")

rt.stop()

{'message': {'role': 'assistant', 'content': 'My'}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': ' name'}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': ' is'}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': ' an'}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': ' AI'}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': ' assistant'}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': ','}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': ' and'}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': ' I'}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': "'m"}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': ' here'}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': ' to'}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': ' help'}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': ' you'}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': ' with'}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': ' any'}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': ' questions'}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': ' or'}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': ' tasks'}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': '.'}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': ' If'}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': ' you'}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': ' have'}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': ' something'}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': ' you'}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': "'d"}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': ' like'}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': ' to'}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': ' discuss'}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': ','}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': ' feel'}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': ' free'}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': ' to'}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': ' ask'}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': '!'}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': ' 😊'}, 'finish_reason': None}
{'message': {'role': 'assistant', 'content': '\n'}, 'finish_reason': 'stop'}

You can pass the context of freely structured conversations into Ailoy's low-level Runtime API calls to enable more advanced multi-turn conversation, e.g. Chain-of-Thought prompting.

An LLM generates responses based only on the context provided in the current request; it does not retain memory of past interactions by itself. To receive an appropriate response in an ongoing conversation, you must therefore provide the entire conversation history up to that point each time you make a request, as the sketch after the next example shows.

with Runtime() as rt:
    rt.define("tvm_language_model", "lm0", {"model": "Qwen/Qwen3-8B"})

    messages = [
        {"role": "user", "content": [{"type": "text", "text": "if a>1, what is the sum of the real solutions of 'sqrt(a-sqrt(a+x))=x'?"}]},
        {"role": "assistant", "content": [{"type": "text", "text": "Let's think step by step."}]},  # Chain-of-Thought prompt
    ]

    for resp in rt.call_iter_method("lm0", "infer", {"messages": messages}):
        print(resp["message"]["content"][0]["text"], end='')

    rt.delete("lm0")
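
Because the model is stateless, a multi-turn conversation is built by appending each reply to the history and re-sending everything on the next request. Here is a minimal sketch of that pattern; the delta_text helper is an illustrative addition that handles both delta shapes seen on this page (a plain string, as in the streamed output above, and a list of content parts, as indexed in the examples), and the model name reuses the one from the earlier example:

from ailoy import Runtime

def delta_text(resp):
    # The streamed delta appears as a plain string in the first infer example's
    # output, but later examples index content[0]["text"]; handle both shapes.
    content = resp["message"]["content"]
    return content if isinstance(content, str) else content[0]["text"]

with Runtime() as rt:
    rt.define("tvm_language_model", "lm0", {"model": "Qwen/Qwen3-0.6B"})

    messages = [
        {"role": "user", "content": [{"type": "text", "text": "My name is Simon."}]},
    ]

    # Turn 1: stream the reply and collect it into a single string
    reply = "".join(
        delta_text(resp)
        for resp in rt.call_iter_method("lm0", "infer", {"messages": messages})
    )

    # Append the assistant's reply to the history, then ask a follow-up
    messages.append({"role": "assistant", "content": [{"type": "text", "text": reply}]})
    messages.append({"role": "user", "content": [{"type": "text", "text": "What is my name?"}]})

    # Turn 2: the model can answer only because the full history is re-sent
    for resp in rt.call_iter_method("lm0", "infer", {"messages": messages}):
        print(delta_text(resp), end="")

    rt.delete("lm0")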

To override the system message, you can include a message with "role": "system" as the first element in the context messages array.

with Runtime() as rt:
    rt.define("tvm_language_model", "lm0", {"model": "Qwen/Qwen3-8B"})

    messages = [
        {"role": "system", "content": [{"type": "text", "text": "You are a friendly chatbot who always responds in the style of a pirate."}]},
        {"role": "user", "content": [{"type": "text", "text": "Who are you?"}]},
    ]

    for resp in rt.call_iter_method("lm0", "infer", {"messages": messages}):
        print(resp["message"]["content"][0]["text"], end='')

    rt.delete("lm0")