Devices & Environments

On-device AI is one of biggest feature of Ailoy that make it different from other AI tools.
To make it feasible, you might need to check the conditions for the devices & environments to run AIs.

Supported environments

note

To see the lastest details on system requirements, refer to the official repository's README.md.

macOS devices running on Apple Silicon with Metal support
Windows PC running on latest x8664 CPU & GPU with _Vulkan 1.3 support
Linux PC running on latest x8664 CPU & GPU with _Vulkan 1.3 support

...and WebGPU and Mobile devices supports are on plan!

VRAM requirements

warning

These values may vary depending on the environment and circumstances.

Requirements for available VRAM size by models are estimated as follows:

Model	Context length	VRAM (params)	VRAM (total)
`BAAI/bge-m3`	8k	≈ 0.3 GB	≈ 0.3 GB
`Qwen/Qwen3-0.6B`	40k	≈ 0.5 GB	≈ 5.0 GB
`Qwen/Qwen3-1.7B`	40k	≈ 1.0 GB	≈ 5.5 GB
`Qwen/Qwen3-4B`	40k	≈ 2.4 GB	≈ 8.0 GB
`Qwen/Qwen3-8B`	40k	≈ 4.5 GB	≈ 10.5 GB
`Qwen/Qwen3-14B`	40k	≈ 8.0 GB	≈ 14.5 GB
`Qwen/Qwen3-32B`	40k	≈ 17.6 GB	≈ 25 GB
`Qwen/Qwen3-30B-A3B`	40k	≈ 16.5 GB	≈ 24 GB

Device selection

In some cases, you may want to exploit Ailoy in an environment with multiple accelerators, in which case you may want to select the accelerator on which to run AI.
e.g.) Running Embedding model and Language model on separate devices in a RAG application

Each AI model exists as a component in Ailoy, and components can be created by giving them a device ID to load each model onto the corresponding device.
The default device ID is 0, which uses the first device in the system.

Using Agent & VectorStore

Python
JavaScript

agent = Agent(runtime, LocalModel("Qwen/Qwen3-8B", device=1))
vs = VectorStore(runtime, "BAAI/bge-m3", "faiss", embedding_model_attrs={"device": 0})

...

agent.delete()
vs.delete()

const agent = await defineAgent(rt, LocalModel({id: "Qwen/Qwen3-8B", device: 1}));
const vs = await defineVectorStore(rt, "BAAI/bge-m3", "faiss", { device : 0 });

...

await agent.delete()
await vs.delete()
...

Using Runtime APIs

Python
JavaScript

rt.define("tvm_language_model", "lm0", {"model": "Qwen/Qwen3-8B", "device": 1})
rt.define("tvm_embedding_model", "em0", {"model": "BAAI/bge-m3", "device": 0})

await rt.define("tvm_language_model", "lm0", {
  model: "Qwen/Qwen3-8B",
  device: 1,
});
await rt.define("tvm_embedding_model", "em0", {
  model: "BAAI/bge-m3",
  device: 0,
});

Supported environments​

VRAM requirements​

Device selection​

Using Agent & VectorStore​

Using Runtime APIs​

Supported environments

VRAM requirements

Device selection

Using Agent & VectorStore

Using Runtime APIs