Skip to main content

Devices & Environments

On-device AI is one of biggest feature of Ailoy that make it different from other AI tools.
To make it feasible, you might need to check the conditions for the devices & environments to run AIs.

Supported environments

note

To see the lastest details on system requirements, refer to the official repository's README.md.

  • macOS devices running on Apple Silicon with Metal support
  • Windows PC running on latest x86_64 CPU & GPU with Vulkan 1.3 support
  • Linux PC running on latest x86_64 CPU & GPU with Vulkan 1.3 support

...and WebGPU and Mobile devices supports are on plan!

VRAM requirements

warning

These values may vary depending on the environment and circumstances.

Requirements for available VRAM size by models are estimated as follows:

ModelContext lengthVRAM (params)VRAM (total)
Qwen/Qwen3-0.6B32k≈ 0.5GB≈ 4.5GB
Qwen/Qwen3-1.7B32k≈ 1.0GB≈ 5.0GB
Qwen/Qwen3-4B32k≈ 2.4GB≈ 6.4GB
Qwen/Qwen3-8B32k≈ 4.5GB≈ 8.5GB
BAAI/bge-m38k≈ 0.3GB≈ 0.3GB

Device selection

In some cases, you may want to exploit Ailoy in an environment with multiple accelerators, in which case you may want to select the accelerator on which to run AI.
e.g.) Running Embedding model and Language model on separate devices in a RAG application

Each AI model exists as a component in Ailoy, and components can be created by giving them a device ID to load each model onto the corresponding device.
The default device ID is 0, which uses the first device in the system.

Using Agent & VectorStore

agent = Agent(runtime, "Qwen/Qwen3-8B", attrs={"device": 1})
vs = VectorStore(runtime, "BAAI/bge-m3", "faiss", embedding_model_attrs={"device": 0})

...

agent.delete()
vs.delete()

Using Runtime APIs

rt.define("tvm_language_model", "lm0", {"model": "Qwen/Qwen3-8B", "device": 1})
rt.define("tvm_embedding_model", "em0", {"model": "BAAI/bge-m3", "device": 0})