The inference runtime for the age of personal agents — reproducible, source-verified MoE/LLM decode across consumer & edge Blackwell: from RTX Spark on your desk to Jetson Thor in the robot.

Targets consumer sm_120 and edge sm_121 (not datacenter sm_100)
"A new class of processor for the age of personal agents." Run agents locally and privately on a Windows PC.

"The ultimate platform for physical AI and robotics" — agentic reasoning at the edge, in humanoid robots.

Desktop flagship — big-model local inference on a 32 GB card. Where the frontier is measured today.

96 GB Blackwell — a full MoE plus all experts and a large KV cache stay resident, no paging.
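As a rough way to reason about what "stays resident" means, the sketch below adds up weight bytes and KV-cache bytes and compares them to a memory budget. The function names are hypothetical, and the layer count, KV-head count, context length and 4-bit weight width are illustrative assumptions, not measurements from this project. It also assumes every layer uses full attention and ignores activations and workspace buffers.

```python
GIB = 1024 ** 3

def kv_cache_bytes(layers, kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Bytes needed to keep K and V for every layer and every cached token."""
    # Factor 2: one K tensor and one V tensor per layer.
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value

def weight_bytes(total_params, bits_per_weight):
    """Bytes for all weights (every expert included) at a given quantization width."""
    return total_params * bits_per_weight // 8

def fits_resident(total_params, bits_per_weight, kv_bytes, budget_bytes):
    """True if weights plus KV cache fit in the budget without paging."""
    return weight_bytes(total_params, bits_per_weight) + kv_bytes <= budget_bytes

# Illustrative (assumed) shape: 48 layers, 4 KV heads, head_dim 128, 32k context, fp16 cache.
kv = kv_cache_bytes(layers=48, kv_heads=4, head_dim=128, context_tokens=32_768)
print(kv / GIB)  # 3.0

# Assumed 30e9 parameters at 4 bits per weight against a 96 GB budget.
print(fits_resident(30_000_000_000, 4, kv, 96 * 10**9))  # True
```

Because the KV term grows linearly with context length, the same calculation shows how much context a smaller card gives up once the weights are loaded.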

Qwen3-30B-A3B / 35B-A3B — 128–256 experts, top-8, GQA head_dim 128, uniform full attention. Runs end-to-end at the current frontier with 100% token-match vs llama.cpp.
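One plausible reading of "token-match" is a position-by-position comparison of greedy-decoded token ids against a reference run. The sketch below assumes that definition; the helper names `token_match` and `first_divergence` are hypothetical and are not this project's or llama.cpp's API.

```python
from typing import Optional, Sequence

def token_match(reference: Sequence[int], candidate: Sequence[int]) -> float:
    """Fraction of positions where both runs emitted the same token id."""
    if not reference:
        raise ValueError("reference sequence is empty")
    matches = sum(1 for r, c in zip(reference, candidate) if r == c)
    # Divide by the longer length so a truncated or over-long run cannot score 100%.
    return matches / max(len(reference), len(candidate))

def first_divergence(reference: Sequence[int], candidate: Sequence[int]) -> Optional[int]:
    """Index of the first differing position, or None if the runs are identical."""
    for i, (r, c) in enumerate(zip(reference, candidate)):
        if r != c:
            return i
    if len(reference) != len(candidate):
        return min(len(reference), len(candidate))
    return None

ref = [101, 7, 42, 9]   # token ids from the reference decoder (made-up values)
out = [101, 7, 13, 9]   # token ids from the runtime under test (made-up values)
print(token_match(ref, out))       # 0.75
print(first_divergence(ref, out))  # 2
```

Reporting the first divergence alongside the rate is useful because, under greedy decoding, a single early mismatch changes every later token's context.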

Gemma 4 26B-A4B — 128 experts top-8 + shared, interleaved local-SWA / global attention, head_dim 512, dual RoPE. Deliberately unlike Qwen, so an optimization can't overfit one architecture.
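To make the difference from uniform full attention concrete, here is a minimal sketch of an interleaved sliding-window / global causal mask. The schedule (every sixth layer global) and the window size are hypothetical values chosen for illustration, not Gemma's actual configuration, and the function names are invented for this example.

```python
from typing import List, Optional

def is_global_layer(layer_idx: int, period: int = 6) -> bool:
    """Assumed schedule: the last layer of each `period`-layer group is global."""
    return (layer_idx + 1) % period == 0

def can_attend(query_pos: int, key_pos: int, window: Optional[int]) -> bool:
    """Causal visibility, optionally limited to the most recent `window` tokens."""
    if key_pos > query_pos:
        return False  # never attend to future tokens
    if window is None:
        return True   # global layer: the whole prefix is visible
    return query_pos - key_pos < window  # local layer: sliding window

def attention_mask(seq_len: int, layer_idx: int, window: int = 4, period: int = 6) -> List[List[bool]]:
    """Boolean mask[q][k] for one layer under the assumed schedule."""
    w = None if is_global_layer(layer_idx, period) else window
    return [[can_attend(q, k, w) for k in range(seq_len)] for q in range(seq_len)]

# Layer 0 is local: the last query sees only the 2 most recent positions.
print(attention_mask(6, layer_idx=0, window=2)[5])
# [False, False, False, False, True, True]

# Layer 5 is global: the last query sees the whole prefix.
print(attention_mask(6, layer_idx=5, window=2)[5])
# [True, True, True, True, True, True]
```

A decode kernel tuned only for the uniform case never exercises the windowed branch, which is why a second, structurally different model is a useful check.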

| PR | area | label | decode | vs frontier |
|---|---|---|---|---|