
Inference from a Python function to a billion devices.
@compile AI models into native binaries. Deploy on cloud GPU fleets, personal devices, and everything in between.
0b00 Open-weight & Proprietary Models
Any open model.
Every modality.
One drop-in SDK.
We provide an OpenAI-compatible client across every framework you build in: Python, JavaScript, Kotlin, Unity, and Swift.
Large Language Models
Audio and Voice
Embeddings
0b01 Inference Placement and Cost
Tune latency & cost per request.
Serve 3× more.
Decide where each inference runs at call-time. Prioritize latency, throughput, or cost with extremely fine control.
Price · vs hosted inference
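# `muna` is assumed to be an already-initialized Muna client.
embedding = muna.beta.openai.embeddings.create(
    input="I can choose where each and every inference runs?",
    model="@nomic/nomic-embed-text-v1.5",
    # Chooses where this one request runs; the value is elided in the source.
    acceleration="..."
)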
Traditional Provider: H100 · $6.50/hr
Muna
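To make the price comparison concrete, here is a minimal sketch of the arithmetic behind cost per request. Only the $6.50/hr H100 price comes from this page. The baseline throughput is a made-up number, and reading "Serve 3× more" as three times the requests on the same hourly spend is an assumption, not a published benchmark.

```python
# Only the $6.50/hr H100 price comes from the page; everything else is hypothetical.
HOURLY_PRICE_USD = 6.50


def cost_per_thousand(hourly_price_usd: float, requests_per_hour: float) -> float:
    """Return the dollar cost of serving 1,000 requests at a given throughput."""
    if requests_per_hour <= 0:
        raise ValueError("requests_per_hour must be positive")
    return hourly_price_usd / requests_per_hour * 1000


# Assumed baseline: 10 requests per second on one GPU (illustrative only).
baseline_rph = 36_000

baseline = cost_per_thousand(HOURLY_PRICE_USD, baseline_rph)
tripled = cost_per_thousand(HOURLY_PRICE_USD, 3 * baseline_rph)

print(f"baseline:      ${baseline:.4f} per 1k requests")
print(f"3x throughput: ${tripled:.4f} per 1k requests")
```

At a fixed hourly price, tripling throughput cuts cost per request to one third, whatever the baseline turns out to be.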
0b10 Cold starts
No containers.
No cold starts.
Boot 45× faster.
By removing everything between your model and the GPU, cold starts disappear. The first call lands as fast as the millionth.
Cold start · container (Traditional) vs binary (Muna)

0b11 Quickstart
From pip install to your first prediction in one minute.
Literally two commands. No sign-up required to start.
terminal
# Install the Muna CLI and Python client
$ pip install muna
# Create speech with Supertonic 2 TTS
$ muna predict @supertone/supertonic-2 \
    --text "It was the best of times" \
    --voice "M2"