Inference from a Python function to a billion devices.

@compile AI models into native binaries. Deploy on cloud GPU fleets, personal devices, and everything in between.

0b00 Open-weight & Proprietary Models

Any open model.
Every modality.
One drop-in SDK.

We provide an OpenAI-compatible client for the languages and frameworks you build in: Python, JavaScript, Kotlin, Unity, and Swift.

0b01 Inference Placement and Cost

Tune latency & cost per request.
Serve 3× more.

Decide where each inference runs at call time. Prioritize latency, throughput, or cost on a per-request basis.

Price · vs hosted inference

# `muna` is an already-initialized Muna client.
embedding = muna.beta.openai.embeddings.create(
    input="I can choose where each and every inference runs?",
    model="@nomic/nomic-embed-text-v1.5",
    acceleration="..."  # where this inference runs
)

[Chart: Traditional Provider (H100 · $6.50/hr) vs Muna]
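
To put rough numbers on the cost side, here is a back-of-the-envelope sketch. It takes the $6.50/hr H100 price quoted above and reads "Serve 3× more" as three times the requests per GPU-hour, which is one possible interpretation. The baseline throughput of 100 requests per second is purely hypothetical, and `cost_per_million` is an illustrative helper, not part of any SDK.

```python
# Back-of-the-envelope cost sketch. Only the hourly price comes from the page;
# the baseline throughput is a made-up figure for illustration.
H100_PRICE_PER_HOUR = 6.50   # USD per hour, from the price comparison
BASELINE_RPS = 100.0         # hypothetical requests per second on one GPU
SERVE_MULTIPLIER = 3.0       # "Serve 3x more", read as 3x requests per GPU-hour


def cost_per_million(price_per_hour: float, requests_per_second: float) -> float:
    """Return the USD cost of serving one million requests on one GPU."""
    requests_per_hour = requests_per_second * 3600
    return price_per_hour / requests_per_hour * 1_000_000


baseline = cost_per_million(H100_PRICE_PER_HOUR, BASELINE_RPS)
improved = cost_per_million(H100_PRICE_PER_HOUR, BASELINE_RPS * SERVE_MULTIPLIER)

print(f"baseline:      ${baseline:.2f} per 1M requests")
print(f"3x throughput: ${improved:.2f} per 1M requests")
```

Under these assumed numbers the cost per million requests falls from about $18.06 to about $6.02. The ratio is what matters: at a fixed hourly price, tripling throughput cuts per-request cost to a third, whatever the real baseline is.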

0b10 Cold starts

No containers.
No cold starts.
Boot 45× faster.

By removing everything between your model and the GPU, cold starts disappear. The first call lands as fast as the millionth.

[Chart: cold start, container vs binary. Traditional vs Muna]
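
One way to check a claim like "the first call lands as fast as the millionth" on your own workload is to time the first call separately from the warm calls that follow. The sketch below is a generic harness, not Muna's benchmark. `first_vs_steady` and `FakeColdModel` are hypothetical names, and the fake model simply sleeps to imitate a one-time load cost.

```python
import time
from typing import Callable, List, Tuple


def first_vs_steady(predict: Callable[[], object], warm_calls: int = 20) -> Tuple[float, float]:
    """Time the first call, then return it with the median of the warm calls (seconds)."""
    start = time.perf_counter()
    predict()
    first = time.perf_counter() - start

    samples: List[float] = []
    for _ in range(warm_calls):
        start = time.perf_counter()
        predict()
        samples.append(time.perf_counter() - start)
    samples.sort()
    return first, samples[len(samples) // 2]


class FakeColdModel:
    """Hypothetical stand-in that pays a one-time load cost on its first call."""

    def __init__(self, load_seconds: float, infer_seconds: float) -> None:
        self.load_seconds = load_seconds
        self.infer_seconds = infer_seconds
        self._loaded = False

    def __call__(self) -> str:
        if not self._loaded:
            time.sleep(self.load_seconds)  # simulated cold start
            self._loaded = True
        time.sleep(self.infer_seconds)     # simulated inference
        return "ok"


model = FakeColdModel(load_seconds=0.05, infer_seconds=0.002)
first, steady = first_vs_steady(model)
print(f"first call: {first * 1000:.1f} ms, warm median: {steady * 1000:.1f} ms")
```

To measure a real deployment, replace `FakeColdModel` with any zero-argument callable that issues one prediction. A system without cold starts should show a first-call time close to the warm median.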

0b11 Quickstart

From `pip install` to your first prediction in one minute.

Just two commands. No sign-up required to start.

terminal

# Install the Muna CLI and Python client
$ pip install muna

# Create speech with Supertonic 2 TTS
$ muna predict @supertone/supertonic-2  \
  --text "It was the best of times"     \
  --voice "M2"


Any open model. Every modality. One drop-in SDK.

Large Language Models

Audio and Voice

Vision

Embeddings
