Simple, transparent pricing

Unlimited tokens on every plan. Pay by number of API Keys, not by tokens consumed. No lock-in.

Starter

399€ /month

For small teams starting to integrate AI into their workflows.

5 API Keys (users/agents)
Unlimited tokens on open-source models
2.5B tokens/month on Open SOTA Models
OpenAI-compatible API
Full privacy. Zero logs
Data in the EU
Email support

Get started

Growth

1.299€ /month

For companies that need AI at scale with multiple teams or agents.

15 API Keys (users/agents)
Unlimited tokens on open-source models
7.5B tokens/month on Open SOTA Models
OpenAI-compatible API
Full privacy. Zero logs
Data in the EU
Priority support
99.5% SLA

Get started

Scale

3.199€ /month

For organizations with intensive AI usage and advanced needs.

40 API Keys (users/agents)
Unlimited tokens on open-source models
20B tokens/month on Open SOTA Models
OpenAI-compatible API
Full privacy. Zero logs
Data in the EU
Priority support
99.9% SLA
Early access to new models

Get started

Enterprise

Custom

For organizations that need dedicated GPUs and a custom setup.

+60 API Keys (users/agents)
Unlimited tokens on open-source models
Custom cap on Open SOTA Models
Dedicated GPUs
Custom models
OpenAI-compatible API
Full privacy. Zero logs
Data in the EU
Custom SLA
Dedicated onboarding

All plans include RPM (requests per minute) and concurrency limits per API Key to guarantee service quality.

Dedicated infrastructure

Your own inference stack, in your datacenter

If your use case requires full data sovereignty, we deploy and operate the complete inference stack inside your company's own infrastructure. Your models, your data and your prompts never leave your network.

We also advise you on buying the right hardware (GPUs, memory and network), sized according to your use case, inference volume and the budget you are working with.

Talk to the team

On-premise deployment

We install and operate the turnkey stack on your servers, with the same OpenAI-compatible API.

Hardware advisory

We help you choose the optimal GPUs, memory and network for your use case and budget.

Full data sovereignty

Your data and prompts never leave your network. Ideal for regulated sectors: banking, healthcare or defense.

Frequently asked questions

Which models are available?

The best open-source models available right now: LLMs, embeddings, TTS and STT. Models are updated regularly so you always get the latest from the open-source ecosystem.

Can I use the service with my current tools?

Yes. Access is through a 100% OpenAI-compatible API. It works with OpenCode, Zed, OpenClaw, Hermes, SDKs and any client that accepts a base URL and API key.

Is my data used to train models?

No. Your code and your prompts do not train any model. There are no prompt logs. There is no fine-tuning with customer data. Full privacy by design.

Are there token limits?

There are no token caps on open-source models. The only limits are RPM and concurrency per API Key, designed to protect the shared experience of the cluster. But don't worry, to give you an example: we have users burning more than 500 million tokens with a single API Key in under 24h. On the Open SOTA Models (such as DeepSeek V4-Flash) there is a monthly cap at the organization level: 2.5B on Starter, 7.5B on Growth, 20B on Scale. The counter resets on the 1st of each month (UTC).

What SLA do you offer?

It depends on the plan. Growth includes a 99.5% SLA, Scale 99.9%, and Enterprise a custom SLA. Starter does not include a contractual SLA.

Can I cancel anytime?

Yes. All plans are month to month, with no lock-in. Cancel whenever you want, no penalty.

Need something different?

If your company needs dedicated GPUs, custom models or a specific setup, let's talk.

Talk to the team