On May 17 the platform completed its first month, and in this post we are going to look at how everything went in numbers.
Data window: 30 days (17/04/2026 to 17/05/2026)
1. The scale
In 30 days NaN served:
| Metric | Value |
|---|---|
| Successful requests | 3.678.787 (≈ 3,68 M) |
| Total tokens | 117.674.297.968 (≈ 117,7 B) |
| Input tokens | 116.222.578.716 |
| Output tokens | 1.451.719.252 |
| Embedding tokens | 697.810.776 |
| Days with continuous traffic | 31 / 31 |
More than 117 billion tokens generated. Roughly the equivalent of reading 235,000 complete copies of Don Quixote in a month.
2. The community
For now, signing up for the community happens through a waitlist that has not stopped growing over the last month. These are the numbers:
| Status | Members |
|---|---|
| Joined the waitlist | 1.027 |
| Currently on the waitlist | 151 |
| Subscribed | 305 |
Almost a quarter of those who joined the waitlist ended up entering the community.
Geographic distribution
NaN has been used from 21 countries. The top of the map looked like this:
| Country | % requests |
|---|---|
| 🇨🇴 Colombia | 30,38 % |
| 🇲🇽 Mexico | 21,95 % |
| 🇪🇸 Spain | 15,10 % |
| 🇺🇸 USA | 13,05 % |
| 🇫🇮 Finland | 6,96 % |
| 🇫🇷 France | 4,07 % |
| 🇩🇪 Germany | 3,06 % |
| 🇨🇦 Canada | 1,35 % |
| 🇵🇱 Poland | 1,33 % |
| 🇦🇷 Argentina | 1,31 % |
| Rest (11 countries) | 1,43 % |
LATAM plus Spain add up to 67 % of the traffic. It is predominantly a Spanish-speaking platform for coding agents, with a real presence in Colombia, Mexico, Spain, Argentina, Ecuador, Peru, Uruguay, Chile, Puerto Rico, and El Salvador.
3. The real savings
If those same 115.5 B input tokens plus 1.45 B output (chat completions) had gone through closed providers, the monthly bill would be:
| Provider (in/out price per 1M tokens) | Equivalent cost (30 days) |
|---|---|
| Claude Sonnet 4 ($3 / $15) | $368.374 USD |
| GPT-4o ($2,50 / $10) | $303.348 USD |
| Gemini 2.5 Pro ($1,25 / $10) | $158.935 USD |
| DeepSeek V3 ($0,27 / $1,10) | $32.791 USD |
| GPT-4o-mini ($0,15 / $0,60) | $18.201 USD |
Depending on the model, had we used a private provider we would have spent between ~$18K and over $360K.
What each user saves
| User type | Tokens/month | Costs in Claude Sonnet 4 | Costs in GPT-4o | Pays in NaN |
|---|---|---|---|---|
| P50 (median) | 112,6 M | $347,13 | $287,36 | 70€ / $75 |
| P90 (power user) | 1,11 B | $3.509,97 | $2.869,71 | 70€ / $75 |
35 members exceeded 1 billion tokens during the month.
The typical NaN user already consumes between $287 and $347 USD/month of value equivalent to GPT-4o or Claude Sonnet 4. The most active 10% sits between $2,800 and $3,500 USD/month of equivalent value. Everyone pays the same: 70€ or $75 depending on the region.
4. Performance
The section we are proudest of from the first month.
| Metric | Value |
|---|---|
| Uptime (excluding client errors) | 99,986 % |
| Global success rate | 99,556 % |
| Our own 5xx errors | 505 / 3.695.485 (0,014 %) |
| Client 4xx errors | 13.378 (0,36 %) |
Aggregate throughput
| Metric | Value |
|---|---|
| Tokens/second (sustained avg) | ~46.056 |
| Tokens/second (peak) | 285.270 |
| Tokens/minute (peak) | 17.116.195 |
Latency (chat completions, user view)
| Metric | Value |
|---|---|
| TTFT (time to first token) P50 | 1.013 ms |
| TTFT P95 | 21.066 ms |
| Total request duration P50 | 2.660 ms |
| Total request duration P95 | 37.245 ms |
Roughly 1 second from your request to the first token.
5. Available models
Every member has access to all the models in the stack:
| Model | Function | Requests | Tokens |
|---|---|---|---|
| Qwen 3.6 (35B-A3B) | Main chat and coding | 3.282.599 | 114,36 B |
| Gemma 4 (26B-A4B) | Fast chat, low latency | 277.602 | 2,62 B |
| Qwen3 Embedding | Vector search, RAG | 113.564 | 698 M |
| Whisper | Speech-to-text | 3.993 | N/A |
| Kokoro | Text-to-speech (af_heart, ef_dora, em_alex) | 1.565 | N/A |
We offer a complete stack of models: LLMs, embeddings, transcription, and speech synthesis, all under the same membership. On top of that, this month we have started to explore the possibility of adding SOTA models.
The first one to arrive is DeepSeek V4 Flash. Next month there will be reports on this new tier of models that we have unlocked.
6. How NaN is used
Distribution by client / SDK:
| Client | Requests | % |
|---|---|---|
| OpenAI Python SDK (sync + async) | 1.666.766 | 45,32 % |
| opencode (coding agent in Bun) | 742.336 | 20,18 % |
| OpenAI JS / Node / Bun | 614.231 | 16,70 % |
| Python (httpx / requests raw) | 378.851 | 10,30 % |
| Other | 142.366 | 3,87 % |
| Go (SDK + raw) | 86.142 | 2,34 % |
| Anthropic SDK (via proxy) | 21.023 | 0,57 % |
| PHP (GuzzleHttp) | 17.520 | 0,48 % |
| Cursor | 5.378 | 0,15 % |
| Cline | 3.513 | 0,10 % |
Two takeaways:
- The official OpenAI SDK works against NaN with no changes. You just need to point it at a
base_urland aapi_key. Most clients use this same communication standard. That explains 45% of the traffic. - opencode has established itself as the community's favorite coding agent: 20% of all traffic, with typically large prompts.
NaN is being used to do coding tasks in languages like Python, JS, Go, PHP, and Rust.
7. Usage patterns
Prompt size (chat completions, tokens)
| Percentile | Tokens |
|---|---|
| P10 | 140 |
| P50 | 4.443 |
| P90 | 100.890 |
| P99 | 202.467 |
| Maximum | 262.052 |
Half of the calls send more than 4,400 tokens of context. The largest 10% send more than 100,000. NaN is used for coding agents, with entire projects as context.
Day of the week
| Day | Requests |
|---|---|
| Wednesday | 644.104 |
| Tuesday | 620.146 |
| Monday | 561.809 |
| Thursday | 527.489 |
| Sunday | 466.320 |
| Friday | 434.390 |
| Saturday | 425.534 |
Weekdays are when NaN gets used the most, but usage does not drop below 66% on weekends either. So while the presence during working hours is higher, it never stops being used outside of them.
Day-by-day growth
| Date | Requests | Tokens | Active users |
|---|---|---|---|
| 17/04 (day 1) | 28.843 | 1,06 B | 25 |
| 30/04 | 60.634 | 4,47 B | 81 |
| 08/05 | 218.620 | 4,09 B | 127 |
| 15/05 | 107.014 | 4,58 B | 170 |
| 16/05 | 257.787 | 6,70 B | 177 |
| 17/05 (peak) | 278.492 | 5,95 B | 179 |
~10x in requests/day and ~7x in daily active users in the first 30 days.
8. Agents and Spaces
- Two weeks ago we enabled the option to deploy a hermes agent for each user in their own private Sandbox (microVM). There are currently 128 active agents.
- The latest feature released in NaN Cloud is that every community member now gets a private space with 2 vCPU, 4GB of RAM, and 20 GB of disk to deploy applications. There are currently 66 Spaces and 12 user applications deployed on the platform.
9. What is coming
- DeepSeek V4 Flash is already available as on-demand SOTA for members who need it.
- More inference capacity to sustain the pace of growth.
- More open models as they appear, without changing the membership.
- A project by and for the community. We will start driving Open Source projects to improve the community experience, especially around the current documentation, support, and the Discord bot.
10. A few recommendations
- It is important to understand that Gemma and Qwen have a 256K context window. It is essential to set this limit correctly in the client you use (OpenCode, Pi, etc.) and likewise to define a margin to compact that context before reaching the limit. Example in Opencode.
- Try not to drag out or reuse sessions unnecessarily. Do atomic tasks with a beginning and an end that should be born and die within a single session.
- Find the right workflow. Something that has worked for several community users is using more powerful models to plan and validate code, and using Qwen or Gemma to execute all the tasks that you need. Now with DeepSeek we can use it as orchestrator/leader.
- Using the clients (OpenCode, Pi, Hermes, etc.) exactly as they come by default does not work. The most important thing is your harness, because depending on it the model will get better or worse results.
- Given the previous point, make the most of the different Discord channels. Explore and try out new skills, tools, CLIs, clients, and agents. The community is extremely active in answering doubts and questions and giving recommendations.
- Remember that every month we will hold two sessions. Either an event or a workshop that you can watch recorded whenever you want on NaN .
- Take advantage of Spaces to deploy applications or custom agents! (a short tutorial on how to set up a chatbot deployed on Spaces is coming soon)
NaN was born to bring together people who are building things and who can also take advantage of open inference models. That is how we set up our first server. Today it is already a cluster of 7 servers and 11 GPUs dedicated exclusively to serving models for NaN.
It has been a month of absolute madness and a lot of work to make all of this happen. For my part, all that is left is to thank you for your trust, and know that this is only the beginning. Onward! 🚀