NaN: the community's first month in numbers

117 billion tokens, 3.68 million requests, 21 countries, and 99.98% uptime. NaN is a community of builders with its own inference infrastructure and a private platform to deploy apps and agents.

On May 17 the platform completed its first month, and in this post we are going to look at how everything went in numbers.

Data window: 30 days (17/04/2026 to 17/05/2026)

1. The scale

In 30 days NaN served:

MetricValue
Successful requests3.678.787 (≈ 3,68 M)
Total tokens117.674.297.968 (≈ 117,7 B)
Input tokens116.222.578.716
Output tokens1.451.719.252
Embedding tokens697.810.776
Days with continuous traffic31 / 31

More than 117 billion tokens generated. Roughly the equivalent of reading 235,000 complete copies of Don Quixote in a month.

2. The community

For now, signing up for the community happens through a waitlist that has not stopped growing over the last month. These are the numbers:

StatusMembers
Joined the waitlist1.027
Currently on the waitlist151
Subscribed305

Almost a quarter of those who joined the waitlist ended up entering the community.

Geographic distribution

NaN has been used from 21 countries. The top of the map looked like this:

Country% requests
🇨🇴 Colombia30,38 %
🇲🇽 Mexico21,95 %
🇪🇸 Spain15,10 %
🇺🇸 USA13,05 %
🇫🇮 Finland6,96 %
🇫🇷 France4,07 %
🇩🇪 Germany3,06 %
🇨🇦 Canada1,35 %
🇵🇱 Poland1,33 %
🇦🇷 Argentina1,31 %
Rest (11 countries)1,43 %

LATAM plus Spain add up to 67 % of the traffic. It is predominantly a Spanish-speaking platform for coding agents, with a real presence in Colombia, Mexico, Spain, Argentina, Ecuador, Peru, Uruguay, Chile, Puerto Rico, and El Salvador.

3. The real savings

If those same 115.5 B input tokens plus 1.45 B output (chat completions) had gone through closed providers, the monthly bill would be:

Provider (in/out price per 1M tokens)Equivalent cost (30 days)
Claude Sonnet 4 ($3 / $15)$368.374 USD
GPT-4o ($2,50 / $10)$303.348 USD
Gemini 2.5 Pro ($1,25 / $10)$158.935 USD
DeepSeek V3 ($0,27 / $1,10)$32.791 USD
GPT-4o-mini ($0,15 / $0,60)$18.201 USD

Depending on the model, had we used a private provider we would have spent between ~$18K and over $360K.

What each user saves

User typeTokens/monthCosts in Claude Sonnet 4Costs in GPT-4oPays in NaN
P50 (median)112,6 M$347,13$287,3670€ / $75
P90 (power user)1,11 B$3.509,97$2.869,7170€ / $75

35 members exceeded 1 billion tokens during the month.

The typical NaN user already consumes between $287 and $347 USD/month of value equivalent to GPT-4o or Claude Sonnet 4. The most active 10% sits between $2,800 and $3,500 USD/month of equivalent value. Everyone pays the same: 70€ or $75 depending on the region.

4. Performance

The section we are proudest of from the first month.

MetricValue
Uptime (excluding client errors)99,986 %
Global success rate99,556 %
Our own 5xx errors505 / 3.695.485 (0,014 %)
Client 4xx errors13.378 (0,36 %)

Aggregate throughput

MetricValue
Tokens/second (sustained avg)~46.056
Tokens/second (peak)285.270
Tokens/minute (peak)17.116.195

Latency (chat completions, user view)

MetricValue
TTFT (time to first token) P501.013 ms
TTFT P9521.066 ms
Total request duration P502.660 ms
Total request duration P9537.245 ms

Roughly 1 second from your request to the first token.

5. Available models

Every member has access to all the models in the stack:

ModelFunctionRequestsTokens
Qwen 3.6 (35B-A3B)Main chat and coding3.282.599114,36 B
Gemma 4 (26B-A4B)Fast chat, low latency277.6022,62 B
Qwen3 EmbeddingVector search, RAG113.564698 M
WhisperSpeech-to-text3.993N/A
KokoroText-to-speech (af_heart, ef_dora, em_alex)1.565N/A

We offer a complete stack of models: LLMs, embeddings, transcription, and speech synthesis, all under the same membership. On top of that, this month we have started to explore the possibility of adding SOTA models.

The first one to arrive is DeepSeek V4 Flash. Next month there will be reports on this new tier of models that we have unlocked.

6. How NaN is used

Distribution by client / SDK:

ClientRequests%
OpenAI Python SDK (sync + async)1.666.76645,32 %
opencode (coding agent in Bun)742.33620,18 %
OpenAI JS / Node / Bun614.23116,70 %
Python (httpx / requests raw)378.85110,30 %
Other142.3663,87 %
Go (SDK + raw)86.1422,34 %
Anthropic SDK (via proxy)21.0230,57 %
PHP (GuzzleHttp)17.5200,48 %
Cursor5.3780,15 %
Cline3.5130,10 %

Two takeaways:

  • The official OpenAI SDK works against NaN with no changes. You just need to point it at a base_url and a api_key. Most clients use this same communication standard. That explains 45% of the traffic.
  • opencode has established itself as the community's favorite coding agent: 20% of all traffic, with typically large prompts.

NaN is being used to do coding tasks in languages like Python, JS, Go, PHP, and Rust.

7. Usage patterns

Prompt size (chat completions, tokens)

PercentileTokens
P10140
P504.443
P90100.890
P99202.467
Maximum262.052

Half of the calls send more than 4,400 tokens of context. The largest 10% send more than 100,000. NaN is used for coding agents, with entire projects as context.

Day of the week

DayRequests
Wednesday644.104
Tuesday620.146
Monday561.809
Thursday527.489
Sunday466.320
Friday434.390
Saturday425.534

Weekdays are when NaN gets used the most, but usage does not drop below 66% on weekends either. So while the presence during working hours is higher, it never stops being used outside of them.

Day-by-day growth

DateRequestsTokensActive users
17/04 (day 1)28.8431,06 B25
30/0460.6344,47 B81
08/05218.6204,09 B127
15/05107.0144,58 B170
16/05257.7876,70 B177
17/05 (peak)278.4925,95 B179

~10x in requests/day and ~7x in daily active users in the first 30 days.

8. Agents and Spaces

  • Two weeks ago we enabled the option to deploy a hermes agent for each user in their own private Sandbox (microVM). There are currently 128 active agents.
  • The latest feature released in NaN Cloud is that every community member now gets a private space with 2 vCPU, 4GB of RAM, and 20 GB of disk to deploy applications. There are currently 66 Spaces and 12 user applications deployed on the platform.

9. What is coming

  • DeepSeek V4 Flash is already available as on-demand SOTA for members who need it.
  • More inference capacity to sustain the pace of growth.
  • More open models as they appear, without changing the membership.
  • A project by and for the community. We will start driving Open Source projects to improve the community experience, especially around the current documentation, support, and the Discord bot.

10. A few recommendations

  • It is important to understand that Gemma and Qwen have a 256K context window. It is essential to set this limit correctly in the client you use (OpenCode, Pi, etc.) and likewise to define a margin to compact that context before reaching the limit. Example in Opencode.
  • Try not to drag out or reuse sessions unnecessarily. Do atomic tasks with a beginning and an end that should be born and die within a single session.
  • Find the right workflow. Something that has worked for several community users is using more powerful models to plan and validate code, and using Qwen or Gemma to execute all the tasks that you need. Now with DeepSeek we can use it as orchestrator/leader.
  • Using the clients (OpenCode, Pi, Hermes, etc.) exactly as they come by default does not work. The most important thing is your harness, because depending on it the model will get better or worse results.
  • Given the previous point, make the most of the different Discord channels. Explore and try out new skills, tools, CLIs, clients, and agents. The community is extremely active in answering doubts and questions and giving recommendations.
  • Remember that every month we will hold two sessions. Either an event or a workshop that you can watch recorded whenever you want on NaN .
  • Take advantage of Spaces to deploy applications or custom agents! (a short tutorial on how to set up a chatbot deployed on Spaces is coming soon)

NaN was born to bring together people who are building things and who can also take advantage of open inference models. That is how we set up our first server. Today it is already a cluster of 7 servers and 11 GPUs dedicated exclusively to serving models for NaN.

It has been a month of absolute madness and a lot of work to make all of this happen. For my part, all that is left is to thank you for your trust, and know that this is only the beginning. Onward! 🚀