Speed Up Your AI App: One Header, 8 Seconds → 1.5 Seconds
Introducing the X-Thinking-Mode header. For tools, translation, and lightweight chat apps, one extra line cuts AI response time by 5×.
TL;DR
If your app is a tool, translator, simple chatbot, JSON generator, or anything similarly lightweight — add this header to your /api/ai/gemini requests:
`X-Thinking-Mode: fast`

Average response time drops from ~8s to ~1.5s. Don't send the header, and nothing changes.
Why is your AI call slow?
We dug into our production logs recently and found something surprising: on most AI calls, the model spends 60–80% of its time "thinking" — internal reasoning the user never sees.
Concretely: Gemini 3 Flash generates an average of 1,001 thinking tokens per call, while the user-facing answer is only 434 tokens. The model is doing 2.3× more thinking than answering.
For some workloads this helps — complex role-play, multi-step reasoning, long-context callbacks. But for most lightweight tasks ("translate this," "summarize this," "give me JSON"), thinking doesn't help much. It's just latency.
Now you can opt out
We added an opt-in header: X-Thinking-Mode. Three values:
| Mode | What it does | Best for |
|---|---|---|
| `fast` | `thinkingLevel: minimal` | Tools, translation, short chat, classifiers, JSON generation |
| `balanced` | `thinkingBudget: 200` | Medium-complexity tasks that benefit from a little reasoning |
| _(omit)_ | Current default behavior | Long-form RPG, multi-turn role-play, complex narratives |
`fast` doesn't kill thinking entirely — it's Google's adaptive setting. The model uses 0 thinking tokens on simple prompts and the minimum necessary on complex ones. Quality loss is smaller than you'd expect.
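Conceptually, the three header values map onto the `thinkingConfig` knobs shown in the table. Here's an illustrative sketch of that mapping — the `thinkingConfigFor` helper is ours, not the platform's actual code:

```typescript
type ThinkingMode = 'fast' | 'balanced';

// Illustrative mapping from X-Thinking-Mode values to a Gemini-style
// thinkingConfig, following the table above. The real platform logic may differ.
function thinkingConfigFor(mode?: ThinkingMode): Record<string, unknown> | undefined {
  switch (mode) {
    case 'fast':
      // Adaptive minimal thinking: 0 tokens on simple prompts,
      // the minimum necessary on complex ones.
      return { thinkingLevel: 'minimal' };
    case 'balanced':
      // A small fixed reasoning budget.
      return { thinkingBudget: 200 };
    default:
      // No header: platform defaults apply.
      return undefined;
  }
}
```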
How to use it
Add one header:
```js
const response = await fetch('/api/ai/gemini', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-Thinking-Mode': 'fast', // 👈 add this
  },
  body: JSON.stringify({
    path: '/v1beta/models/gemini-3-flash-preview:generateContent',
    contents: [{ parts: [{ text: 'Translate "hello world" to French' }], role: 'user' }],
  }),
})
```

The header is per-request — you can mix modes in the same app:
- Player input → quick NPC reply → use `fast`
- Critical plot turn / ending resolution → omit the header, let the model think
- Structured JSON tool calls → use `balanced`
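Since the choice is per-request, a small dispatch helper keeps the policy in one place. The task labels and the `headersFor` helper below are a hypothetical sketch; only the header name and its values come from the API:

```typescript
type TaskKind = 'npc-reply' | 'plot-turn' | 'json-tool-call';

// Pick request headers per task so fast, balanced, and default calls
// can coexist in one app.
function headersFor(task: TaskKind): Record<string, string> {
  const base = { 'Content-Type': 'application/json' };
  switch (task) {
    case 'npc-reply':
      return { ...base, 'X-Thinking-Mode': 'fast' };
    case 'json-tool-call':
      return { ...base, 'X-Thinking-Mode': 'balanced' };
    case 'plot-turn':
      return base; // omit the header: let the model think
  }
}
```

Then each call site just passes `headersFor(task)` into `fetch`, and the latency/quality trade-off lives in one function instead of being scattered across the codebase.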
When should you use it?
Strongly recommend `fast` for:
- Translation, summarization, paraphrasing
- Simple chatbots
- Utility apps (calculators, document tools, code snippets)
- Classification, tagging
- Short NPC dialogue
- Anything where the prompt is clear and the output is short
Use `balanced` for:
- Longer creative writing
- Medium-stakes role-play turns
- Structured output (JSON tool calls)
Stay on default (no header) for:
- Long-form narrative generation
- Multi-character consistency in one segment
- Long-context callbacks ("remember the X from chapter 5?")
- Critical decisions (ending triggers, rule resolution)
Trade-offs
`fast` mode may slightly degrade quality on:
- Multi-step reasoning (math, logic puzzles)
- Outputs that need long-range coherence (consistent details across paragraphs)
- Structured outputs with many fields (occasional missing keys when the schema is complex)
If you see noticeable quality drops, just remove the header to restore default behavior, or step down to `balanced`.
How does it interact with a `thinkingConfig` you already set?
If you've explicitly set `thinkingConfig` in `generationConfig`, the platform will not override it with the header. Your code wins.
Precedence (high → low):
1. Your `thinkingConfig` in the request body
2. The `X-Thinking-Mode` header
3. Platform defaults
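That precedence rule can be expressed as a short resolver. This is an illustrative model of the behavior, not the platform's actual implementation; the function and type names are ours:

```typescript
interface GenerationConfig {
  thinkingConfig?: Record<string, unknown>;
}

// Resolve the effective thinkingConfig, highest precedence first:
// 1. explicit thinkingConfig in the request body
// 2. the X-Thinking-Mode header
// 3. platform defaults (represented here as undefined)
function resolveThinkingConfig(
  body: GenerationConfig,
  headerMode?: 'fast' | 'balanced',
): Record<string, unknown> | undefined {
  if (body.thinkingConfig) return body.thinkingConfig; // your code wins
  if (headerMode === 'fast') return { thinkingLevel: 'minimal' };
  if (headerMode === 'balanced') return { thinkingBudget: 200 };
  return undefined; // fall through to platform defaults
}
```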
Why now?
A lot of creators have asked us "why is AI a bit slow?" When we looked at the data, we found that almost no one had made a deliberate choice about thinking — 97% of calls were running on whatever the default happened to be. Giving you a simple opt-in is the cleanest way to hand control back.
Next up we're adding a graphical "performance mode" toggle to the creator dashboard for non-developers. The API layer comes first.
Try it out. If you hit issues or notice quality drops, let us know →