AI Cookbook

Google launches Gemini 3 Ultra — first model with true 10M token context, beats GPT-5.5 on long-document tasks

Google DeepMind shipped Gemini 3 Ultra yesterday — the first commercial frontier model with a 10 million token context window that actually performs at full quality across the entire window, not just the first 200k tokens.

On the Needle-in-a-Haystack benchmark at 10M tokens, Gemini 3 Ultra achieves 94.7% recall. The closest competitor, GPT-5.5 with a 1M context, drops to 78% recall past 800k tokens.

What 10M tokens actually means

For practical context:

  • **Entire codebase analysis**: load the full Linux kernel source (~28M lines) and ask architectural questions in a single prompt
  • **Long-form writing**: feed an entire 1,200-page PhD thesis and ask for cross-chapter logical inconsistencies
  • **Legal review**: load all 16,000 pages of an M&A document set and identify contradictory clauses
  • **Video analysis**: process up to 11 hours of video as part of a single inference
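For a back-of-the-envelope check on whether a corpus like these fits the window, the common rule of thumb of roughly four characters per token can be applied. A minimal sketch (the 4-chars/token ratio and the ~2,000 characters per legal page are illustrative assumptions, not Gemini tokenizer figures):

```python
# ~4 characters per token is a common rule of thumb for English
# text and code (an assumption here, not an official figure).
CHARS_PER_TOKEN = 4
CONTEXT_WINDOW = 10_000_000  # Gemini 3 Ultra's advertised window

def fits_in_window(corpus_chars: int) -> tuple[int, bool]:
    """Return (estimated tokens, whether it fits in the 10M window)."""
    est_tokens = corpus_chars // CHARS_PER_TOKEN
    return est_tokens, est_tokens <= CONTEXT_WINDOW

# A 16,000-page document set at an assumed ~2,000 characters per page:
tokens, ok = fits_in_window(16_000 * 2_000)
print(tokens, ok)  # 8000000 True
```

Under those assumptions, the full M&A document set lands around 8M tokens — inside the window, but close enough to the ceiling that the estimate is worth running before committing to a single-prompt approach.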

This is not theoretical capability. Google is shipping it via Gemini API today.

Multimodal reasoning got a real upgrade

Beyond context, Gemini 3 Ultra introduces:

  • **Native video understanding** at 1 fps for up to 6 hours per inference
  • **3D scene understanding** from photo inputs (depth, occlusion, object relationships)
  • **Audio with semantic alignment**: transcribe, identify speakers, and detect emotions in a single pass
  • **Cross-modal grounding**: ask a question in text, and the model can cite an exact frame in a video as evidence
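The video figures above can be sanity-checked against the context window from the per-frame token cost. The ~258 tokens per frame used below is the figure Google published for Gemini 1.5's video tokenization, carried over here as an assumption:

```python
TOKENS_PER_FRAME = 258   # Gemini 1.5's published per-frame cost; assumed here
FPS = 1                  # sampling rate quoted for Gemini 3 Ultra
CONTEXT_WINDOW = 10_000_000

def video_tokens(hours: float) -> int:
    """Tokens consumed by a video sampled at FPS frames per second."""
    frames = int(hours * 3600 * FPS)
    return frames * TOKENS_PER_FRAME

six_hours = video_tokens(6)
print(six_hours, six_hours <= CONTEXT_WINDOW)  # 5572800 True
```

At these assumptions, six hours of video consumes roughly 5.6M tokens — a bit over half the window — which is consistent with the 6-hour-per-inference limit quoted above.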

The multimodal benchmarks show clear leadership. On MMMU (graduate-level multimodal exam), Gemini 3 Ultra hits 84.2% — the first model to clear 80% on this benchmark.

Pricing reality check

Gemini 3 Ultra is not cheap:

  • $7 input / $28 output per million tokens (under 200k context)
  • $14 input / $56 output per million tokens (200k-2M context tier)
  • $21 input / $84 output per million tokens (2M-10M context tier — usage-priced)

The 10M tier is expensive enough that most users will invoke it sparingly. But for teams that today run RAG pipelines with chunking, embedding, and reranking, the simpler approach of stuffing everything into context is now economically viable for high-stakes tasks.
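The tiered prices above translate into a quick cost estimator (tier boundaries and rates are taken directly from the price list; the helper function itself is just illustrative):

```python
# (context limit, input $/M tokens, output $/M tokens) per tier,
# straight from the published price list.
TIERS = [
    (200_000,     7.0, 28.0),   # under 200k context
    (2_000_000,  14.0, 56.0),   # 200k-2M
    (10_000_000, 21.0, 84.0),   # 2M-10M
]

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one call; tier chosen by total context used."""
    context = input_tokens + output_tokens
    for limit, in_rate, out_rate in TIERS:
        if context <= limit:
            return (input_tokens * in_rate + output_tokens * out_rate) / 1e6
    raise ValueError("exceeds the 10M token window")

# One near-full-window call: ~9.99M tokens in, 10k tokens out.
print(round(call_cost(9_990_000, 10_000), 2))  # 210.63
```

A single full-window call thus lands around $210 — the concrete "use it sparingly" math, and the number to weigh against the engineering cost of maintaining a chunking-and-reranking pipeline.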

The strategic angle

Google is doing what only Google can do: throw insane amounts of TPU compute at making context cheap. A 10M token inference requires roughly 1,800 TPU-seconds. At Google's internal cost, that's still profitable. At AWS H200 prices, it would be 3-4x more expensive to serve.
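The quoted figures let you sanity-check the margin claim: 1,800 TPU-seconds per 10M-token inference against $21/M input pricing implies a ceiling on what a TPU-hour can cost before the input side alone stops covering compute. The arithmetic below uses only numbers quoted above and ignores output revenue and non-compute costs:

```python
TPU_SECONDS_PER_CALL = 1_800   # quoted for one 10M-token inference
INPUT_TOKENS = 10_000_000
INPUT_RATE_PER_M = 21.0        # $/M input tokens at the top tier

input_revenue = INPUT_TOKENS / 1e6 * INPUT_RATE_PER_M  # revenue per call
tpu_hours = TPU_SECONDS_PER_CALL / 3600                # compute per call
breakeven_rate = input_revenue / tpu_hours             # $/TPU-hour ceiling

print(input_revenue, tpu_hours, breakeven_rate)  # 210.0 0.5 420.0
```

Under these quoted figures, input pricing alone covers compute whenever a TPU-hour costs less than $420 all-in, which is the arithmetic behind the claim that serving the same workload on rented H200s would be 3-4x more expensive.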

For OpenAI and Anthropic, Gemini 3 Ultra is the first model whose headline feature they cannot copy without burning their margins. The TPU advantage finally shows up as product differentiation, not just lower internal training cost.

Sources

  • Google DeepMind Blog (April 27, 2026): Introducing Gemini 3 Ultra
  • Reuters (April 28, 2026): Google's Gemini 3 ships first true 10M context window
  • Vertex AI Pricing Update (April 27, 2026)