AI Cookbook

OpenAI launches o5 reasoning model — solves PhD physics problems 73% of the time, prices crash on o3

OpenAI released o5 yesterday — the next evolution of its reasoning model line, replacing o3 and o4. On the GPQA Diamond benchmark (PhD-level physics, biology, and chemistry questions), o5 scores 73.2%, up from o4's 58% and beating Claude Opus 4.7's 71%.

More importantly: o3 just dropped to one-tenth of its previous price, and o5 ships at the price point where o3 launched eight months ago.

The benchmarks that moved

On standard reasoning benchmarks:

  • **GPQA Diamond**: 73.2% (was 58% on o4)
  • **AIME 2025**: 96.8% (was 89% on o4)
  • **Codeforces Elo**: 2,847 (was 2,420 on o4) — top 0.1% of human competitive programmers
  • **FrontierMath**: 41.2% (was 24% on o4) — research-level math, prior models max around 5%

The FrontierMath jump is the headline result. Terence Tao, Timothy Gowers, and other Fields Medalists who helped design those problems estimated they would resist AI for "years". o5 cracks 41% of them in a single pass with chain-of-thought.

Pricing structure

OpenAI restructured the entire reasoning tier:

  • **o3**: now $0.50 input / $2.00 output per million tokens (was $5 / $20)
  • **o4**: now $2.00 input / $8.00 output per million tokens (was $15 / $60)
  • **o5**: $8.00 input / $32.00 output per million tokens (premium tier)
  • **o5-mini**: $0.30 input / $1.20 output per million tokens

The o5-mini variant is the dark horse. It hits 65% on GPQA Diamond at a tenth of o5's price — making advanced reasoning available for routine production workloads.
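The per-million-token rates make tier comparisons easy to run yourself. A minimal sketch using the prices listed above (the model names are taken from the article; nothing here calls an actual API):

```python
# Cost comparison across the restructured reasoning tier.
# Prices (USD per million tokens) are taken from the table above.
PRICES = {
    "o3":      {"input": 0.50, "output": 2.00},
    "o4":      {"input": 2.00, "output": 8.00},
    "o5":      {"input": 8.00, "output": 32.00},
    "o5-mini": {"input": 0.30, "output": 1.20},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the listed rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a request with 10k input tokens and 5k output tokens.
for model in PRICES:
    print(f"{model:8s} ${request_cost(model, 10_000, 5_000):.4f}")
```

At those rates the same 10k-in/5k-out request costs $0.015 on o3, $0.009 on o5-mini, and $0.24 on o5 — which is why o5-mini looks attractive for routine workloads.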

What changed under the hood

In the launch livestream, Sam Altman highlighted three architectural improvements:

  • **Compressed reasoning traces**: o5 produces 40% shorter chains for equivalent accuracy
  • **Tool calling within reasoning**: o5 can invoke search, code execution, and file read mid-reasoning without breaking the chain
  • **Self-correction loops**: when an early step is wrong, the model now backtracks rather than committing forward

The third point is the architectural shift. Previous reasoning models built monotonically — each step assumed prior steps were correct. o5 explicitly verifies intermediate conclusions and revises.
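OpenAI hasn't published how the self-correction mechanism works, but the behavior described can be sketched as a propose-verify-commit loop. Everything below (`Step`, the proposer, the verifier, the toy demo) is an illustrative placeholder, not the model's actual machinery:

```python
from dataclasses import dataclass

@dataclass
class Step:
    content: object = None  # an intermediate conclusion
    answer: object = None   # set on the final step
    is_final: bool = False

def solve(propose_step, verify_step, max_steps=50):
    """Commit a proposed step only if it verifies; otherwise discard it
    and re-propose, instead of building forward on a wrong step."""
    chain = []
    for _ in range(max_steps):
        step = propose_step(chain)
        if step.is_final:
            return step.answer
        if verify_step(chain, step):
            chain.append(step)
        # rejected steps are simply never committed
    return None

# Toy demo: construct [1, 2, 3]; the proposer emits one bad step (9)
# that the verifier catches and refuses to commit.
proposals = iter([1, 9, 2, 3])

def propose(chain):
    if len(chain) == 3:
        return Step(answer=[s.content for s in chain], is_final=True)
    return Step(content=next(proposals))

def verify(chain, step):
    expected = chain[-1].content + 1 if chain else 1
    return step.content == expected

result = solve(propose, verify)
print(result)  # [1, 2, 3]
```

Contrast this with the monotonic style the article describes for earlier models, where every proposed step would be appended unconditionally and the bad step would poison everything after it.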

Why this matters

For research, science, and complex engineering: this is the first model that can reliably help solve graduate-level technical problems instead of requiring an expert to verify every step.

For everyone else: the pricing crash on o3 means anyone can route routine tasks to a model that scores 58% on GPQA Diamond. That's college-senior-level reasoning at $0.50 per million input tokens — cheaper than GPT-3.5 was 18 months ago.
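In practice that suggests tiered routing: reserve o5 for genuinely hard problems and send routine work down-tier. A minimal sketch, assuming the model names above are the API identifiers; the keyword-based difficulty heuristic is a deliberately naive placeholder (real routers use classifiers):

```python
# Naive cost-based router over the new reasoning tier.
# HARD_HINTS is a toy heuristic, not a product feature.
HARD_HINTS = ("prove", "derive", "research-level")

def pick_model(prompt: str) -> str:
    if any(hint in prompt.lower() for hint in HARD_HINTS):
        return "o5"          # premium reasoning for hard problems
    if len(prompt) > 4_000:  # long context, moderate difficulty
        return "o5-mini"
    return "o3"              # $0.50/M input covers routine work

print(pick_model("Summarize this meeting"))    # o3
print(pick_model("Prove the bound is tight"))  # o5
```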

Sources

  • OpenAI Blog (April 27, 2026): Introducing OpenAI o5
  • The Information (April 28, 2026): OpenAI's o5 cracks PhD-level physics
  • OpenAI Pricing Update (April 27, 2026)