AI Cookbook

xAI ships Grok 4 — first frontier model with native real-time research, beats GPT-5.5 on news QA

xAI released Grok 4 on April 27, 2026. The headline feature is native real-time research integrated at the model layer rather than bolted on as a tool call. Grok 4 has continuous access to the X (formerly Twitter) firehose, plus a Brave Search agent that fires automatically when the model determines that a query needs current information.

On the NewsQA benchmark (questions about events from the last 7 days), Grok 4 scores 89.4%. GPT-5.5 with web search enabled scores 71%. Claude Opus 4.7 with browsing scores 68%.

What Grok 4 does differently

Three architectural choices:

  • **Continuous knowledge ingestion**: the model is fine-tuned every 6 hours on the previous 6 hours of X posts plus public news, so its knowledge cutoff is about 6 hours old, not 6 months
  • **Search-as-attention**: web search is not a separate tool but part of the attention mechanism; at inference time, Grok decides which tokens to fetch from search and which to draw from its weights
  • **Voice-first interaction**: Grok 4 ships with a real-time voice mode that interrupts naturally and takes follow-up questions while still responding
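
To make the 6-hour cadence concrete, here is a minimal sketch of how stale the model's knowledge would be at a given moment. It assumes refreshes land on fixed UTC boundaries (00:00, 06:00, 12:00, 18:00) and that a refreshed model is available instantly; xAI has not described the actual schedule or deployment lag, and the function names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Cadence stated in the post; the alignment to midnight UTC is an assumption.
REFRESH_INTERVAL = timedelta(hours=6)


def last_refresh(now: datetime) -> datetime:
    """Most recent assumed refresh boundary at or before `now`."""
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    completed = (now - midnight) // REFRESH_INTERVAL  # whole intervals elapsed today
    return midnight + completed * REFRESH_INTERVAL


def knowledge_gap(now: datetime) -> timedelta:
    """Time since the model last saw new data, ignoring training and deploy time."""
    return now.astimezone(timezone.utc) - last_refresh(now)


# Hypothetical query time, chosen only for illustration.
query_time = datetime(2026, 4, 28, 14, 30, tzinfo=timezone.utc)
refresh = last_refresh(query_time)
print("last refresh:   ", refresh.isoformat())
print("training window:", (refresh - REFRESH_INTERVAL).isoformat(), "to", refresh.isoformat())
print("knowledge gap:  ", knowledge_gap(query_time))
```

Under these assumptions the gap ranges from zero just after a refresh to nearly 6 hours just before the next one, which is the window the search component would have to cover.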

The voice mode is the main draw for many users. Conversation latency is 280 ms, down from 1.4 s in Grok 3 (a 5x reduction), making it the first AI voice mode that doesn't feel like a walkie-talkie.

Pricing and availability

  • $20/month X Premium+ subscribers get full Grok 4 access
  • $40/month X Premium AI tier gets unlimited Grok 4 + Grok 4 Heavy
  • API access launches May 15, priced at $4 input / $16 output per million tokens
  • Free tier: 5 queries per day for X Free users
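
For a rough sense of what the quoted API rates mean per request, here is a small estimator. It uses only the two prices listed above; the workload (2,000 input tokens and 500 output tokens per request) is a made-up example, and the sketch assumes flat per-token billing with no caching discounts, search surcharges, or minimum fees, none of which the announcement covers.

```python
# Rates quoted in the post, in USD per million tokens.
INPUT_USD_PER_MTOK = 4.00
OUTPUT_USD_PER_MTOK = 16.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the quoted rates (flat billing assumed)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical workload: 2,000 input tokens and 500 output tokens per request.
per_request = estimate_cost_usd(2_000, 500)
print(f"per request:         ${per_request:.4f}")           # $0.0160
print(f"per 10,000 requests: ${per_request * 10_000:.2f}")  # $160.00
```

Because output tokens cost four times as much as input tokens, the 500 output tokens in this example account for half the bill.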

How it ranks against the frontier

Grok 4 places roughly third on most general benchmarks, behind Claude Opus 4.7 and OpenAI o5. But on tasks involving current events, real-time data, or social context, it leads:

  • **NewsQA** (last 7 days): 89% (frontier average: 70%)
  • **TwitterContext** (X-specific cultural queries): 94% (frontier: 35%)
  • **TimeAware-1k** (time-sensitive math/finance): 87% (frontier: 65%)
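
The size of each lead is easier to compare as a gap in percentage points. The short sketch below computes it from the rounded figures in the list above; the dictionary layout and variable names are just for illustration.

```python
# (Grok 4 score, frontier average) in percent, as listed above.
scores = {
    "NewsQA": (89, 70),
    "TwitterContext": (94, 35),
    "TimeAware-1k": (87, 65),
}

# Lead over the frontier average, in percentage points.
leads = {name: grok - frontier for name, (grok, frontier) in scores.items()}

for name, lead in leads.items():
    print(f"{name}: +{lead} points")
# NewsQA: +19 points
# TwitterContext: +59 points
# TimeAware-1k: +22 points
```

The TwitterContext gap is about three times the other two, which is what you would expect from a benchmark built around X-specific queries.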

On pure reasoning, math, and coding, Grok 4 is competitive but not leading. It's a different bet: depth of current-events reasoning over peak quality.

The Musk angle

Worth noting: Grok 4 is positioned as less censored than competing models. Under Elon Musk, xAI explicitly trained the model to engage with controversial topics, conspiracy theories, and explicit content within X's guidelines. Whether that's a feature or a bug depends on the use case.

For research and journalism, a model that doesn't reflexively refuse to discuss difficult topics is genuinely useful. For enterprise deployment, the looser content guardrails will require additional safety layers in production.

What this means for the field

Grok 4 makes a structural argument: the frontier in 2026 isn't just about parameter count or eval scores. It's about who has the best data pipeline. xAI controls X. That's 500M+ daily active users producing real-time signal that Anthropic, OpenAI, and Google can't access at the same density.

If real-time AI becomes the default expectation (and Apple Intelligence, Gemini 3 with Google Search, and ChatGPT Search are all moving in that direction), Grok's data pipeline matters more than its peak benchmark scores.

Sources

  • xAI announcement (April 27, 2026): Grok 4 release notes
  • The Information (April 28, 2026): Grok 4's real-time research changes the game
  • TechCrunch (April 28, 2026): Inside Grok 4's voice mode