Grok 4.3 Review: Cheap LLM, Real Tradeoffs (2026)

Grok 4.3 is 90% cheaper than Claude Opus 4.7 with comparable smarts. The $300 SuperGrok Heavy plan? Trap.

Grok 4.3 launched April 30 at $1.25 per million input tokens — roughly 90% cheaper than Claude Opus 4.7 for comparable intelligence-index scores. We ran it through our usual gauntlet: code tasks, long-context document drops, agentic workflows, and the always-on reasoning that xAI baked in whether you want it or not. The “budget frontier model” pitch is mostly real. The “ChatGPT killer” pitch is not.

What Grok 4.3 Actually Is (and Isn’t)

Grok 4.3 is not the 16-agent flagship (that’s Grok 4 Heavy). It’s the standard reasoning model with a 1M token context window and always-on chain-of-thought reasoning that you can’t turn off. Released April 30, 2026, it’s xAI’s move to own the “intelligent enough, price aggressive enough” tier where most API builders actually live.

Artificial Analysis puts its Intelligence Index at 53, below GPT-5.5 (60) and Claude Opus 4.7 (57) on the same leaderboard. If you need the absolute highest reasoning performance, those two still win. If you need “good enough reasoning” at half of GPT-5.5’s input price, Grok 4.3 fits.

The Pricing Math Nobody Does for You

Here’s what matters if you’re actually building something:

API pricing (per million tokens):

Input: $1.25
Output: $2.50
Cached input: $0.20 (84% discount after cache hit)

For comparison, Claude Sonnet 4.6 input runs $3/M, GPT-5.5 input runs $2.50/M. Grok’s aggressive input pricing is real. Output pricing is mid-market — Claude Opus 4.7 hits $15/M output, so Grok’s $2.50 still wins.
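To turn those list prices into a budget, a minimal sketch like the one below can help. It uses only the Grok 4.3 rates quoted above and ignores caching. The function name and the example workload (token counts, request volume) are illustrative assumptions, not measurements from our testing.

```python
# Grok 4.3 list prices quoted in this review, in USD per million tokens.
INPUT_PER_M = 1.25
OUTPUT_PER_M = 2.50


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the uncached list-price cost in USD of a single request."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000


# Hypothetical workload: 10,000 requests a day, each 8,000 tokens in and 1,000 out.
per_request = request_cost(8_000, 1_000)
per_day = per_request * 10_000
print(f"per request: ${per_request:.4f}, per day: ${per_day:.2f}")
```

Swap in your own token counts; the split between input and output matters because output tokens cost twice as much as input tokens at these rates.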

Consumer tier (SuperGrok):

SuperGrok: $30/month, 100M tokens/month
SuperGrok Heavy: $300/month, 2B tokens/month (Grok 4 Heavy, not 4.3)

If you’re doing API work, ignore the consumer tier. It’s priced for hobbyists. The Heavy tier’s $300 price point is a trap unless you genuinely need the 16-agent flagship for daily work. Most teams should use the API directly.

What We Actually Tested

We loaded an 80K-token codebase dump into Grok 4.3 and asked it to suggest refactorings, which it did. We ran a PDF-to-PowerPoint generation task (5-page financial document → structured slides); it worked, and the output was production-ready. We fed it a 5-minute video clip and asked it to summarize action sequences. Native video input worked without hallucinating.

Agentic multi-step workflows (research → validate → format) completed without derailing. Instruction-following stress tests showed Grok 4.3 respecting constraints, though we noticed higher verbosity than the spec claimed — outputs ran longer than we’d expect, padding with explanatory text that didn’t always earn its place.

We haven’t benchmarked latency head-to-head against Claude or GPT-5.5. Our informal observation: time-to-first-token sat in the 8–12 second range on complex reasoning tasks, noticeably slower than what we see from Claude Sonnet 4.6 or GPT-5.5 on equivalent prompts. That’s the always-on reasoning tax.
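If you want to check time-to-first-token on your own prompts, a rough sketch along these lines is enough. It times any iterable of text chunks, so it is not tied to a particular SDK. `fake_stream` is a hypothetical stand-in that only simulates a slow model; in practice you would pass in whatever streaming iterator your client library returns.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def time_to_first_token(chunks: Iterable[str]) -> Tuple[Optional[float], float]:
    """Consume a stream of text chunks.

    Returns (seconds until the first non-empty chunk, total seconds).
    The first value is None if the stream never produced any text.
    """
    start = time.perf_counter()
    first: Optional[float] = None
    for chunk in chunks:
        if first is None and chunk:
            first = time.perf_counter() - start
    return first, time.perf_counter() - start


def fake_stream(delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a streaming response; not a real API call."""
    time.sleep(delay_s)  # simulated "thinking" before the first token
    yield "Hello"
    yield ", world"


ttft, total = time_to_first_token(fake_stream(0.05))
print(f"TTFT: {ttft:.3f}s, total: {total:.3f}s")
```

Run it several times per prompt and compare medians, since single measurements are noisy.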

Where Grok 4.3 Wins

Price-to-intelligence ratio. $1.25/M input tokens with reasoning built in beats any competitor in this bracket. If you’re building high-volume document processing, bulk code review, or research agents, the unit economics are hard to ignore.

Native file generation. PDF, spreadsheet, and slide generation work without converting markdown to a separate service. We tested it; the outputs are usable.

Video input. xAI added native video understanding. You can feed clips and ask Grok to reason about what’s happening. Claude and GPT-5.5 still require image-per-frame workarounds.

Cache hit pricing. After the first API call, cached tokens cost $0.20/M — 84% off list price. If you’re running similar queries repeatedly (batch processing, recurring reports), this compounds the savings.
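How much the cache saves depends on how many of your input tokens actually hit it. The sketch below blends the two input prices quoted in this review by cache hit rate. The hit rates shown are illustrative assumptions, not figures we measured.

```python
INPUT_PER_M = 1.25   # list input price from this review, USD per million tokens
CACHED_PER_M = 0.20  # cached input price from this review


def blended_input_price(cache_hit_rate: float) -> float:
    """Effective input price per million tokens at a cache hit rate between 0.0 and 1.0."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return cache_hit_rate * CACHED_PER_M + (1.0 - cache_hit_rate) * INPUT_PER_M


for rate in (0.0, 0.5, 0.9):
    print(f"{rate:.0%} cached -> ${blended_input_price(rate):.3f} per million input tokens")
```

At a 90% hit rate the effective input price lands around $0.305/M, which is where recurring batch jobs start to look very cheap.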

Where It Falls Short

Persistent memory is absent. SuperGrok Heavy charges $300/month partly on the promise of memory features. Grok 4.3 and its API have no session memory: every request is stateless. If your workflow relies on the assistant remembering context between calls, this is a blocker.

Time-to-first-token is sluggish. The always-on reasoning means responses don’t start instantly. 8–12 seconds on complex tasks is livable for batch jobs, painful for interactive use.

Output is verbose. Artificial Analysis evaluation data shows Grok 4.3 generating 88M output tokens across the evaluation suite versus a 36M median for comparable models. Grok likes to explain itself. If you’re paying per output token, that adds up fast.
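The verbosity figures can be folded into the output price. The sketch below assumes, purely for illustration, that the 88M versus 36M ratio carries over to your own workload, which may not hold for short or tightly constrained outputs.

```python
OUTPUT_PER_M = 2.50        # Grok 4.3 output price from this review, USD per million tokens
GROK_EVAL_TOKENS_M = 88    # output tokens (millions) reported for Grok 4.3
MEDIAN_EVAL_TOKENS_M = 36  # reported median for comparable models

# Assumption: Grok emits this many times more output tokens than a median model
# for the same work.
verbosity_ratio = GROK_EVAL_TOKENS_M / MEDIAN_EVAL_TOKENS_M
effective_output_price = OUTPUT_PER_M * verbosity_ratio

print(f"verbosity ratio: {verbosity_ratio:.2f}x")
print(f"effective output price: ${effective_output_price:.2f} per million median-length tokens")
```

Under that assumption the nominal $2.50 behaves more like roughly $6.11 per million tokens of equivalent output, still below the $15/M quoted above for Claude Opus 4.7.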

Knowledge cutoff is December 2025. That is roughly four months old at the April 30 launch. Claude Opus 4.7 and GPT-5.5 both have fresher training.

The consumer $300 SuperGrok Heavy wall. If you want the 16-agent Heavy version, the jump from $30 to $300 monthly is steep, and most builders should just use the API tier and rent compute as needed.

Who Should Actually Switch to Grok 4.3

High-volume API builders where token cost is the primary constraint. Teams already using xAI for real-time X API data who want to consolidate vendors. Document-heavy workflows (research synthesis, bulk report generation, content structuring) where reasoning is worth the latency trade.

Not your primary dev assistant if Cursor or Claude already works. Not your chatbot if you need fast TTFT. Compare it honestly against GPT-5.5 and Claude Opus 4.7 in your actual use case before switching.

Our Verdict

Grok 4.3 is a second model for most teams: strong enough to own one workload (bulk document processing, research agents, code refactoring batches), not strong enough to replace your primary assistant. At $1.25/M input tokens, it’s worth testing if you’re already thinking about cost optimization. The always-on reasoning actually delivers on the “intelligent enough” part. The $300 SuperGrok Heavy plan is a trap. Stick to the API unless you specifically need the 16-agent Heavy version for daily interactive work. For ChatGPT Plus, Claude Pro, and Perplexity Pro comparisons, see our consumer tier breakdown. The Grok 4.3 API belongs in your toolkit, not as a badge of honor, but as a hammer for the specific nails it hits well.
