We cry about AI tools so you don't have to.

Pricing Watch

Grok API Pricing: The Cheap Token Rates Are Real. The Tool-Call Bill You're About to Get Is Not.

Grok API tokens cost $0.20/$0.50 per million. But web search, file attachments, and tool calls add $5-10 per 1,000 invocations. Here's the actual bill.

Updated
pricing · grok · xai · api-costs · agentic-ai

The headline number is $0.20 per million input tokens on Grok 4.1 Fast. The bill you open at month-end is more than four times that. xAI built it that way on purpose.

It’s a classic playbook: lead with the loss leader. Grok’s token rates are genuinely cheap—cheaper than OpenAI, Claude, Gemini, everything. According to xAI’s developer docs, Grok 4.1 Fast sits at $0.20 input / $0.50 output per million tokens, more than an order of magnitude cheaper than Claude Opus 4.6 at $5/$25. That gets shared in every pricing conversation. What doesn’t get front-and-center is the line item below: web search at $5 per 1,000 calls, file attachments at $10 per 1,000 calls, code execution at $5 per 1,000 calls. Agentic workloads trigger all three, constantly.

Here’s what actually happens.

The Token Rates Are Real (And Basically Irrelevant)

Grok 4.1 Fast pricing from the xAI docs is straightforward: $0.20 per million input tokens, $0.50 per million output tokens. Even their flagship Grok 4.3 runs $1.25 / $2.50—still cheaper than most competitors’ base models. Send 10 million tokens a month, pay $2–5 in token cost. Sounds good.

But agentic Grok doesn’t work that way. An agent doesn’t chat; it runs loops. Each loop calls tools: web search to fact-check, file attachment parsing to extract data, code execution to validate syntax. Each tool invocation is a separate line item.

The Tool-Call Layer Nobody Wants to Discuss

According to Mem0’s breakdown of xAI’s billing structure, tool calls are metered separately:

  • Web Search & X Search: $5 per 1,000 calls
  • Code Execution: $5 per 1,000 calls
  • File Attachments: $10 per 1,000 calls
  • Collections Search: $2.50 per 1,000 calls

Picture a moderate agent: a web search every loop, a file read per query, the occasional code run. That’s 50–100 tool invocations per user request. Scale that to 1,000 requests a month and you’re looking at 50,000–100,000 tool calls.

At $5–10 per 1,000, that’s $250–1,000 in tool cost alone.

The token bill for the same work might be $50.
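The arithmetic is simple enough to sketch. The per-1,000 rates below are the ones quoted above; the per-tool split in the example workload is an assumption, not anything xAI publishes:

```python
# Per-1,000-call tool rates, as quoted from xAI's billing structure.
RATES_PER_1K = {
    "web_search": 5.00,
    "code_execution": 5.00,
    "file_attachments": 10.00,
    "collections_search": 2.50,
}

def tool_cost(invocations: dict) -> float:
    """Monthly tool bill in dollars, given invocation counts per tool."""
    return sum(RATES_PER_1K[tool] * n / 1000 for tool, n in invocations.items())

# 1,000 requests/month at ~50 invocations each; the split across tools is assumed.
monthly = {"web_search": 30_000, "file_attachments": 10_000, "code_execution": 10_000}
print(tool_cost(monthly))  # 300.0
```

Tip the mix toward file attachments at $10 per 1,000 and the same 50,000 invocations lands closer to the top of that $250–1,000 range.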

The Actual Bill: A Worked Example

Let’s build a modest agentic workload: a development assistant that researches APIs, reads documentation files, and suggests code implementations.

Month assumptions:

  • 10,000 user requests
  • Average 50 tool invocations per request (web search, file reads, code checks)
  • 500,000 total tool invocations
  • 3 billion tokens of context and generation (agents re-send context every loop)

Token cost: 3,000M tokens × $0.35 per million (blended input/output) = $1,050

Tool cost: (500K invocations ÷ 1,000) × $7 blended rate = $3,500

Total: $4,550
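The worked example as a sketch you can re-run with your own numbers—the blended rates ($0.35 per million tokens, $7 per 1,000 tool calls) come straight from the example above, and the function is mine, not an xAI SDK call:

```python
def monthly_bill(requests: int, tools_per_request: int, token_millions: float,
                 token_rate: float = 0.35, tool_rate_per_1k: float = 7.00):
    """Estimate a month's agentic bill: (token cost, tool cost, total), in dollars."""
    token_cost = token_millions * token_rate
    tool_cost = (requests * tools_per_request / 1000) * tool_rate_per_1k
    return round(token_cost, 2), round(tool_cost, 2), round(token_cost + tool_cost, 2)

# 10,000 requests, 50 tool invocations each, ~3,000M tokens of context/generation.
print(monthly_bill(10_000, 50, 3_000))  # (1050.0, 3500.0, 4550.0)
```

Note which parameter dominates: halving tokens saves $525, halving tool density saves $1,750.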

The token math says $1,050. The real bill is $4,550. That’s not an edge case—that’s standard agentic billing on Grok. Costgoat’s pricing calculator flags this gap explicitly: “tool calls compound rapidly in production agents.”

What to Do About It

Caching reduces re-read cost. xAI’s caching model prices cached input at roughly 10% of the uncached rate on Grok 4.1 Fast and 25% on Grok 4.3. If your agent re-queries the same documentation set repeatedly, cache it. One cached read costs 90% less than the first.

Batch mode gives 50% off. For non-urgent work, the Batch API discounts all token costs by half. If your agent can tolerate a 24-hour SLA, batch everything. That’s $525 instead of $1,050 in token cost—real money even if tool calls stay flat.

Pick the right model. Grok 4.1 Fast triggers the same tool fees as Grok 4.3, but tokens cost a fifth as much. If your agent doesn’t need flagship reasoning, Fast pays for itself immediately.

Reduce tool invocation density. The hardest lever: redesign the agent to make fewer tool calls per request. That means longer context windows (so you push more facts into system prompts upfront) and smarter retrieval (search once, parse multiple times before re-searching). It’s architectural work, not a pricing hack.
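Here’s a rough sketch of how the four levers stack on the worked example’s bill. The discount figures (50% batch, ~90% cached-input savings) are the ones quoted above; the cache hit rate and tool-density factor are assumptions you’d have to measure for your own agent:

```python
def with_levers(token_cost: float, tool_cost: float, *,
                batch: bool = False,           # Batch API: 50% off all token costs
                cache_hit_rate: float = 0.0,   # fraction of input served from cache (assumed)
                cache_discount: float = 0.90,  # cached tokens cost ~90% less
                tool_density: float = 1.0):    # 0.5 = half as many tool calls per request
    """Apply cost levers to a (token_cost, tool_cost) baseline; returns total in dollars."""
    tokens = token_cost * (1 - cache_hit_rate * cache_discount)
    if batch:
        tokens *= 0.5
    return round(tokens + tool_cost * tool_density, 2)

print(with_levers(1050, 3500))              # 4550.0 — the worked example, untouched
print(with_levers(1050, 3500, batch=True))  # 4025.0 — batching alone
print(with_levers(1050, 3500, batch=True,
                  cache_hit_rate=0.5, tool_density=0.5))
```

The ordering matters for effort, not math: batching is a flag, caching is a config change, and tool density is an architecture project—but tool density is the only lever that touches the $3,500.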

How Grok Stacks Against OpenAI and Claude at Agentic Scale

The headline makes Grok look unbeatable. At agentic scale, the comparison flattens. OpenAI’s tool-use pricing matches Grok’s ($5 per 1,000 calls for web search), and Claude’s token rates are steeper but include function-calling overhead in the base token cost—no separate line item. If your agent runs 500K tool invocations a month, you’re paying $2,500–4,500 across all three vendors. Token cheapness doesn’t move the needle.

The real wins come from architectural choices. See our pricing-watch comparison of hidden costs across LLM APIs for how to structure agents to minimize tool overhead. For the consumer-vs-API pricing split and standalone Grok comparisons, check what we learned testing Grok’s consumer product.

If you’re building agents and Grok’s token rates excited you, read the fine print on tool calls. Then rethink your loop density. The difference between a $1,000 month and a $5,000 month lives there. See also how agentic billing stacks up across Codex, Claude, and other reasoning models when you’re choosing a vendor for production work.

We cry about the line items xAI buried so you don’t have to.

← More Pricing Watches

What we don't know is documented at the end of this article. We update when we learn more.