Gemini CLI vs Amp vs Claude Code: Which Terminal Agent Do You Actually Run?

gemini-cli · amp-code · claude-code · ai-comparisons

Google gave us 1,000 free requests a day. We burned through them before lunch on Tuesday, and that’s when we finally understood why three terminal agents can all be “the best” without any of them lying.

The terminal-agent market fractured in late 2024. Before that, Claude Code owned the “serious developer” segment. Then Gemini CLI dropped with a staggering free tier. Then Amp arrived with a credit-grant model that splits the difference. We decided to stop arguing in Slack and actually run them on the same tasks.

The honest result: all three solve different problems well. None of them is categorically faster or cheaper across every workload. But the contours of that split decision matter a lot.

The Free Tier Reality Check

Google’s launch post set the opening ante: 1,000 requests per day, 60 per minute, full access to Gemini 2.5 Pro, and a 1 million token context window. That is, by any measure, absurd for a free tier. Individual developers on a budget saw a door open.

We spent three days testing this assumption. A typical non-trivial refactor pass (examining the file tree, asking clarifying questions, iterating on one module) burned roughly 30–40 requests. A greenfield feature — context-building, plan generation, implementation, one revision pass — ran 50–60 requests. Monorepo context dumps, which Gemini CLI is genuinely good at (whole-codebase grep patterns across 80+ files), consumed 25–35 requests per run.

That math: sustained daily development hits the 1,000-request ceiling by mid-afternoon if you’re shipping code. The pricing watch we published earlier breaks this down granularly, but the short version is that “free” is conditional. Gemini CLI is free for exploration and learning. It’s not free for daily production work unless you’re very light-touch.
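
To make that concrete, here is the back-of-envelope check we kept running, using the per-task averages above and an assumed daily mix (yours will differ):

  # One heavy day, using our measured per-task averages (assumed mix, not a benchmark):
  # 8 refactor passes, 2 greenfield features, 6 monorepo dumps
  echo $(( 8*35 + 2*55 + 6*30 ))   # 570 requests, over half the 1,000/day cap

Double that mix on an iteration-heavy day and you blow through the ceiling.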

Once the free tier is exhausted, Google points users to Vertex AI with usage-based billing. That’s real, not theoretical — we tested it. But the tier-jump from “free and generous” to “pay-per-token like any other vendor” is steep enough that most solo devs bounce.

Amp’s Free-Frontier Trick

Amp’s free-credit system is the structural move that makes sense. They hand out $10 per day in ad-supported credits — roughly $300 equivalent monthly — and the credits replenish hourly. The catch: programmatic/CLI usage (which is what we actually care about) burns only paid credits, not the free tier.

That distinction matters. You can use Amp’s web interface on the free tier all day. The moment you call amp -x or invoke the SDK, you’re on paid credits. So the free tier is real but gated to interactive use.
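
In practice the line between tiers is a single invocation. The prompt below is ours, purely illustrative; only the -x billing behavior is what we observed:

  # Web UI: free ad-supported credits. Anything like this: paid credits.
  amp -x "where do we handle auth serialization in packages/api?"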

We burned $2.80 in paid credits over three days of actual terminal work, about $0.93/day. That projects to roughly $25–30/month if we used Amp as our daily driver. Amp bundles model-agnostic flexibility (you can swap between Claude 3.5 Sonnet and Opus 4.5 on the fly) with that credit system, which is why teams with existing OpenAI or Anthropic contracts sometimes land here: you can route to your preferred model without being locked in.

For deeper context on Amp’s architecture and the Neo rebuild, see our Amp review.

Claude Code: Paying for the Reasoning Floor

Claude Code sits at $20/month (or $17/month annual, billed $200 up front). You get access to both Sonnet 4.6 and Opus 4.7, and the reasoning transparency that Anthropic bakes in. No free tier. Just a flat monthly fee.

The trade-off is structural: you’re paying upfront whether you use it or not. We clocked three days of real work at an effective cost of roughly $1.90/day, and the flat $20/month undercuts Amp’s projected $25–30 daily-driver burn. But the per-day figure depends entirely on how often you code, because you’re amortizing a full month of subscription across however many days you actually use it. If you only code two days a week, you’re paying $2.50/day.
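
The amortization math is one line of bc, assuming the $20/month plan and counting only the days you actually code:

  # effective per-day cost = monthly fee / coding days per month
  echo "scale=2; 20 / 8"  | bc   # two days a week (~8 days/month)  -> 2.50 per day
  echo "scale=2; 20 / 21" | bc   # every weekday (~21 days/month)   -> .95 per day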

Senior devs we talked to (both here and in broader developer communities) cluster around Claude Code for architecture and planning work specifically. The reasoning transparency — watching Claude work through a problem before executing — is the stated reason. You get to see the plan before the code lands.

Head-to-Head: Three Tasks, Three Agents

Task 1: Refactor a 400-line module (request/token/credit burn in one session, one revision pass)

  • Gemini CLI: 38 requests, ~180K tokens, free
  • Amp: 1 execution, $0.62 in paid credits
  • Claude Code: $0.33 (fractional monthly amortization: the $20 subscription spread across an assumed 60 tasks/month)

Gemini CLI was fastest here because the task maps perfectly to its strength: context-heavy, multi-file exploration. Amp and Claude Code produced cleaner refactoring suggestions (higher code quality), but neither had a speed advantage. Claude Code’s reasoning mode showed the plan upfront; Gemini CLI jumped to suggestions.
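
For the record, the three one-shot invocations looked roughly like this. Flags are from the builds we tested (gemini and claude both take -p for a non-interactive prompt; amp uses -x), and the file path is illustrative, so check each tool's --help before copying:

  gemini -p "refactor src/parser.ts: extract the tokenizer into its own module"
  amp -x "refactor src/parser.ts: extract the tokenizer into its own module"
  claude -p "refactor src/parser.ts: extract the tokenizer into its own module"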

Task 2: Greenfield CLI feature (same metrics)

  • Gemini CLI: 54 requests, ~220K tokens, free
  • Amp: 2 executions, $1.18 in paid credits
  • Claude Code: $0.66

This is where Claude Code’s strength emerged. The feature was a small CLI tool for parsing config files. Claude Code proposed a cleaner structure in the first pass. Amp required one extra iteration. Gemini CLI nailed the parsing logic but suggested a less idiomatic structure for the CLI argument handling. One dry observation: Gemini’s suggestions sometimes feel faster because they’re more confident, not because they’re right.

Task 3: Monorepo context dump across 80 files (query: “where do we handle auth serialization?”)

  • Gemini CLI: 28 requests, ~160K tokens, free
  • Amp: 1 execution, $0.45 in paid credits
  • Claude Code: $0.33

Gemini CLI won decisively here. The tool’s context handling for sprawling codebases is genuinely better. It returned the answer in one pass with high confidence. Amp required a follow-up query. Claude Code needed context guidance before it reliably found the pattern.
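
The winning run looked roughly like this. The --all-files flag, which pulls the whole working tree into context on the build we tested (verify with gemini --help), is what makes the one-pass answer possible:

  cd our-monorepo/   # illustrative path
  gemini --all-files -p "where do we handle auth serialization?"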

Who Should Run Which

Solo dev on a budget, code exploration mode: Gemini CLI until the daily cap starts to bite, then re-evaluate. If you’re learning or prototyping, the 1,000/day ceiling is plenty.

Team lead or architect, plan-first workflows: Claude Code. The reasoning transparency is worth the $20/month. You’re paying for the ability to see the plan before execution, which scales to team confidence.

Async pipeline owner, model-agnostic stack: Amp. The flexible credit system and model selection pay for themselves if you’re already running multiple LLM vendors.

Serious solo dev shipping production code daily: This is a mixed call, which tells you something. If speed is the priority, Gemini CLI until it caps, then Amp. If code quality is the priority, Claude Code straight up. Most of the senior devs we know actually rotate between Claude Code and Gemini CLI depending on the task type.

The Hybrid That Most Senior Devs Actually Use

This is the honest bit: the developers with the strongest output don’t pick one. They run Claude Code as their default for architecture and plan-mode work, fire up Gemini CLI for monorepo dumps and exploratory context, and slot Amp in when they need model flexibility or when they’re on a laptop with a limited time budget.
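
If you want that rotation to be frictionless, a few aliases do it. These are ours, purely illustrative, built on the same flags as the runs above:

  alias plan='claude'                  # architecture and plan-first sessions
  alias dig='gemini --all-files -p'    # monorepo archaeology, one-shot
  alias flex='amp -x'                  # model-flexible one-offs (paid credits)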

It’s not maximum raw efficiency; it’s the right tool per task type. For async-first workflows and deeper comparison, see our Codex vs Claude Code analysis.

The choice stops being “which one wins?” and starts being “which three work together?” That reframe is what moves these tools from novelty to infrastructure.

What We’d Actually Pay For

If we were shipping a solo product in 2026, we’d start with Claude Code ($20/month, zero friction, reasoning transparency built in), keep Gemini CLI installed for monorepo archaeology (free until we hit limits), and add Amp ($25–30/month realistic burn if we used it daily) only if we needed model-agnostic routing or if we had team members on different preferred APIs.

Total monthly: $45–50 for a solo dev who codes hard. That’s less than a daily coffee habit, and the output-quality difference between running all three and running just one is real. The cost of the wrong decision here is higher than the cost of trying all three.

