We cry about AI tools so you don't have to.

Review

Cursor 3 Background Agents: We Ran 8 in Parallel. Here's What Breaks.

Three weeks running Cursor 3's background agents on production code. The honest split verdict: where the parallel agent workflow earns its keep and where it makes a mess.

cursor · ai coding · background agents · developer tools · review


Cursor shipped version 3 on April 2 and version 3.2 on April 24. We put background agents on real production tasks for three weeks to find out whether “up to 8 agents in parallel” is a workflow upgrade or a new way to accumulate bad PRs.

What Actually Changed in Cursor 3

Cursor rebuilt its interface from scratch rather than extending the existing VS Code fork. The Composer pane is gone. In its place: the Agents Window, a full-screen workspace for orchestrating multiple coding agents simultaneously. This isn’t a feature add—it’s a philosophical reframe. Cursor stopped being a code editor that happens to have AI and became a workspace for managing agents that write code.

The shift is backed by usage data. As of March 2025, tab completion dominated, with 2.5x as many users as agents. That ratio has completely inverted: autonomous agent usage now leads by 2x. We saw this firsthand in our earlier six-month Cursor review—agents were optional then. Now they’re the main event. Cloud clones, autonomous runs, PRs-when-done. The risk surface got much larger.

The Credit System Is Now the Real Variable

The old 500-request pricing model is dead. Cursor moved to a credit-pool system where your monthly subscription is your monthly budget. Pro is $20 (roughly 200–250 Sonnet requests). Pro+ is $60. Ultra is $200. Each agent session consumes credits based on model choice and session length.

This changes the math fundamentally. A well-scoped agent task on Pro costs ~$0.50 in credits. A vague, multi-file refactor on Ultra that loops six times costs $15–20. We killed three agents mid-task to stop the bleeding. On May 4, Cursor shipped spend-management controls—soft limits with alerts at 50%, 80%, 100%—but by then the damage was done on a few runs.
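To make the credit math concrete, here's a back-of-envelope sketch. The per-task costs are the figures from our own runs; the tier budgets are list prices. None of this is Cursor's official accounting—just the arithmetic we used to decide which tier survives which workload:

```python
# Rough credit-burn arithmetic for Cursor's credit-pool tiers.
# Per-task costs are our observed figures; everything here is
# illustrative, not Cursor's official billing model.

TIERS = {"Pro": 20.0, "Pro+": 60.0, "Ultra": 200.0}  # monthly credit pool, USD

WELL_SCOPED_TASK = 0.50   # ~$0.50: test gen, lint fix, dependency bump
RUNAWAY_REFACTOR = 18.00  # what our six-loop multi-service refactor burned

def tasks_per_month(tier: str, cost_per_task: float) -> int:
    """How many tasks of a given cost fit in a tier's monthly credit pool."""
    return int(TIERS[tier] // cost_per_task)

print(tasks_per_month("Pro", WELL_SCOPED_TASK))    # 40 well-scoped tasks on Pro
print(tasks_per_month("Pro", RUNAWAY_REFACTOR))    # 1 runaway refactor and Pro is done
print(tasks_per_month("Ultra", RUNAWAY_REFACTOR))  # 11 runaways even on Ultra
```

The asymmetry is the point: Pro absorbs a month of maintenance work, but a single unscoped refactor can eat it in an afternoon.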

If you’re comparing tools at this tier, it’s worth understanding how Cursor stacks up against Windsurf on cost and where you fall in the broader IDE pricing shift happening right now. The credit burn is real.

Where Background Agents Earned Their Credits

Small-surface, well-scoped tasks succeed 70–80% of the time: test generation, dependency updates, lint fixes, simple refactors within a single file or a tight module. We sent agents out to write Jest suites for utility functions. We asked them to bump npm packages and run test suites. We pointed them at linting violations. These came back clean or close-to-clean more often than not.

The killer use case: close your laptop Friday evening with a ticket description, wake up Monday to a drafted PR. Cursor’s own engineering team ships 35% of their pull requests via autonomous agents, and we believe it. For maintenance work and small features, the parallel-agent workflow genuinely reduces cycle time. The prompt matters—it needs to be as precise as a ticket description, not a wish—but when it works, it works.

Where They Made a Mess

Vague prompts produce bloated, conflict-heavy PRs. Architecturally complex asks (cross-service refactors, data model changes) degrade fast because long agent sessions erode reasoning quality. Monorepos without a strict .cursorignore hammer the indexer—agents hallucinate file paths and miss dependencies. A handful of community posts noted that the switch from a developer-drives model (tab completion) to autonomous agents feels like abandonment; a vocal minority wants the old editor back.

We threw a refactor prompt at a four-service system. The agent looped six times, burned $18 in credits, and shipped a PR that touched eleven files with conflicts in three of them. We rewrote the prompt as a sequence of smaller tickets—one per service—and the agents succeeded. The constraint, not the agent, was the issue. But the burn happened first.
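The rewrite that worked looked roughly like this. The service names and acceptance criteria below are hypothetical stand-ins, not our actual tickets—what matters is the shape: one bounded scope, one explicit don't-touch rule, one verification step per agent:

```text
# Before: one vague prompt (six loops, $18, conflict-heavy PR)
"Refactor error handling across all services to use the shared logger."

# After: one ticket per service, each run as its own agent
Ticket 1: In services/auth only, route error handling through the
          shared logger. Do not touch any other service. Run the
          auth test suite before opening the PR.
Ticket 2: Same change, scoped to services/billing only. Same rules.
(…one ticket per remaining service)
```

Each agent gets a scope it can hold in context, and the PRs stop colliding because no two agents touch the same files.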

Multi-file refactors need heavy human review. Vague or architecturally ambitious prompts are not just slower; they’re more expensive. The trade-off between speed and scrutiny shifted.

The Multitask / 3.2 Update Changes the Math

April 24 brought /multitask and worktrees. Instead of one agent running sequentially, Cursor now spins up parallel subagents on isolated branches, merging results at the end. This is the cleanest version of the feature we’ve tested. Isolated execution means one agent can’t corrupt another’s context. On April 29, Cursor released its SDK as a TypeScript package on npm, so you can embed agents in your own CLI or automation.

The competitive picture shifted. Claude Code’s background execution also runs async, but Cursor’s Agents Window is more visible and the parallel subagent model (3.2+) is more mature. Both are worth watching.

Who Should Turn These On

Solo builders on maintenance work: yes, at $20 Pro. The time savings on lint, test gen, and dependency churn justify the subscription upgrade over GitHub Copilot at $10.

Monorepo architects: only if you write strict .cursorignore files and keep prompts to single-module scope. Loose scoping breaks everything.
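For monorepo readers, here's the shape of a strict .cursorignore—assuming the .gitignore-style patterns Cursor uses, with placeholder paths standing in for your own layout:

```text
# Keep the indexer off generated output and vendored code
node_modules/
dist/
build/
coverage/

# Exclude services outside the agent's current scope
# (paths are placeholders for your own monorepo)
services/legacy-payments/
services/analytics/

# Noise that wastes context without informing the agent
**/*.snap
**/*.min.js
```

The goal is the same as the prompt scoping: the agent should only be able to see the module it was asked to change.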

Occasional coders: Pro+ ($60) in practice. You’ll burn through Pro’s $20 credit pool faster than you expect on experimental refactors.

High-velocity teams: Ultra ($200) is the realistic tier if you’re running 3+ parallel agents per day. The credit pool keeps up; cheaper tiers degrade into throttling frustration.

The Verdict

We’re running background agents in production. We’ve killed three mid-task. We’ve shipped clean PRs on maintenance work. We’ve also shipped bloated conflicts that required full rewrites. The 70–80% one-shot success rate on scoped work is real. So is the credit burn on vague prompts.

If you’re on Pro and spending 4+ hours per day on agent work, the $20 tier easily earns its keep. If you’re on Pro+ or Ultra, you need serious volume to justify the cost. The agents are not magic—they’re another tool with a clear failure mode (vague prompts, complex scope). Treat them like you’d treat a junior engineer: scope tightly, review thoroughly, pay attention to the burn rate.


What we don't know is documented at the end of this article. We update when we learn more.