AI Tool Review

Qwen Code Review: Alibaba's Free CLI Agent With 69.6% SWE-bench

Alibaba forked Gemini CLI, added Qwen3-Coder as the default model, and released it as a free, open-source terminal coding agent. Here is what that means for developers who want a zero-cost alternative to Claude Code and Cursor.

March 20, 2026 · 12 min read · OpenAI Tools Hub Team

Key Takeaways

  • Free and open-source — no subscription. Use your own Qwen API key, any OpenAI-compatible endpoint, or local Ollama models at zero recurring cost
  • Qwen3-Coder scores 69.6% on SWE-bench Verified — above Gemini 2.5 Pro (63.8%) and competitive with lower tiers of Claude and GPT offerings
  • Gemini CLI fork — similar architecture and UX to Gemini CLI, meaning model-agnostic design and a familiar terminal interface
  • Model-agnostic: switch to GPT-5.4, Claude Opus 4.6, or Gemini 3.1 Pro via environment variable — no code changes
  • Downside: Less mature agentic loop than Claude Code — autonomous multi-file editing and test-run-fix workflows require more manual intervention
  • Honest verdict: A strong pick for developers who want free terminal AI with a competitive model, especially for solo projects and local/private codebases

What Is Qwen Code?

Qwen Code is Alibaba's open-source terminal coding agent, released in early 2026. It is a fork of Google's Gemini CLI — the same architecture, same TUI design, same project-context system — but with Qwen3-Coder substituted as the default model and Alibaba's DashScope API as the default backend.

The key differentiator from other free terminal tools: Qwen3-Coder, Alibaba's code-specialized model, scores 69.6% on SWE-bench Verified. That puts it above Gemini 2.5 Pro (63.8%) — the model behind the tool it was forked from — and within range of Claude Sonnet-tier performance. For a freely available model, this is a meaningful benchmark result.

Qwen Code is model-agnostic by design. The Gemini CLI fork architecture accepts any OpenAI-compatible endpoint, which means you can run it against GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro, or any local model via Ollama. The "Qwen" branding is about the default configuration, not a lock-in.

The project is open-source under the Apache 2.0 license at github.com/QwenLM/qwen-code. As of March 2026, the repository had accumulated roughly 18,000 GitHub stars since launch — fast adoption for a developer tool targeting a market already dominated by Claude Code and Cursor.

How We Tested

We ran Qwen Code (with Qwen3-Coder default configuration) alongside Claude Code, Cursor, and Aider across a standardized task set over two weeks in March 2026:

  • TypeScript/Next.js project (~35K lines): Component generation, API route implementation, bug fixing, and multi-file refactors
  • Python FastAPI service (~12K lines): Endpoint creation, Pydantic schema generation, test writing, and documentation
  • New feature implementation: Building a simple authentication module from a specification in both projects
  • Bug fix tasks: 15 isolated bugs across both codebases with known correct solutions, used to measure first-pass accuracy

We also tracked setup time, terminal experience quality, model switching workflow, and subjective ease of use. We evaluated Aider with its default GPT-4o configuration (not Claude, to keep cost comparisons fair) and Cursor on its free tier where applicable.

Benchmark references: SWE-bench Verified scores from the official SWE-bench leaderboard. GitHub star counts from respective repositories. Pricing verified at provider websites at time of writing.

Installation and Setup

Qwen Code installs via npm and can be up and running in under three minutes:

# Install globally
npm install -g @qwen-ai/qwen-code

# Set DashScope API key (Alibaba Cloud)
export DASHSCOPE_API_KEY=sk-your-key-here

# Run in your project directory
qwen

You can get a DashScope API key from dashscope.aliyun.com. Alibaba provides a free tier for Qwen3-Coder that covers moderate usage — the specific token limits depend on account type and are listed in the DashScope console.

On first run, Qwen Code reads your project directory structure and loads any QWEN.md file in the root (analogous to CLAUDE.md or GEMINI.md) for project-specific instructions. The terminal interface launches immediately — no browser authentication required, unlike Gemini CLI's Google OAuth flow.
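To see the context file in action, you can drop a minimal QWEN.md into the project root before launching. The contents below are illustrative only — Qwen Code treats the file as free-form instructions, not a fixed schema:

```shell
# Write a minimal QWEN.md (contents are illustrative, not a required schema)
cat > QWEN.md <<'EOF'
# Project notes for Qwen Code

- TypeScript strict mode is on; avoid `any`.
- Run `npm test` after edits; tests live in `src/**/*.test.ts`.
- Keep API route conventions consistent with `app/api/`.
EOF
```

On the next `qwen` launch in this directory, the file is picked up automatically; no flag is needed.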

Switching to a different model at startup

# Use with OpenAI GPT-5.4
export OPENAI_API_KEY=sk-your-openai-key
export OPENAI_MODEL=gpt-5.4
qwen

# Use with local Ollama
export OPENAI_API_BASE=http://localhost:11434/v1
export OPENAI_API_KEY=ollama
export OPENAI_MODEL=qwen2.5-coder:32b
qwen

The model-switching workflow is simple enough that you can maintain different shell aliases for different model configurations — useful for testing whether a specific task benefits from a stronger (and more expensive) model.
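As a concrete sketch of that approach, shell functions (slightly more flexible than plain aliases) can bundle each backend configuration. The function names here are ours; the environment variables are the ones Qwen Code reads, as shown above:

```shell
# Hypothetical helpers for ~/.bashrc -- each configures a backend via the
# env vars Qwen Code reads, then launches it.
qwen_local() {
  export OPENAI_API_BASE=http://localhost:11434/v1
  export OPENAI_API_KEY=ollama
  export OPENAI_MODEL=qwen2.5-coder:32b
  qwen "$@"
}

qwen_gpt() {
  # Assumes OPENAI_API_KEY is already set to your OpenAI key in this shell
  export OPENAI_MODEL=gpt-5.4
  qwen "$@"
}
```

With these in place, `qwen_local` starts a free local session and `qwen_gpt` a paid one, without editing any config files between runs.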

SWE-bench 69.6%: How Qwen3-Coder Performs

Qwen3-Coder's 69.6% on SWE-bench Verified is a genuinely competitive result for an open-weight model. SWE-bench Verified tests 500 real GitHub issues — the model must produce a correct patch without human guidance. Here is where it sits in the competitive landscape:

| Model | SWE-bench Verified | Open weight | API cost (input/M) |
|---|---|---|---|
| GPT-5.4 | 80% | No | $2.50 |
| Claude Opus 4.6 | 72–77% | No | $15.00 |
| Gemini 3.1 Pro | ~68% | No | $3.50 |
| Qwen3-Coder | 69.6% | Yes (Apache 2.0) | Free tier / Ollama |
| Gemini 2.5 Pro | 63.8% | No | Free tier (60 req/min) |
| GPT-4o | ~50% | No | $2.50 |

Qwen3-Coder at 69.6% is notably above Gemini 2.5 Pro (the tool it was forked from) and approaching Claude Opus 4.6's lower range — while being available as an open-weight model. The closest paid-only comparison is Gemini 3.1 Pro at approximately 68%, which costs $3.50/M input tokens. Qwen3-Coder on DashScope's free tier beats it on both cost and benchmark.

The important caveat: SWE-bench tests Python repository tasks. Qwen3-Coder's performance on TypeScript, Go, Rust, and other languages is not independently benchmarked at this granularity. In our TypeScript tests, Qwen3-Coder was somewhat weaker on complex generic type inference than on Python tasks — consistent with the training data distribution patterns common in code models.

Model-Agnostic Design: Switching Models

Qwen Code's most underappreciated feature is its model-agnostic architecture. Because it accepts any OpenAI-compatible endpoint, you can treat it as a universal terminal AI interface rather than a Qwen-specific tool.

In practice, this means:

  • Run Qwen3-Coder via DashScope for free during exploratory work
  • Switch to GPT-5.4 for the specific tasks where 80% SWE-bench performance matters
  • Run Qwen2.5-Coder:32B locally via Ollama for air-gapped environments or private codebases
  • Use Claude Opus 4.6 via the Anthropic API (through an OpenAI-compatible proxy) when MCP tool integrations are needed

The flexibility is practical rather than theoretical. In our testing, we switched models mid-session on a complex refactoring task — started with Qwen3-Coder for the initial exploration and planning phase (free), then switched to GPT-5.4 for the execution phase (paid but cheaper than Opus 4.6). The workflow was smooth: set environment variables, restart Qwen Code, continue from notes.

Code Generation Quality in Practice

Across our 15 bug fix tasks (known correct solutions), Qwen3-Coder resolved 10 on the first pass. The five failures split into: two cases of correct logic but incorrect TypeScript type annotations, two cases where the model identified the wrong root cause and fixed a symptom rather than the bug, and one case where it hallucinated a non-existent method on a utility library.

For comparison on the same task set: Claude Code (Opus 4.6) resolved 13/15 on first pass. Aider with GPT-4o resolved 8/15. The 10/15 result for Qwen3-Coder is competitive considering it was the free-tier option.

Feature implementation tasks were stronger. Building the authentication module from specification, Qwen3-Coder produced working code with correct JWT handling, proper error responses, and reasonable middleware structure on the first attempt. The output required minor style adjustments and one type annotation fix, but was functionally correct. This matches the 69.6% SWE-bench score in feel — strong on standard patterns, weaker on edge cases.

Multi-file coordination is where the gap with Claude Code becomes visible. When asked to implement a feature touching five or more files, Qwen Code (with Qwen3-Coder) needed explicit guidance about which files to edit. It would correctly implement changes in the files we pointed it to, but did not autonomously search and identify all files that needed updating. Claude Code handled the same task by searching the codebase, identifying all 7 affected files, and coordinating changes across them without prompting.

Qwen Code vs Claude Code vs Cursor vs Aider

| Feature | Qwen Code | Claude Code | Cursor | Aider |
|---|---|---|---|---|
| Price | Free | $20/mo | $20/mo | Free (+ API) |
| Open source | Yes (Apache 2.0) | No | No | Yes (Apache 2.0) |
| Model-agnostic | Yes | Anthropic only | Yes (bring your key) | Yes |
| Default model SWE-bench | 69.6% | 72–77% | Varies by model | Varies by model |
| Terminal-based | Yes | Yes | IDE only | Yes |
| Local model support | Yes (Ollama) | No | No | Yes (Ollama) |
| Autonomous multi-file edit | Partial | Yes (agent loop) | Yes (Composer) | Partial |
| Git integration | Basic | Deep | Basic | Strong |
| IDE integration | Terminal only | Terminal only | Full VS Code fork | Terminal only |

Claude Code is the clear choice for developers who need autonomous, multi-file agentic workflows and are willing to pay $20/month. Its SWE-bench lead (72–77% vs 69.6%), deeper git integration, and mature MCP ecosystem justify the cost for professional use.

Cursor at $20/month is a different category — it is an IDE fork, not a terminal tool, and brings auto-complete, inline suggestions, and a visual editor experience that terminal tools cannot match. It is not really a direct competitor to Qwen Code for terminal-first developers.

The most direct comparison is Aider — both are free, open-source, terminal-based, and model-agnostic. Aider has stronger git integration and a more mature community (50K+ GitHub stars versus Qwen Code's ~18K). Qwen Code has a more modern TUI (inherited from Gemini CLI), a competitive default model, and a cleaner model-switching workflow. For developers who want a modern UI and do not need Aider's deep git features, Qwen Code is a viable alternative.

Real Downsides

Qwen Code is genuinely useful, but it has limitations worth knowing before committing to it for production workflows.

Weaker autonomous multi-file coordination

The tool does not automatically search and identify all files affected by a change request. For tasks spanning more than three or four files, you need to specify which files to include in context. Claude Code's agent loop handles this automatically. This is a meaningful productivity difference for complex features.

DashScope API requires an Alibaba Cloud account

The default free tier requires an Alibaba Cloud account with identity verification in some regions. Developers outside China may find the registration process non-trivial. The workaround is to use a different provider (OpenAI, Anthropic via proxy, or local Ollama), but this adds setup friction compared to Gemini CLI's Google OAuth or Claude Code's Anthropic flow.

TypeScript/non-Python performance is untested on benchmarks

SWE-bench uses Python repositories exclusively. Qwen3-Coder's 69.6% result does not tell us how it performs on TypeScript, Go, or Rust codebases. In our TypeScript tests, it was noticeably weaker on advanced generics and complex type manipulation — a gap that matters for TypeScript-heavy frontend projects.

Younger ecosystem and community

At ~18K GitHub stars, Qwen Code is newer than Aider (50K+) and much newer than Claude Code's user base. Community resources, Stack Overflow answers, and third-party integrations are limited. You will hit undocumented behavior more frequently than with more established tools.

Free tier limits on DashScope are not clearly documented

The DashScope free tier exists but the token quotas are listed in the console rather than prominently in the documentation. Heavy usage sessions — particularly long-context multi-file tasks — can exhaust free tier allocations unexpectedly. Monitor your usage in the DashScope console if you are doing intensive coding sessions.

Using Qwen Code With Local Models

For developers who need air-gapped environments or want truly zero-cost operation, running Qwen Code with local models via Ollama is straightforward:

# Install Ollama (if not already installed)
# https://ollama.ai

# Pull Qwen2.5-Coder (32B for best performance, 7B for faster/lower VRAM)
ollama pull qwen2.5-coder:32b

# Configure Qwen Code to use local Ollama
export OPENAI_API_BASE=http://localhost:11434/v1
export OPENAI_API_KEY=ollama
export OPENAI_MODEL=qwen2.5-coder:32b

qwen

Qwen2.5-Coder:32B requires approximately 20GB of VRAM in Ollama's default quantized build (roughly 4–5 bits per weight; full-precision weights would need far more). On a 24GB GPU (RTX 4090 or equivalent), it runs at a usable speed for interactive coding sessions. The 7B variant runs on 8GB VRAM and is suitable for faster-feedback workflows where quality can be slightly lower.
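The 20GB figure follows from a back-of-the-envelope calculation, assuming roughly 5 bits per weight for a Q4/Q5-style quantized build (the exact quantization and runtime overhead vary):

```shell
# Rough VRAM estimate for a 32B-parameter model at ~5 bits per weight
# (approximates a Q4/Q5 quantized build; overhead and KV cache not included)
params=32000000000
bits_per_weight=5
bytes=$(( params * bits_per_weight / 8 ))
echo "approx $(( bytes / 1000000000 ))GB of VRAM"
```

The same arithmetic puts the 7B variant at around 4–5GB for weights alone, which is why it fits comfortably on an 8GB card.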

Local model performance is lower than the cloud-hosted Qwen3-Coder — Qwen2.5-Coder:32B is the previous generation, scoring approximately 62–65% on SWE-bench compared to the newer model's 69.6%. For private codebases or offline use cases, this trade-off is often worth it.

Verdict

Qwen Code is the most compelling free terminal coding agent available today that comes with a competitive default model. Qwen3-Coder's 69.6% SWE-bench score exceeds Gemini 2.5 Pro (the model it was forked from), and the model-agnostic design means you are not locked into Alibaba's ecosystem.

For solo developers and small teams working on Python-heavy projects, Qwen Code plus Qwen3-Coder is a free alternative that gets you within 3–10 percentage points of Claude Code on benchmark performance, at zero cost. That gap matters on complex autonomous tasks, but for the majority of daily coding work — code generation, bug fixing, documentation, targeted refactors — the performance difference is small enough that many developers will not notice it.

The honest limitations: Qwen Code is weaker than Claude Code on autonomous multi-file coordination, has a younger ecosystem, and the DashScope registration requirement adds friction for non-China developers. If you are already paying for Claude Pro and using Claude Code's agentic features daily, switching to Qwen Code is not a clear upgrade.

But if you are evaluating free terminal coding tools, want local model support, or need model-agnostic flexibility, Qwen Code belongs on the shortlist alongside Gemini CLI and Aider.

For a broader comparison of terminal coding tools including OpenCode, see our OpenCode review and Claude Code vs Cursor breakdown.

FAQ

Is Qwen Code free?

Yes. Qwen Code is free and open-source under Apache 2.0. You provide your own model API keys. Qwen3-Coder is available via Alibaba Cloud DashScope with a free tier. For fully zero-cost operation, run local models via Ollama — Qwen2.5-Coder:7B is free if you have the hardware.

What is Qwen Code's SWE-bench score?

Qwen3-Coder, the default model for Qwen Code, scores 69.6% on SWE-bench Verified. This beats Gemini 2.5 Pro (63.8%) and approaches the lower range of Claude Opus 4.6 (72–77%). It is the highest SWE-bench score among freely available open-weight code models as of March 2026.

How does Qwen Code compare to Claude Code?

Claude Code is stronger on autonomous multi-file coordination, has a 3–8 point SWE-bench advantage, and comes with a mature MCP ecosystem for tool integrations. Claude Code costs $20/month. Qwen Code is free, open-source, and model-agnostic. For developers who need a free terminal coding agent with competitive performance, Qwen Code is the stronger option. For professional agentic coding workflows, Claude Code's capabilities justify the cost.

What models does Qwen Code support?

Qwen Code supports any OpenAI-compatible API endpoint. Default configuration uses Qwen3-Coder via DashScope. You can switch to GPT-5.4, Claude Opus 4.6 (via compatible proxy), Gemini 3.1 Pro, or local models via Ollama using environment variables.

How do I install Qwen Code?

Install via npm install -g @qwen-ai/qwen-code. Set DASHSCOPE_API_KEY for the default Qwen3-Coder configuration, or configure any OpenAI-compatible endpoint via OPENAI_API_BASE and OPENAI_MODEL. Run with qwen.

Is Qwen Code better than Aider?

They are close. Qwen Code has a more modern TUI and a stronger default free model (Qwen3-Coder 69.6% vs Aider's default GPT-4o at ~50%). Aider has a larger community, more mature git integration features, and more repository-level edit modes. Both are free, open-source, and support local models. Try both with your actual codebase for a week — they are different enough in workflow that personal preference will dominate.
