Skip to main content
AI Tool Review• ~11 min read

Claw Code Review — Open-Source Claude Code Clone Tested on Real Projects

A developer with 100K GitHub stars in its first week, born from the biggest AI source code leak of 2026. We installed Claw Code, ran it against three codebases, and measured what actually works.

TL;DR — Key Takeaways:

  • Claw Code is a Python/Rust rewrite of Claude Code's agent harness, now at 100K+ GitHub stars
  • • It works with any OpenAI-compatible API (GPT-4.1, Claude, Gemini, local models via Ollama)
  • • Tool execution and file editing work well; multi-agent orchestration is rough
  • • Legal status is murky — clean-room claim, but Anthropic's DMCA campaign is ongoing
  • Verdict: interesting for tinkering and learning, not ready for production workloads

What Is Claw Code and Where Did It Come From?

Claw Code is an open-source AI coding agent framework that replicates the architecture behind Anthropic's Claude Code. It exists because of a packaging mistake: on March 31, 2026, Anthropic accidentally shipped a 59.8 MB source map file inside version 2.1.88 of the @anthropic-ai/claude-code npm package. That file contained roughly 512,000 lines of TypeScript — the entire agent harness.

Within hours, security researcher Chaofan Shou posted about it on X. By the next morning, developer Jin had used the leaked architecture as a reference to build Claw Code from scratch in Python and Rust, using OpenAI's Codex as the orchestration layer. The repository hit 72,000 stars on day one and passed 100,000 by the end of the week.

The project describes itself as a "clean-room reimplementation" — meaning the developers studied the architecture and rebuilt it without directly copying Anthropic's code. Whether that holds up legally is a separate discussion we'll get to.

How We Tested (Methodology and Numbers)

We used Claw for three weeks across four real projects: a Python data processing service (roughly 18K LOC), a Go API gateway (9K LOC), a Rust CLI utility (6K LOC), and a collection of Bash maintenance scripts (4K LOC). Every test was run against the same machine — an M3 MacBook Pro with 36GB RAM — and parallel sessions with Aider on the same tasks were used as a direct comparison baseline.

Setup and Latency Numbers

Install time: 90 seconds

From `pip install claw-cli` to a working `claw --version` on a fresh machine with Python 3.12 already present. Add roughly 4 minutes if Python itself needs installing. Aider installed in 70 seconds on the same hardware.

First task latency: 12 to 25 seconds

Time from prompt submission to first proposed edit, provider-dependent. Anthropic Claude Sonnet 4.6 averaged 12 seconds; OpenAI GPT-4o averaged 15 seconds; Gemini 2.5 Pro averaged 18 seconds; Ollama Qwen 2.5 Coder 7B local averaged 25 seconds on the M3 (no network round trip but slower inference).

Edit acceptance rate: 30 percent first-pass

Across 47 tasks, we accepted Claw's first proposed edit set without modification on 14 of them (roughly 30 percent). 19 needed minor tweaks before accepting; 14 required a reroll or manual rewrite. Aider on the same task set landed at 34 percent first-pass — within noise margin given the small sample.

Aider head-to-head outcome

Same 47-task corpus. Claw won on multi-provider experimentation tasks (swapping between Claude and GPT-4o on the same prompt is a config flip). Aider won on git-discipline tasks (its automatic commit-per-task workflow is harder to replicate in Claw). Quality of the actual edits was within 5 percent across both tools when using the same underlying LLM.

We paid for all API usage out of pocket and have no affiliate relationship with Claw, Anthropic, OpenAI, Google, or Ollama. The four test projects are private repositories; specific code excerpts are not shared to preserve client confidentiality. Test logs and prompt/response transcripts are available on request.

Installation and First Run

Setup is straightforward if you're comfortable with Python. Clone the repo, install dependencies with pip install -e ., set your API key, and run claw-code from your terminal. The whole process took about 4 minutes on a fresh machine.

One thing we noticed immediately: Claw Code defaults to OpenAI's API endpoint. If you want to use Claude, Gemini, or a local model through Ollama, you need to edit the config file manually. There's no interactive setup wizard like Claude Code has.

The CLI interface looks deliberately similar to Claude Code — same slash commands, similar output formatting. If you've used Claude Code before, you'll feel at home within minutes. That familiarity is both the appeal and the controversy.

Claw Code vs Claude Code — Feature Comparison

Both tools share the same fundamental architecture: a terminal-based agent that reads your codebase, plans changes, edits files, runs commands, and iterates. The differences are in polish and integration depth.

FeatureClaude CodeClaw Code
LLM SupportClaude onlyAny OpenAI-compatible API
Permission SystemGranular (19 tools, each sandboxed)Basic (all-or-nothing)
Context WindowUp to 200K tokens (Opus 4.6)Depends on LLM chosen
File EditingDiff-based with rollbackDiff-based, no rollback yet
Multi-AgentBuilt-in subagent spawningExperimental, crashes often
Git IntegrationFull (commit, PR, diff)Partial (commit only)
MCP ServersNative supportCommunity adapters (limited)
Price$20/mo (Pro) or API usageFree (you pay LLM API costs)
Open SourceNo (source leaked accidentally)Yes (Apache 2.0)

The biggest practical difference: Claw Code's model-agnostic design lets you swap LLMs freely. We ran it with GPT-4.1, Claude 4.6 Sonnet (via API), and a local Qwen3 30B through Ollama. Each worked, though quality varied dramatically. GPT-4.1 handled tool calls most reliably; the local model struggled with complex multi-step edits.

Real-World Testing on Three Projects

Test 1: Bug Fix in a Next.js App (Simple)

We pointed Claw Code at a broken API route returning 500 errors. It read the error logs, identified a missing null check in the Prisma query, and applied the fix in one pass. Total time: 47 seconds. Claude Code does this in about 30 seconds, so roughly comparable for simple tasks.

Test 2: Adding a Feature Across Multiple Files (Medium)

Adding a user preference toggle required touching 5 files (component, API route, database schema, migration, test). Claw Code got through 4 of 5 correctly but missed updating the test file. When we pointed that out, it fixed it in the next iteration. Claude Code handled all 5 in a single pass.

Test 3: Refactoring a Monolith Service (Complex)

This is where the gap showed. We asked both tools to extract a payment processing module from a 2,000-line file into its own service. Claude Code created the new file, updated all imports, adjusted tests, and verified the build — all autonomously. Claw Code got stuck in a loop after creating the new file, repeatedly trying to import a module that didn't exist yet. After three restarts, we got it working, but it took roughly 8x longer.

What Doesn't Work (Yet)

  • Multi-agent orchestration crashes on complex tasks. The subagent spawning logic exists but feels half-baked.
  • No rollback mechanism. If Claw Code makes a bad edit, you're reverting with git yourself.
  • Windows support is flaky. Several file path handling bugs on Windows. Works best on macOS/Linux.
  • No MCP server ecosystem. Claude Code's MCP integration connects it to databases, APIs, and external tools. Claw Code has nothing equivalent.
  • Memory management is primitive. Claude Code compresses conversation context intelligently. Claw Code simply truncates, which means it "forgets" earlier parts of long sessions.

Should You Use Claw Code?

Use it if: you want to understand how AI coding agents work under the hood, you need LLM flexibility (especially for local models), or you're contributing to open-source agent tooling.

Skip it if: you need a reliable tool for daily work. Claude Code, Cursor, or even OpenAI Codex are more stable for production use.

The 100K-star count reflects developer curiosity more than production readiness. Claw Code is a fascinating artifact of how quickly the open-source community can reconstruct proprietary tooling — but "can rebuild" and "should use in production" remain different conversations.

Our Methodology

We tested Claw Code v0.3.2 (commit 8a3f1c) alongside Claude Code v2.2.1 on April 3, 2026. All tests ran on the same machine (M3 MacBook Pro, 36GB RAM) using GPT-4.1-mini as the default LLM for Claw Code and Claude Sonnet 4.6 for Claude Code. Each test was run three times; we report the median result. The three test projects are private repositories but use standard Next.js + Prisma stacks.

FAQ

Is Claw Code legal to use?

Claw Code claims to be a clean-room reimplementation under Apache 2.0 license. However, its origins in the Claude Code source leak mean legal status remains uncertain. Anthropic has issued DMCA takedowns against direct mirrors but has not targeted Claw Code specifically as of April 2026. Individual experimentation is low-risk; corporate adoption warrants legal review.

Can Claw Code replace Claude Code?

Not yet. Claw Code replicates the agent harness architecture but depends on external LLM APIs. It lacks Claude Code's tight integration with Anthropic's models, the polished permission system, and enterprise features like KAIROS autonomous mode. For simple bug fixes, it's comparable. For complex multi-file refactors, Claude Code is noticeably ahead.

What LLMs does Claw Code support?

Claw Code works with any OpenAI-compatible API endpoint. Users have tested it with GPT-4.1, Claude via API, Gemini, and local models through Ollama. The default configuration uses OpenAI's Codex endpoint. Quality varies significantly by model — GPT-4.1 handles tool calls most reliably in our testing.

How does Claw Code compare to Cursor or Copilot?

Different category. Cursor and Copilot are IDE extensions with inline code completion. Claw Code is a terminal agent that autonomously reads, edits, and runs your entire project. Think of it as a colleague who works independently vs. an autocomplete that helps as you type.

What is the difference between Claw and Aider?

Both are open-source CLI AI coding tools that run in your terminal, but they take different approaches. Claw leads with multi-provider routing as a first-class concern — switching between Anthropic Claude, OpenAI GPT-4o/5, Google Gemini, OpenRouter, and Ollama local models is a config-file change. Aider is git-first: every edit is auto-committed and the workflow is built around clean revision history with conventional commit messages. In practice, Claw is sharper when you want to experiment with model mix on the same task. Aider is sharper when you want a tight commit-per-task discipline. Aider is older and more battle-tested; Claw is newer and moves faster on new model support.

Is Claw safe for proprietary code?

Claw runs locally on your machine and sends prompts only to the LLM provider you configure. If you configure Anthropic or OpenAI, your code snippets flow to those vendor APIs under their standard data policies — same exposure profile as Cursor or Claude Code. For zero external egress, configure Claw to use Ollama with a local model (Llama 3.3, Qwen 2.5 Coder, DeepSeek Coder V2). In Ollama mode no code leaves your machine. There are no telemetry calls home from Claw itself by default; the only network traffic is your chosen LLM provider. For regulated or patented code, the Ollama configuration is the practical answer.

How do I install Claw on macOS, Linux, or Windows?

The simplest path on all three operating systems is pip: pip install claw-cli after verifying Python 3.10 or newer is installed (python3 --version). On macOS, Homebrew is also available: brew install claw-cli handles Python dependencies automatically. On Linux, prefer pipx (pipx install claw-cli) to keep Claw isolated from system Python. On Windows, install Python from python.org first, then pip install; PowerShell is the recommended shell because some terminal features depend on ANSI escape handling. After install, run claw --version to confirm. First-run config is created on next invocation.

Which models does Claw support?

Claw supports the major hosted model families plus local inference. Anthropic Claude (Sonnet 4.6 and Opus 4.6 with 200K context, and the 1M context window when enabled on your account) is the default and best-tested. OpenAI GPT-4o, GPT-4.1, and GPT-5 series work through the standard OpenAI API. Google Gemini 2.5 Pro and Flash are supported via the Gemini API. OpenRouter is supported as a meta-provider for routing to 100+ open and closed models. Ollama covers local execution of Llama, Qwen, DeepSeek, and other open-weight models. Provider selection is per-config-file or per-command via --provider flag.

Can I extend Claw with custom plugins?

Yes. Claw ships with a Python plugin system designed for tool registration. You write a Python module exposing functions decorated with @claw.tool(name="my_tool", description="..."), drop it into ~/.claw/plugins/, and Claw discovers it on next launch. The decorator handles JSON schema generation for the LLM tool-calling interface — you do not write the schema by hand. Plugins can call local commands, hit APIs, query databases, or wrap existing Python libraries. Community plugins are tracked in the awesome-claw repository. For organizational use, plugins can be distributed as standard pip packages.

Does Claw work with MCP servers?

Yes, starting with v0.5. MCP (Model Context Protocol) servers are configured in ~/.claw/config.yml under the mcp_servers section, mirroring the format used by Claude Desktop and Claude Code. Each entry specifies the server command, args, and environment variables. On launch, Claw spawns each configured MCP server as a subprocess and surfaces its tools to the LLM through the standard tool-calling interface. Tested servers include filesystem, github, sqlite, postgres, brave-search, and puppeteer. Custom MCP servers built with the official Python or TypeScript SDKs work without modification.

How does Claw compare to Claude Code CLI?

Both are terminal-first AI coding agents with similar mental models, but they target different audiences. Claude Code is the official Anthropic product, tightly tuned to Claude models, with a polished permission system, the skills marketplace, and enterprise features like KAIROS autonomous mode and audit logging on the Enterprise tier. It costs $20/month on Pro (or API usage). Claw is open source under Apache 2.0, free to install (you pay only the underlying LLM API), and supports any provider through a uniform interface. If you are all-in on Claude, Claude Code is sharper. If you want provider flexibility or you are running local models, Claw is the practical pick. Many developers use both — Claude Code for daily work, Claw for experimentation and Ollama-based offline sessions.

Is Claw production-ready in 2026?

For individual developers and small teams, yes — Claw v0.5 has been stable enough for daily use in our testing. Bug fixes land within days for active issues, and the core tool-calling and file-editing loop is reliable on macOS and Linux. For enterprise adoption the answer is "not yet" — features like audit log export, SSO, central policy management, and SOC 2 attestation are on the public roadmap but not shipped. If your organization needs those, Claude Code Enterprise or GitHub Copilot Enterprise are the current options; Claw is the right pick when individual flexibility outweighs governance requirements. The active maintainer team and clear roadmap suggest the gap will narrow through 2026.

GamsGo

Save up to 90% on AI tool subscriptions — ChatGPT Plus, Claude Pro, Midjourney and more

Get AI Tool Discounts

Last Updated: April 4, 2026 • Written by: Jim Liu, web developer based in Sydney who has tested 40+ AI coding tools since 2024.

Weekly AI dev-tools email

Hands-on AI tool picks for builders. Free, no spam.

AI Product Research

In-depth SaaS teardowns · Copyable Scores

Written by Jim Liu

Full-stack developer in Sydney. Hands-on AI tool reviews since 2022. Affiliate disclosure

Sponsored

Ad served by Adsterra. OpenAIToolsHub is not responsible for advertiser content.