AI Tool Review • ~14 min read

Goose by Block — An Open-Source AI Agent That Goes Beyond Code Suggestions

Block (formerly Square) open-sourced an AI coding agent called Goose. It has roughly 27,000 GitHub stars, works with any LLM you want, and costs nothing. I spent a week using it on real projects to see whether “free and flexible” actually holds up against $20/month tools.

TL;DR — Key Takeaways:

  • Goose is a free, open-source AI agent (Apache 2.0) from Block with ~27K GitHub stars and 350+ contributors
  • Works with any LLM — Claude, GPT, Gemini, or local models via Ollama. You bring your own API key
  • MCP server integration and a recipes system (YAML workflow macros) make it extensible beyond basic code edits
  • Runs entirely locally — your code never leaves your machine unless you choose a cloud LLM API
  • Downsides: quality depends heavily on your model choice, recipes have a real learning curve, and terminal comfort is required
  • Verdict: compelling for developers who want full control over their AI stack; Claude Code and Cursor still outperform it on raw coding quality

What Is Goose and Why Did Block Build It?

Goose started as an internal tool at Block. Their engineering teams needed an AI agent that could automate entire workflows — running tests, debugging failures, modifying code across files — not just autocomplete suggestions in an editor. Rather than license a third-party tool and deal with data leaving their infrastructure, they built one and eventually open-sourced it under Apache 2.0.

The repository (github.com/block/goose) has accumulated roughly 27,000 stars and more than 350 contributors. That contributor count matters — it means the project is not just Block engineers maintaining their internal tool. Community members have added model integrations, shared recipes, and built MCP server plugins that Block's original team never planned for.

The pitch is straightforward: a coding agent that runs on your machine, works with whatever LLM you prefer, and automates multi-step engineering tasks through a composable extension system. No vendor lock-in, no subscription, no code uploaded to third-party servers.

How Goose Actually Works

Goose ships as both a desktop application and a CLI tool. The desktop app gives you a GUI where you type tasks in natural language and watch Goose work through them. The CLI does the same thing in your terminal. Both connect to whichever LLM you configure — you paste in an API key for Claude, GPT, Gemini, or point it at a local Ollama instance.

When you give Goose a task, it does not just generate code and hand it back. It plans a sequence of actions, executes them (reading files, running shell commands, editing code), observes the results, and iterates. If a test fails after a code change, Goose reads the error, adjusts the code, and re-runs the test. This loop continues until the task is complete or Goose decides it needs your input.

In my experience, this works well for contained tasks: fixing a failing test, adding a new API endpoint following existing patterns, or refactoring a function. It struggles with tasks that require understanding broad architectural context across many files — which is true of most AI coding agents, though the degree varies by model.

Installation and First Run

Getting started takes about five minutes. Install via Homebrew (brew install block/tap/goose) or download the desktop app. Add your API key to the config file. Point Goose at a project directory and give it a task. The first interaction feels responsive — Goose starts planning within seconds.

The learning curve for basic usage is gentle. Where it gets steep is when you start customizing: writing recipes, configuring MCP servers, or tuning model parameters. That part takes days, not minutes.

MCP Servers and the Recipes System

Goose uses MCP (Model Context Protocol) servers as its extension mechanism. An MCP server is a small service that gives Goose access to a specific capability — reading a database, interacting with a Jira board, querying your monitoring system, or fetching documentation. You configure which MCP servers Goose can access, and it calls them as needed during task execution.

This is the same protocol that Claude Code uses natively, which makes Goose one of the few open-source agents with real MCP support. The ecosystem of available MCP servers is growing — there are community-built servers for GitHub, Slack, PostgreSQL, and dozens of other tools.
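As a sketch of what wiring in an MCP server might look like, here is a hypothetical extension entry in a Goose YAML config file. The field names, file layout, and the package name shown are illustrative assumptions, not the exact schema — check the official Goose documentation before copying.

```yaml
# Hypothetical Goose config fragment. Field names are illustrative
# assumptions, not the exact schema -- consult the Goose docs.
extensions:
  postgres:
    type: stdio                  # assumed: the server runs as a local subprocess
    cmd: npx
    args:
      - "-y"
      - "@example/mcp-server-postgres"       # hypothetical package name
      - "postgresql://localhost/mydb"
    enabled: true
```

Once an entry like this is registered, Goose can call that server's tools mid-task — for example, running a read-only query to check what a schema actually looks like before editing the code that touches it.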

Recipes: Reusable Workflow Macros

Recipes are YAML files that define multi-step workflows. A recipe might say: “Run the test suite, collect failures, fix each failing test, re-run to confirm, then create a commit.” You trigger a recipe with a single command and Goose executes the entire sequence.
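To make that concrete, here is a minimal sketch of what such a recipe might look like. The key names (`version`, `title`, `prompt`, and so on) are assumptions for illustration — real recipes should follow the schema in the Goose recipe documentation.

```yaml
# Hypothetical recipe sketch -- key names are illustrative assumptions;
# verify against the recipe schema in the Goose docs before use.
version: "1.0.0"
title: fix-failing-tests
description: Run the suite, repair failures, confirm, then commit
prompt: |
  1. Run the project's test suite.
  2. For each failing test, read the error output and fix the code under test.
  3. Re-run the suite to confirm everything passes.
  4. Create a commit summarizing the fixes.
```

You would then trigger the recipe with a single CLI command and let Goose work through the steps in order, iterating on failures as it goes.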

The community has published hundreds of recipes on GitHub. Some are simple (format all files, update dependencies), others are ambitious (full CI pipeline debugging, automated code review with specific style guidelines). Writing your own requires understanding Goose's action model and YAML syntax — it took me about two days before I could write recipes that reliably did what I intended.

Recipes are where Goose differentiates itself from tools like Claude Code or Cursor. Neither of those has an equivalent “saved workflow” system. If you find yourself repeating the same multi-step task regularly, a Goose recipe can automate it. Whether that automation is worth the setup effort depends on how often you repeat the task.

Model Choice Matters More Than You Think

This is the part that most reviews of Goose gloss over, and it is the most important thing to understand before you commit time to it. Goose is a shell around an LLM. The quality of everything it does — planning, code generation, debugging, recipe execution — depends almost entirely on which model you connect.

In my testing, Claude Opus and GPT-4o produced results comparable to what you get from Claude Code or Cursor. They handled multi-file edits, understood project context from file reads, and recovered from errors intelligently. Claude Sonnet was slightly below that — functional for most tasks but occasionally needed nudging on complex refactors.

Local models through Ollama were a different story. Llama 3 (70B) managed simple single-file tasks but fell apart on anything requiring coordination across files. Smaller models (7B, 13B) were not useful for real engineering work — they generated plausible-looking code that failed tests and couldn't recover from the failures.

The implication: if you use Goose with a top-tier cloud model, your API costs run $5-20/month for moderate usage. At that point, the cost advantage over Claude Code ($20/month flat) or Cursor ($20/month) is modest to nonexistent. Goose's real advantage is flexibility and privacy, not price — unless you happen to have a local model that performs well enough for your use case.

Goose vs Claude Code vs Cursor vs Codex

The comparison is not as simple as “which one writes better code.” These tools make fundamentally different trade-offs between control, convenience, and capability. Here is how they stack up on the dimensions that actually affect daily work.

| Feature | Goose | Claude Code | Cursor | OpenAI Codex |
|---|---|---|---|---|
| Price | Free (+ API costs ~$5-20/mo) | $20/mo (Max plan) | $20/mo Pro | API usage-based |
| License | Apache 2.0 (open-source) | Proprietary | Proprietary | Proprietary |
| LLM Support | Any (Claude, GPT, Gemini, Ollama) | Claude models only | Multiple (built-in) | OpenAI models only |
| Interface | Desktop app + CLI | Terminal CLI | GUI (VS Code-based) | Web + CLI |
| MCP Server Support | Yes (core feature) | Native (full ecosystem) | Basic | No |
| Workflow Macros | Yes (recipes, YAML) | No (manual prompts) | Task templates | No |
| Runs Locally | Yes (fully local) | Yes (API calls to Anthropic) | Partial (cloud agents) | Cloud only |
| Offline Capable | Yes (with local LLM) | No | No | No |
| Autonomous Task Quality | Model-dependent | Strong | Strong (GUI-guided) | Moderate |
| GitHub Stars | ~27K | N/A (closed source) | N/A (closed source) | N/A (closed source) |

The pattern is clear: Goose wins on openness and flexibility. You own your data, you pick your model, you extend it however you want. Claude Code and Cursor win on out-of-the-box quality — their tight integration with specific models means less configuration and more consistent results.

For a deeper look at how Claude Code and Codex compare directly, see our Codex vs Claude Code comparison. And if you are evaluating Cursor specifically, our Cursor 3 review covers the latest agent-first interface.

The Honest Downsides

I want to be direct about what does not work well, because the GitHub README and community discussions tend to focus on the happy path.

  • Output quality is a dice roll based on your model. If you connect a weaker model to save money, Goose becomes noticeably worse than paid alternatives. The tool itself does not compensate for model limitations — it faithfully passes whatever the model produces.
  • Terminal comfort is mandatory. The desktop app exists, but power features like recipes, MCP server configuration, and debugging Goose itself all happen in config files and the terminal. If you are not comfortable editing YAML and reading logs, you will hit a wall.
  • The recipe system has a real learning curve. Writing a recipe that reliably does what you intend takes trial and error. YAML syntax issues, action ordering problems, and unclear error messages make the first few days frustrating. The documentation is improving but still has gaps.
  • No built-in IDE integration. Unlike Cursor (which is an IDE) or Claude Code (which has VS Code extensions), Goose runs alongside your editor. You switch between your editor and Goose's interface, which adds context-switching overhead that adds up over a full day.
  • Community support, not enterprise support. If something breaks, you file a GitHub issue. There is no paid support tier, no SLA, no account manager. For solo developers and small teams, this is fine. For organizations with compliance requirements, it is a real gap.

None of these are dealbreakers for the right user. But they mean Goose is not a drop-in replacement for Claude Code or Cursor — it is a different kind of tool with different trade-offs. The developers who love Goose are the ones who value control over convenience and are willing to invest setup time for long-term flexibility.

Who Should Actually Use Goose?

After a week of daily use, I think Goose fits three profiles well and one poorly.

Good fit:

  • Developers who care about data privacy. If your code cannot leave your machine — government work, healthcare, financial services — Goose with a local LLM is one of the few viable options for an AI coding agent.
  • Teams that want to standardize workflows. The recipes system is genuinely useful for teams where everyone should follow the same debugging or deployment process. Write the recipe once, share it across the team.
  • Tinkerers and tool-builders. If you enjoy configuring and extending your development tools, Goose gives you more surface area to work with than any closed-source alternative. MCP servers, recipes, and the open codebase mean you can modify anything.

Not a good fit:

  • Developers who want the strongest coding AI with zero setup. Claude Code and Cursor both work out of the box with minimal configuration and produce consistently strong results. Goose requires more setup time and delivers variable quality depending on your model choice.

For a broader view of the current AI coding tool market, including where Goose fits alongside a dozen other options, check our AI coding tools comparison guide.

How We Tested

We ran Goose v0.9.x (latest stable) from March 31 to April 6, 2026 on two projects: a Next.js 15 SaaS application (~6K lines) and a Python FastAPI service (~3K lines). Models tested: Claude Opus 4.6, GPT-4o, Claude Sonnet 4.5, and Llama 3 70B via Ollama. Each task was run with at least two different models for comparison. API costs over the testing period totaled approximately $14 across all cloud models. Desktop app (macOS) and CLI (macOS + Linux) were both used.

FAQ

Is Goose by Block completely free?

Goose itself is free and open-source under Apache 2.0. However, you need to provide your own LLM API key. Light usage with Claude or GPT typically costs $5-20/month in API fees. You can also run it with free local models through Ollama, making the total cost zero.

What LLMs does Goose support?

Goose works with virtually any LLM: Claude (Anthropic), GPT (OpenAI), Gemini (Google), and local models via Ollama. You configure your preferred model and API key, then Goose handles the rest. Output quality varies significantly — Claude Opus and GPT-4o produce the strongest results, while smaller local models struggle with complex multi-step tasks.

How does Goose compare to Claude Code?

Goose is model-agnostic and free (Apache 2.0), while Claude Code is locked to Anthropic models at $20/month. Claude Code has deeper autonomous reasoning and tighter integration out of the box. Goose offers more flexibility — you choose your model, customize workflows with recipes, and keep all data local. For control, Goose wins. For raw coding performance, Claude Code currently leads.

What are Goose recipes?

Recipes are YAML-based workflow macros that define multi-step tasks Goose can execute automatically. For example, a recipe could run your test suite, analyze failures, fix the code, and re-run tests. The community has shared hundreds of recipes on GitHub, but writing custom ones takes a few days to learn well.

Does Goose work offline?

Yes, if you run a local LLM through Ollama or similar tools. Goose itself runs entirely on your machine — no data leaves your laptop unless you choose a cloud API like OpenAI or Anthropic. This makes it one of the few AI coding agents suitable for air-gapped or high-security environments.

Last Updated: April 7, 2026 • Written by: Jim Liu, web developer based in Sydney who has tested 40+ AI coding tools since 2024.
