AI Tool Review • ~14 min read

Goose by Block — An Open-Source AI Agent That Goes Beyond Code Suggestions

Block (formerly Square) open-sourced an AI coding agent called Goose. It has roughly 27,000 GitHub stars, works with any LLM you want, and costs nothing. I spent a week using it on real projects to see whether “free and flexible” actually holds up against $20/month tools.

TL;DR — Key Takeaways:

  • Goose is a free, open-source AI agent (Apache 2.0) from Block with ~27K GitHub stars and 350+ contributors
  • Works with any LLM — Claude, GPT, Gemini, or local models via Ollama. You bring your own API key
  • MCP server integration and a recipes system (YAML workflow macros) make it extensible beyond basic code edits
  • Runs entirely locally — your code never leaves your machine unless you choose a cloud LLM API
  • Downsides: quality depends heavily on your model choice, recipes have a real learning curve, and terminal comfort is required
  • Verdict: compelling for developers who want full control over their AI stack; Claude Code and Cursor still outperform it on raw coding quality

What Is Goose and Why Did Block Build It?

Goose started as an internal tool at Block. Their engineering teams needed an AI agent that could automate entire workflows — running tests, debugging failures, modifying code across files — not just autocomplete suggestions in an editor. Rather than license a third-party tool and deal with data leaving their infrastructure, they built one and eventually open-sourced it under Apache 2.0.

The repository (github.com/block/goose) has accumulated roughly 27,000 stars and more than 350 contributors. That contributor count matters — it means the project is not just Block engineers maintaining their internal tool. Community members have added model integrations, shared recipes, and built MCP server plugins that Block's original team never planned for.

The pitch is straightforward: a coding agent that runs on your machine, works with whatever LLM you prefer, and automates multi-step engineering tasks through a composable extension system. No vendor lock-in, no subscription, no code uploaded to third-party servers.

How Goose Actually Works

Goose ships as both a desktop application and a CLI tool. The desktop app gives you a GUI where you type tasks in natural language and watch Goose work through them. The CLI does the same thing in your terminal. Both connect to whichever LLM you configure — you paste in an API key for Claude, GPT, Gemini, or point it at a local Ollama instance.

When you give Goose a task, it does not just generate code and hand it back. It plans a sequence of actions, executes them (reading files, running shell commands, editing code), observes the results, and iterates. If a test fails after a code change, Goose reads the error, adjusts the code, and re-runs the test. This loop continues until the task is complete or Goose decides it needs your input.

In my experience, this works well for contained tasks: fixing a failing test, adding a new API endpoint following existing patterns, or refactoring a function. It struggles with tasks that require understanding broad architectural context across many files — which is true of most AI coding agents, though the degree varies by model.

Installation and First Run

Getting started takes about five minutes. Install via Homebrew (brew install block/tap/goose) or download the desktop app. Add your API key to the config file. Point Goose at a project directory and give it a task. The first interaction feels responsive — Goose starts planning within seconds.

The learning curve for basic usage is gentle. Where it gets steep is when you start customizing: writing recipes, configuring MCP servers, or tuning model parameters. That part takes days, not minutes.

MCP Servers and the Recipes System

Goose uses MCP (Model Context Protocol) servers as its extension mechanism. An MCP server is a small service that gives Goose access to a specific capability — reading a database, interacting with a Jira board, querying your monitoring system, or fetching documentation. You configure which MCP servers Goose can access, and it calls them as needed during task execution.

This is the same protocol that Claude Code uses natively, which makes Goose one of the few open-source agents with real MCP support. The ecosystem of available MCP servers is growing — there are community-built servers for GitHub, Slack, PostgreSQL, and dozens of other tools.
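As a sketch of what wiring in an MCP server might look like, here is a hypothetical extension entry in a Goose YAML config file. The field names, file layout, and the package name shown are illustrative assumptions, not the exact schema — check the official Goose documentation before copying.

```yaml
# Hypothetical Goose config fragment. Field names are illustrative
# assumptions, not the exact schema -- consult the Goose docs.
extensions:
  postgres:
    type: stdio                  # assumed: the server runs as a local subprocess
    cmd: npx
    args:
      - "-y"
      - "@example/mcp-server-postgres"       # hypothetical package name
      - "postgresql://localhost/mydb"
    enabled: true
```

Once an entry like this is registered, Goose can call that server's tools mid-task — for example, running a read-only query to check what a schema actually looks like before editing the code that touches it.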

Recipes: Reusable Workflow Macros

Recipes are YAML files that define multi-step workflows. A recipe might say: “Run the test suite, collect failures, fix each failing test, re-run to confirm, then create a commit.” You trigger a recipe with a single command and Goose executes the entire sequence.
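To make that concrete, here is a minimal sketch of what such a recipe might look like. The key names (`version`, `title`, `prompt`, and so on) are assumptions for illustration — real recipes should follow the schema in the Goose recipe documentation.

```yaml
# Hypothetical recipe sketch -- key names are illustrative assumptions;
# verify against the recipe schema in the Goose docs before use.
version: "1.0.0"
title: fix-failing-tests
description: Run the suite, repair failures, confirm, then commit
prompt: |
  1. Run the project's test suite.
  2. For each failing test, read the error output and fix the code under test.
  3. Re-run the suite to confirm everything passes.
  4. Create a commit summarizing the fixes.
```

You would then trigger the recipe with a single CLI command and let Goose work through the steps in order, iterating on failures as it goes.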

The community has published hundreds of recipes on GitHub. Some are simple (format all files, update dependencies), others are ambitious (full CI pipeline debugging, automated code review with specific style guidelines). Writing your own requires understanding Goose's action model and YAML syntax — it took me about two days before I could write recipes that reliably did what I intended.

Recipes are where Goose differentiates itself from tools like Claude Code or Cursor. Neither of those has an equivalent “saved workflow” system. If you find yourself repeating the same multi-step task regularly, a Goose recipe can automate it. Whether that automation is worth the setup effort depends on how often you repeat the task.

Model Choice Matters More Than You Think

This is the part that most reviews of Goose gloss over, and it is the most important thing to understand before you commit time to it. Goose is a shell around an LLM. The quality of everything it does — planning, code generation, debugging, recipe execution — depends almost entirely on which model you connect.

In my testing, Claude Opus and GPT-4o produced results comparable to what you get from Claude Code or Cursor. They handled multi-file edits, understood project context from file reads, and recovered from errors intelligently. Claude Sonnet was slightly below that — functional for most tasks but occasionally needed nudging on complex refactors.

Local models through Ollama were a different story. Llama 3 (70B) managed simple single-file tasks but fell apart on anything requiring coordination across files. Smaller models (7B, 13B) were not useful for real engineering work — they generated plausible-looking code that failed tests and couldn't recover from the failures.

The implication: if you use Goose with a top-tier cloud model, your API costs run $5-20/month for moderate usage. At that point, the cost advantage over Claude Code ($20/month flat) or Cursor ($20/month) is modest to nonexistent. Goose's real advantage is flexibility and privacy, not price — unless you happen to have a local model that performs well enough for your use case.

Goose vs Claude Code vs Cursor vs Codex

The comparison is not as simple as “which one writes better code.” These tools make fundamentally different trade-offs between control, convenience, and capability. Here is how they stack up on the dimensions that actually affect daily work.

| Feature | Goose | Claude Code | Cursor | OpenAI Codex |
|---|---|---|---|---|
| Price | Free (+ API costs ~$5-20/mo) | $20/mo (Max plan) | $20/mo Pro | API usage-based |
| License | Apache 2.0 (open-source) | Proprietary | Proprietary | Proprietary |
| LLM Support | Any (Claude, GPT, Gemini, Ollama) | Claude models only | Multiple (built-in) | OpenAI models only |
| Interface | Desktop app + CLI | Terminal CLI | GUI (VS Code-based) | Web + CLI |
| MCP Server Support | Yes (core feature) | Native (full ecosystem) | Basic | No |
| Workflow Macros | Yes (recipes, YAML) | No (manual prompts) | Task templates | No |
| Runs Locally | Yes (fully local) | Yes (API calls to Anthropic) | Partial (cloud agents) | Cloud only |
| Offline Capable | Yes (with local LLM) | No | No | No |
| Autonomous Task Quality | Model-dependent | Strong | Strong (GUI-guided) | Moderate |
| GitHub Stars | ~27K | N/A (closed source) | N/A (closed source) | N/A (closed source) |

The pattern is clear: Goose wins on openness and flexibility. You own your data, you pick your model, you extend it however you want. Claude Code and Cursor win on out-of-the-box quality — their tight integration with specific models means less configuration and more consistent results.

For a deeper look at how Claude Code and Codex compare directly, see our Codex vs Claude Code comparison. And if you are evaluating Cursor specifically, our Cursor 3 review covers the latest agent-first interface.

The Honest Downsides

I want to be direct about what does not work well, because the GitHub README and community discussions tend to focus on the happy path.

  • Output quality is a dice roll based on your model. If you connect a weaker model to save money, Goose becomes noticeably worse than paid alternatives. The tool itself does not compensate for model limitations — it faithfully passes whatever the model produces.
  • Terminal comfort is mandatory. The desktop app exists, but power features like recipes, MCP server configuration, and debugging Goose itself all happen in config files and the terminal. If you are not comfortable editing YAML and reading logs, you will hit a wall.
  • The recipe system has a real learning curve. Writing a recipe that reliably does what you intend takes trial and error. YAML syntax issues, action ordering problems, and unclear error messages make the first few days frustrating. The documentation is improving but still has gaps.
  • No built-in IDE integration. Unlike Cursor (which is an IDE) or Claude Code (which has VS Code extensions), Goose runs alongside your editor. You switch between your editor and Goose's interface, which adds context-switching overhead that adds up over a full day.
  • Community support, not enterprise support. If something breaks, you file a GitHub issue. There is no paid support tier, no SLA, no account manager. For solo developers and small teams, this is fine. For organizations with compliance requirements, it is a real gap.

None of these are dealbreakers for the right user. But they mean Goose is not a drop-in replacement for Claude Code or Cursor — it is a different kind of tool with different trade-offs. The developers who love Goose are the ones who value control over convenience and are willing to invest setup time for long-term flexibility.

Who Should Actually Use Goose?

After a week of daily use, I think Goose fits three profiles well and one poorly.

Good fit:

  • Developers who care about data privacy. If your code cannot leave your machine — government work, healthcare, financial services — Goose with a local LLM is one of the few viable options for an AI coding agent.
  • Teams that want to standardize workflows. The recipes system is genuinely useful for teams where everyone should follow the same debugging or deployment process. Write the recipe once, share it across the team.
  • Tinkerers and tool-builders. If you enjoy configuring and extending your development tools, Goose gives you more surface area to work with than any closed-source alternative. MCP servers, recipes, and the open codebase mean you can modify anything.

Not a good fit:

  • Developers who want the strongest coding AI with zero setup. Claude Code and Cursor both work out of the box with minimal configuration and produce consistently strong results. Goose requires more setup time and delivers variable quality depending on your model choice.

For a broader view of the current AI coding tool market, including where Goose fits alongside a dozen other options, check our AI coding tools comparison guide.

How We Tested

We ran Goose v0.9.x (latest stable) from March 31 to April 6, 2026 on two projects: a Next.js 15 SaaS application (~6K lines) and a Python FastAPI service (~3K lines). Models tested: Claude Opus 4.6, GPT-4o, Claude Sonnet 4.5, and Llama 3 70B via Ollama. Each task was run with at least two different models for comparison. API costs over the testing period totaled approximately $14 across all cloud models. Desktop app (macOS) and CLI (macOS + Linux) were both used.

FAQ

Is Goose by Block completely free?

Goose itself is free and open-source under Apache 2.0. However, you need to provide your own LLM API key. Light usage with Claude or GPT typically costs $5-20/month in API fees. You can also run it with free local models through Ollama, making the total cost zero.

What LLMs does Goose support?

Goose works with virtually any LLM: Claude (Anthropic), GPT (OpenAI), Gemini (Google), and local models via Ollama. You configure your preferred model and API key, then Goose handles the rest. Output quality varies significantly — Claude Opus and GPT-4o produce the strongest results, while smaller local models struggle with complex multi-step tasks.

How does Goose compare to Claude Code?

Goose is model-agnostic and free (Apache 2.0), while Claude Code is locked to Anthropic models at $20/month. Claude Code has deeper autonomous reasoning and tighter integration out of the box. Goose offers more flexibility — you choose your model, customize workflows with recipes, and keep all data local. For control, Goose wins. For raw coding performance, Claude Code currently leads.

What are Goose recipes?

Recipes are YAML-based workflow macros that define multi-step tasks Goose can execute automatically. For example, a recipe could run your test suite, analyze failures, fix the code, and re-run tests. The community has shared hundreds of recipes on GitHub, but writing custom ones takes a few days to learn well.

Does Goose work offline?

Yes, if you run a local LLM through Ollama or similar tools. Goose itself runs entirely on your machine — no data leaves your laptop unless you choose a cloud API like OpenAI or Anthropic. This makes it one of the few AI coding agents suitable for air-gapped or high-security environments.

Last Updated: April 7, 2026 • Written by: Jim Liu, web developer based in Sydney who has tested 40+ AI coding tools since 2024.
