
DeerFlow Review: ByteDance's Open-Source AI Agent Framework With Multi-Agent Orchestration

ByteDance — the company behind TikTok — quietly open-sourced DeerFlow, a multi-agent orchestration framework that hit 25,000+ GitHub stars in a matter of weeks. It coordinates specialized AI agents under a supervisor, runs code in sandboxed containers, and maintains persistent state memory across sessions. Here is what it actually delivers.

OpenAI Tools Hub Team · 14 min read

TL;DR

  • Built by ByteDance. MIT licensed. 25,000+ GitHub stars within weeks of release.
  • Multi-agent orchestration: supervisor agent delegates to specialized sub-agents (researcher, coder, reporter) that work in parallel or sequence.
  • Sandboxed code execution via Docker containers or Jupyter kernels — agent-generated code runs isolated from your host system.
  • Persistent state memory: agents retain context across sessions through a structured memory store, not just the LLM context window.
  • Supports OpenAI, Anthropic, Google, and Ollama backends — each sub-agent can use a different model.
  • Strong for deep research and complex automation. Less suited for quick inline coding tasks where Claude Code or Cursor are faster.
  • Honest caveat: documentation is still catching up to the codebase, and the learning curve for custom agent configuration is steep.

What Is DeerFlow?

DeerFlow is a multi-agent orchestration framework that ByteDance released as open source under the MIT license. The name stands for "Deep Exploration and Efficient Research Flow." Unlike single-agent tools that send one prompt to one model and return one response, DeerFlow coordinates multiple specialized agents — each with its own role, tools, and memory — under a supervisor agent that manages the overall workflow.

The project reached 25,000 GitHub stars remarkably fast, which reflects both ByteDance's engineering reputation and genuine developer interest in multi-agent systems. The star count alone does not tell you whether the tool is production-ready — it tells you people are paying attention.

At its core, DeerFlow is designed for tasks that are too complex for a single LLM call: deep research spanning multiple sources, multi-step data analysis pipelines, code generation that requires separate planning and execution phases, and report generation that synthesizes information from different domains. The framework handles the coordination overhead that you would otherwise build yourself with prompt chaining and custom orchestration scripts.

Architecture: How Multi-Agent Orchestration Works

DeerFlow's architecture follows a hierarchical pattern. A supervisor agent sits at the top and receives your task. It breaks the task into sub-tasks and delegates each to a specialized sub-agent:

  • Researcher agent: Handles web search, content retrieval, and source synthesis. It crawls, extracts, and summarizes information from multiple sources in parallel.
  • Coder agent: Writes, debugs, and executes code in a sandboxed environment. It receives specifications from the supervisor and returns executable results.
  • Reporter agent: Takes raw outputs from other agents and formats them into structured reports, summaries, or documents with citations.
  • Custom agents: You can define your own sub-agents with custom system prompts, tool access, and model configurations — for instance, a "data analyst" agent with access only to a Jupyter kernel and specific datasets.

The supervisor does not simply fire-and-forget. It monitors sub-agent progress, handles failures by retrying or reassigning tasks, and synthesizes final outputs. If the researcher agent returns incomplete data, the supervisor can ask it to dig deeper or assign the gap to a different agent. This loop-based coordination is what separates DeerFlow from simpler prompt-chaining approaches.

Communication between agents happens through a structured message bus — not raw text passing. Each message carries metadata about its source agent, confidence level, and data type. This matters when the supervisor needs to resolve conflicting information from two sub-agents (for instance, a researcher finding one price point and a coder scraping a different one from an API).
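DeerFlow's internal message schema and supervisor loop are not fully documented, but the pattern the two paragraphs above describe can be sketched in plain Python. Everything here — the class names, fields, and stub agents — is hypothetical and illustrative, not DeerFlow's actual API:

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical message shape -- DeerFlow's real schema may differ.
@dataclass
class AgentMessage:
    source: str          # which sub-agent produced this
    confidence: float    # 0.0-1.0, used by the supervisor to resolve conflicts
    data_type: str       # e.g. "research_findings", "code_result"
    payload: Any

def supervise(task: str, agents: dict[str, Callable[[str], AgentMessage]],
              max_retries: int = 2) -> list[AgentMessage]:
    """Toy supervisor: delegate to each sub-agent, inspect results,
    and re-delegate when a sub-agent returns low-confidence output."""
    results = []
    for name, agent in agents.items():
        msg = agent(task)
        retries = 0
        while msg.confidence < 0.5 and retries < max_retries:
            msg = agent(f"{task} (previous attempt incomplete; dig deeper)")
            retries += 1
        results.append(msg)
    return results

# Stub agents standing in for LLM-backed workers.
def researcher(task: str) -> AgentMessage:
    return AgentMessage("researcher", 0.9, "research_findings", {"sources": 5})

def coder(task: str) -> AgentMessage:
    return AgentMessage("coder", 0.8, "code_result", {"exit_code": 0})

msgs = supervise("compare pricing", {"researcher": researcher, "coder": coder})
```

The confidence field is what lets a supervisor arbitrate between conflicting answers instead of blindly concatenating raw text from sub-agents.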

Key Features

Sandboxed Code Execution

Every code execution task runs inside a Docker container or a Jupyter kernel sandbox. The agent writes code, the sandbox runs it, and the output returns to the agent — without the generated code ever touching your host filesystem or network unless you explicitly allow it. You configure the sandbox restrictions through a YAML file: network access (on/off), filesystem mounts (read-only or read-write), process limits, and execution timeouts.
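The exact keys of the sandbox YAML are not publicly documented; a hypothetical sketch of the kind of policy the paragraph above describes, with illustrative key names only:

```yaml
# Hypothetical sandbox policy -- key names are illustrative,
# not DeerFlow's documented schema.
sandbox:
  backend: docker          # or: jupyter
  network: false           # block all outbound network access
  mounts:
    - path: ./data
      mode: ro             # read-only mount into the container
  limits:
    processes: 32
    timeout_seconds: 120
```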

In practice, the sandbox eliminates the anxiety of letting an AI agent run arbitrary code. During testing, we had the coder agent generate and execute roughly 80 Python scripts across various tasks. None leaked outside the container. The one downside: sandbox startup adds about 2-3 seconds of latency per execution, which is noticeable on tasks that require many rapid iterations.

Persistent State Memory

DeerFlow maintains a state memory store that persists across sessions. This is not just the LLM context window — it is a structured database where agents write and read task-relevant state: discovered facts, intermediate results, user preferences, and prior task outcomes. When you start a new session, agents can retrieve relevant prior state instead of starting from zero.

The memory system uses a combination of key-value storage (for explicit facts) and vector similarity search (for semantic retrieval of past task episodes). It is backed by ChromaDB by default, though you can swap in other vector stores. The practical value scales with usage — after around 30-40 tasks in a domain, retrieval hits become frequent enough to measurably reduce redundant work.
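To make the KV-plus-vector split concrete, here is a deliberately minimal pure-Python sketch of the pattern — a dict for explicit facts and naive bag-of-words cosine similarity for episode retrieval. This illustrates the idea only; DeerFlow delegates the similarity-search half to ChromaDB with real embeddings:

```python
import math
from collections import Counter

class TaskMemory:
    """Toy hybrid store: exact facts in a dict, past episodes retrieved
    by bag-of-words cosine similarity (a stand-in for a vector DB)."""

    def __init__(self):
        self.facts: dict[str, str] = {}   # key-value: explicit facts
        self.episodes: list[str] = []     # free-text summaries of past tasks

    def remember_fact(self, key: str, value: str) -> None:
        self.facts[key] = value

    def log_episode(self, summary: str) -> None:
        self.episodes.append(summary)

    @staticmethod
    def _cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[t] * b[t] for t in a)
        norm = (math.sqrt(sum(v * v for v in a.values()))
                * math.sqrt(sum(v * v for v in b.values())))
        return dot / norm if norm else 0.0

    def recall(self, query: str, k: int = 2) -> list[str]:
        q = Counter(query.lower().split())
        scored = [(self._cosine(q, Counter(e.lower().split())), e)
                  for e in self.episodes]
        return [e for s, e in sorted(scored, reverse=True)[:k] if s > 0]

mem = TaskMemory()
mem.remember_fact("preferred_report_format", "markdown with citations")
mem.log_episode("researched pricing of AI code editors")
mem.log_episode("debugged pandas pipeline for sales data")
hits = mem.recall("AI editor pricing research")
```

A new session queries `recall()` before starting work, which is what lets agents skip ground they have already covered.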

Multi-Model Configuration

Each sub-agent in DeerFlow can use a different LLM backend. This is genuinely useful for cost optimization: route the supervisor (which needs strong reasoning) to GPT-4o or Claude Opus, the researcher (which mostly does search and extraction) to a cheaper model like Claude Haiku or GPT-4o-mini, and the coder to Claude Sonnet (which benchmarks well on code tasks). The configuration is per-agent in the workflow YAML file.
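The routing described above might look like this in a workflow file — the field names are hypothetical, but the per-agent structure matches what the text describes:

```yaml
# Illustrative per-agent model routing -- field names are hypothetical.
agents:
  supervisor:
    model: claude-opus      # strongest reasoning for task decomposition
  researcher:
    model: gpt-4o-mini      # cheap model for search and extraction
  coder:
    model: claude-sonnet    # strong on code-generation tasks
```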

Deep Research Mode

DeerFlow ships with a built-in deep research workflow that orchestrates the researcher agent through a multi-step process: generate search queries, crawl top results, extract and cross-reference key claims, identify gaps, run follow-up searches, and compile a structured research report with citations. This mirrors what a human researcher does, but executed in 3-5 minutes rather than 2-3 hours.
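The loop structure of that workflow can be sketched in a few lines. The search and crawl functions below are stubs — in DeerFlow these steps are LLM- and Playwright-backed tools — so this shows only the control flow, not the real implementation:

```python
# Toy version of the deep-research loop: query -> crawl -> gap check -> follow-up.

def generate_queries(question: str) -> list[str]:
    return [f"{question} overview", f"{question} comparison"]   # stub

def crawl_and_extract(query: str) -> list[str]:
    return [f"claim found for '{query}'"]                       # stub: one claim per query

def find_gaps(claims: list[str]) -> list[str]:
    # Stub gap check: demand at least two cross-referenced claims.
    return [] if len(claims) >= 2 else ["need more sources"]

def deep_research(question: str, max_rounds: int = 3) -> dict:
    claims: list[str] = []
    queries = generate_queries(question)
    for _ in range(max_rounds):
        for q in queries:
            claims.extend(crawl_and_extract(q))
        gaps = find_gaps(claims)
        if not gaps:                 # stop once coverage is sufficient
            break
        queries = gaps               # follow-up searches target the gaps
    return {"question": question, "claims": claims, "citations": len(claims)}

report = deep_research("open-source AI agent frameworks")
```

The gap-check-then-follow-up loop is the part that distinguishes "deep research" from a single round of search-and-summarize.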

The research output quality depends heavily on the LLM backend. With GPT-4o or Claude Sonnet, the reports were structured, well-cited, and caught nuances we expected them to miss. With smaller local models, the research quality dropped noticeably — the agent would miss contradictions between sources or fail to synthesize conflicting data points.

Workflow Templates

DeerFlow includes pre-built workflow templates for common multi-agent patterns: research-and-report, code-review-and-fix, data-analysis-pipeline, and competitive-analysis. These templates are YAML files that define the agent roles, their tools, the execution flow, and the output format. You can use them as-is or customize them as starting points for your own workflows.

DeerFlow vs Claude Code vs Manus AI vs AutoGen

| Factor | DeerFlow | Claude Code | Manus AI | AutoGen |
|---|---|---|---|---|
| Builder | ByteDance | Anthropic | Monica.im | Microsoft |
| License | MIT (open source) | Proprietary | Proprietary | MIT (open source) |
| Agent model | Multi-agent orchestration | Single agent | Multi-agent | Multi-agent |
| Sandbox execution | Docker + Jupyter | Terminal (unsandboxed) | Cloud sandbox | Docker (optional) |
| State memory | Persistent (KV + vector) | Session-only | Session-only | Configurable |
| LLM backends | Any (per-agent config) | Claude only | Multiple | Any OpenAI-compatible |
| GitHub stars | ~25K | N/A (proprietary) | N/A (proprietary) | ~38K |
| Coding strength | Good (sandbox-first) | Excellent | Good | Moderate |
| Research strength | Excellent | Limited | Good | Moderate |
| Learning curve | Steep | Low | Medium | High |

Sources: DeerFlow GitHub (github.com/bytedance/deerflow), Anthropic Claude Code documentation, Manus AI pricing page, Microsoft AutoGen docs. Data as of March 2026.

The positioning is distinct. Claude Code excels at single-agent coding workflows — fast, reliable, deeply integrated with the terminal. DeerFlow excels at multi-step research and automation where the task naturally decomposes into parallel sub-tasks. AutoGen is the closest architectural competitor (also multi-agent, also open source), but DeerFlow's sandbox execution and built-in research workflow give it an edge for out-of-box usability.

For a broader comparison of agentic AI tools across the market, see our agentic AI tools comparison.

Getting Started With DeerFlow

Step 1: Clone and Install

Clone the repository from github.com/bytedance/deerflow. DeerFlow requires Python 3.10+ and Docker (for sandboxed execution). Run pip install -r requirements.txt to install dependencies. The main packages are langgraph (for agent orchestration), chromadb (for state memory), playwright (for web research), and docker-py (for sandbox management).

Step 2: Configure LLM Backends

Copy .env.example to .env and set your API keys. DeerFlow supports per-agent model configuration in the workflow YAML:

  • OPENAI_API_KEY=sk-... for GPT-4o or GPT-4o-mini
  • ANTHROPIC_API_KEY=sk-ant-... for Claude Sonnet or Opus
  • OLLAMA_BASE_URL=http://localhost:11434 for local models

You can assign different models to different agents in the workflow file. A common pattern: Claude Opus for the supervisor, GPT-4o-mini for the researcher, Claude Sonnet for the coder.

Step 3: Run a Built-in Workflow

Start with the deep research template: python -m deerflow.run --workflow research --task "Analyze the current state of open-source AI agent frameworks". The supervisor will decompose the task, spawn researcher agents in parallel, and compile a structured report. First run typically takes 3-7 minutes depending on task complexity and your LLM backend speed.

Step 4: Build Custom Workflows

Custom workflows are defined in YAML files under workflows/. Each file specifies: agent roles, their system prompts, allowed tools, model backends, and the execution graph (which agents run in parallel vs. sequence). The documentation for workflow authoring has gaps — expect to reference existing templates and the source code for advanced configurations.
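Since the authoring docs have gaps, the schema below is a hypothetical composite based on the elements the text lists (roles, prompts, tools, backends, execution graph) — treat the shipped templates in workflows/ as the authoritative reference:

```yaml
# Hypothetical workflow definition -- keys are illustrative only;
# check the shipped templates for the real schema.
name: research-and-report
agents:
  researcher:
    model: gpt-4o-mini
    system_prompt: "You gather and cross-reference sources."
    tools: [web_search, crawler]
  analyst:
    model: claude-sonnet
    tools: [jupyter_kernel]
  reporter:
    model: claude-opus
    tools: [markdown_writer]
graph:
  - parallel: [researcher, analyst]   # independent sub-tasks run concurrently
  - sequence: [reporter]              # reporter synthesizes after both finish
output:
  format: markdown
  citations: true
```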

Real-World Use Cases

Deep Research and Competitive Analysis

This is DeerFlow's strongest use case. Give it a research question like "Compare the pricing, features, and market positioning of the five leading AI code editors as of March 2026" and it will: generate targeted search queries, crawl relevant pages in parallel, extract pricing data, cross-reference conflicting claims, and produce a structured report with source URLs. Tasks that take a human analyst 3-4 hours complete in under 7 minutes.

The quality of the output is genuinely impressive with frontier models. In our testing with Claude Sonnet as the backbone, the research reports correctly identified pricing changes that occurred within the previous two weeks — something static knowledge-cutoff models would miss.

Multi-Step Data Analysis

The combination of sandboxed Jupyter execution and the coder agent makes DeerFlow a strong choice for data analysis workflows: load a dataset, clean it, run statistical analysis, generate visualizations, and produce a summary. The sandbox means the analysis code runs safely, and the state memory means you can iteratively refine the analysis across multiple sessions without re-running earlier steps.

Automated Content Pipelines

Research + coding + reporting agents working together can automate content pipelines: research a topic, gather data, generate charts or comparison tables, and draft a structured article. We tested this for producing technical comparison content and the raw output was roughly 70% publishable — requiring human editing for tone, accuracy verification, and nuance that the agents missed.

Code Review and Refactoring

The pre-built code-review workflow has one agent read through a codebase, another identify issues and suggest fixes, and a third apply the fixes and run tests in the sandbox. It works reasonably well on well-structured Python and TypeScript codebases. On legacy code with unusual patterns, the review quality dropped — the agents would suggest changes that broke existing behavior because they lacked full project context.

Limitations and Rough Edges

1. Documentation lags significantly behind the codebase

DeerFlow's GitHub README covers installation and basic usage. The workflow YAML format, custom agent configuration, memory system tuning, and sandbox customization are documented sparsely. You will read source code to understand advanced features. ByteDance has acknowledged this gap and is working on comprehensive docs, but as of mid-March 2026, this is the single biggest barrier to adoption.

2. Steep learning curve for custom workflows

Building a custom multi-agent workflow from scratch requires understanding LangGraph execution graphs, DeerFlow's agent configuration schema, tool registration, and memory scope rules. This is not a tool you configure in an afternoon. The pre-built templates are a good starting point, but anything beyond research and code review requires meaningful investment to set up correctly.

3. Sandbox startup latency

Docker container spin-up adds 2-3 seconds per code execution. For tasks that require dozens of iterative code runs, this latency compounds. DeerFlow supports a "warm pool" configuration that keeps containers pre-started, but it consumes more memory. On a 4GB RAM VPS, the warm pool is impractical alongside the agent itself.

4. Multi-agent coordination is not always better

For simple tasks — "write a function that does X" or "fix this bug" — the multi-agent overhead adds complexity without improving output. The supervisor agent, sub-agent delegation, and inter-agent communication add latency and token usage. Claude Code or Cursor will complete a simple coding task in 15 seconds; DeerFlow might take 2 minutes for the same task due to coordination overhead. Multi-agent orchestration pays off on complex, multi-domain tasks — not on everyday coding.

5. Memory management requires tuning

The persistent state memory grows over time. Without periodic cleanup, retrieval becomes slower and less relevant — old task states pollute the similarity search. DeerFlow includes a memory prune command, but there is no automatic cleanup policy. For long-running deployments, you need to build this into your maintenance workflow.

Pricing and Resource Requirements

DeerFlow itself is free (MIT license). Your costs are infrastructure and LLM API calls:

  • Hosting: A $10-20/month VPS (2 vCPU, 4GB RAM) is the minimum for running DeerFlow with Docker sandboxing and no local LLM. The sandbox containers need headroom. For local model inference via Ollama, budget $80-120/month for a GPU-equipped instance.
  • LLM API costs: Multi-agent workflows consume more tokens than single-agent tools because multiple agents are reasoning simultaneously. A typical deep research task with three agents uses roughly 150K-300K tokens total. At GPT-4o pricing ($2.50/M input, $10/M output), that is $1-3 per complex task. At 30 tasks/month, budget $30-90 in API costs.
  • Cost optimization: Route cheap sub-tasks (search, extraction) to GPT-4o-mini ($0.15/M input) and expensive reasoning to a frontier model. This can cut API costs by 40-60% compared to running all agents on GPT-4o.
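The per-task arithmetic above can be sanity-checked with the published GPT-4o rates. The 200K-input / 50K-output split is an assumption for illustration; real workflows will vary:

```python
# Back-of-envelope check of the per-task API cost quoted above,
# assuming a 200K-input / 50K-output token split (the split is an assumption).

GPT4O_INPUT_PER_M = 2.50    # USD per million input tokens
GPT4O_OUTPUT_PER_M = 10.00  # USD per million output tokens

def task_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1e6 * GPT4O_INPUT_PER_M
            + output_tokens / 1e6 * GPT4O_OUTPUT_PER_M)

cost = task_cost(200_000, 50_000)   # one deep-research task: $1.00
monthly = round(30 * cost, 2)       # 30 tasks/month: $30.00
```

This lands at the bottom of the $1-3 per-task range; output-heavy tasks (long reports) push toward the top, since output tokens cost 4x input tokens.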

Total cost for a moderate-use developer: $40-110/month. This is higher than Claude Code or Cursor for simple coding tasks, but the value proposition is different — DeerFlow handles task types those tools cannot.


Who Should Use DeerFlow

DeerFlow is a strong fit if you:

  • Need multi-agent orchestration for complex research, analysis, or automation workflows
  • Want sandboxed code execution so agents cannot accidentally damage your systems
  • Run tasks that naturally decompose into parallel sub-tasks (research + coding + reporting)
  • Require persistent state memory across sessions for ongoing projects
  • Want to optimize LLM costs by routing different agents to different model tiers
  • Are comfortable with Python, Docker, and reading source code when documentation falls short

It is not the right choice if you:

  • Primarily need quick inline coding assistance — Claude Code or Cursor are faster and more reliable for this
  • Want a polished, fully-documented product you can set up in under an hour
  • Are not comfortable with Docker, YAML configuration, and Python environments
  • Run mostly simple, single-step tasks where multi-agent coordination adds overhead without value
  • Are on a tight budget — the combined VPS + API costs are higher than single-agent alternatives

FAQ

What is DeerFlow and who built it?

DeerFlow is an open-source multi-agent orchestration framework built by ByteDance. It coordinates specialized AI agents (researcher, coder, reporter) under a supervisor agent that decomposes complex tasks into sub-tasks and manages execution. Released under the MIT license, it gathered 25,000+ GitHub stars within weeks of launch.

Is DeerFlow free to use?

Yes. The framework is MIT licensed — completely free to use, modify, and deploy commercially. You pay only for LLM API costs when using cloud providers (OpenAI, Anthropic, Google). Running local models through Ollama eliminates API costs but requires a GPU-capable server ($80-120/month for adequate inference speed on 70B+ models).

How does DeerFlow compare to Claude Code?

They serve different workflows. Claude Code is a single-agent tool optimized for terminal-based coding with fast iteration cycles and deep Anthropic integration. DeerFlow is a multi-agent framework for complex tasks that benefit from parallel agent specialization — research, data analysis, multi-step automation. For quick coding tasks, Claude Code is faster and more reliable. For deep research or multi-domain workflows, DeerFlow delivers results that a single agent cannot easily match.

What LLM models does DeerFlow support?

DeerFlow supports any OpenAI-compatible API endpoint: OpenAI (GPT-4o, o3), Anthropic (Claude Sonnet, Claude Opus), Google (Gemini), and local models via Ollama. A distinctive feature is per-agent model configuration — you can assign different models to different agents to optimize cost and capability.

Can DeerFlow run code safely?

Yes. Code execution runs in Docker containers or Jupyter kernel sandboxes, isolated from your host system. You configure network access, filesystem mounts, and process limits through a YAML configuration file. This is one of DeerFlow's strongest features — it enables autonomous code execution workflows without the security risks of unsandboxed agents.