
Synthesia Teardown — Enterprise AI Avatar Video ($150M ARR, $4B London Unicorn)



Last updated: 2026-05-16 · Researched via Sacra, Contrary Research, TechCrunch, CNBC, Synthesia blog, UCL News, Matt Turck podcast, Accel podcast, G2 reviews

TL;DR

Synthesia turns a typed script into a video of a photorealistic human saying that script. Not a cartoon. Not a robotic talking head from 2021. An actual presenter — gestures, micro-expressions, lip sync in 140+ languages — generated in roughly the time it takes to grab coffee. They hit $100M ARR in April 2025, then $146M by September 2025 per Sacra's tracking, and closed a $200M Series E at $4B valuation in January 2026 led by Google Ventures with NVIDIA, Accel, and NEA piling on. They reportedly turned down a $3B acquisition offer from Adobe somewhere in mid-2025 — that's the kind of company this is.

The reason the $150M ARR matters more than the $4B headline: it came from selling to enterprises. About 70% of revenue is enterprise deals. They serve 65,000+ businesses including 90% of the Fortune 100 — AWS, Bosch, Merck, SAP, Heineken, Zoom, Reuters. The average enterprise customer generates videos in 7 different languages, and 40% of all videos rendered are translated versions of an original. That detail tells you the actual business: corporate L&D and internal comms teams who used to spend $5K-15K per training video on a studio shoot, now spinning up 30 versions for global rollout at near-zero marginal cost.
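The arithmetic behind that "near-zero marginal cost" claim is worth making explicit. Using only the figures above ($5K-15K per studio shoot, 30 language versions for a global rollout), a quick sketch of the studio-cost baseline a Synthesia subscription replaces:

```python
# Back-of-envelope math using the article's figures: a studio shoot costs
# $5K-15K per video, and every language version traditionally needs its
# own shoot (or at minimum a re-voice and re-edit).
STUDIO_COST_PER_VIDEO = (5_000, 15_000)  # low / high estimate per shoot
LANGUAGE_VERSIONS = 30                   # global rollout from the article

def studio_rollout_cost(versions: int) -> tuple[int, int]:
    """Cost range if every language version is produced the old way."""
    low, high = STUDIO_COST_PER_VIDEO
    return low * versions, high * versions

low, high = studio_rollout_cost(LANGUAGE_VERSIONS)
print(f"Studio rollout: ${low:,} - ${high:,}")  # Studio rollout: $150,000 - $450,000
```

Against a $150K-450K rollout, even a five-figure annual enterprise contract is an easy procurement conversation — which is the whole sales pitch.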

The moat sits in three uncomfortable places for would-be cloners. First, deep research: two of the four founders are full professors (Lourdes Agapito, UCL; Matthias Niessner, TUM) whose academic work on monocular non-rigid 3D reconstruction and neural face rendering is the literal foundation of the product. Second, NVIDIA Blackwell GPU access at a scale most startups can't touch — they run their own training pipeline (EXPRESS-1, now EXPRESS-2) on Google Cloud, not just calling someone's API. Third, the enterprise sales motion — SOC 2 Type II, ISO 42001, SCORM exports, SSO, branching scenarios, embedded quizzes, LMS integrations. None of that is sexy. All of it is required to get on a Fortune 500 procurement contract.

If you're thinking of cloning this: you can build a $5K MRR competitor with HeyGen-tier API wrappers. You probably can't build a $150M ARR competitor without research-grade ML talent or a $50M+ war chest. The interesting question is whether there's a wedge — a vertical, a workflow, a language market — where a focused indie team can carve out $30K-100K MRR before Synthesia or HeyGen notices. Spoiler: yes, but the window is closing.

Quick Facts

Field Detail
Website https://synthesia.io
Positioning "AI video platform for the enterprise" — text-to-video with photorealistic avatars
Founders Victor Riparbelli (CEO), Steffen Tjerrild (COO), Lourdes Agapito (UCL Professor), Matthias Niessner (TUM Professor)
Founded 2017, London
Headcount ~600 employees across 7 countries (London HQ, Amsterdam, Copenhagen, Munich, NYC, Zurich)
Funding $356M total. Series E $200M @ $4B (Jan 2026, GV-led). Earlier: Series D $180M @ $2.1B (Jan 2025), Series C $90M @ $1B (Jun 2023, Accel + NVIDIA + Kleiner Perkins)
Customers 65,000+ businesses. 90% of Fortune 100, 70% of FTSE 100. AWS, Bosch, Merck, SAP, Heineken, Zoom, Reuters
ARR ~$146M (Sep 2025, Sacra). Up from $88M end-2024. Targeting $200M+ in 2026
Revenue mix ~70% enterprise, ~30% SMB/self-serve. ~50% US, 50% Europe/Asia
Notable behavior Avg enterprise customer creates content in 7 languages. 40% of videos are translated versions. Turned down rumored $3B Adobe acquisition (mid-2025)
Acquisitions None disclosed. Strategic relationship with Adobe Ventures (April 2025)

The 5-Minute Product Walkthrough

I spent about 30 minutes in the free trial. Here's what actually happens.

You land on a dashboard that, honestly, looks like Google Slides had a baby with Loom. Big "Create video" button top-left. Templates panel on the right — "Onboarding," "Product update," "Sales pitch," "Compliance training." I picked a blank canvas instead because I wanted to see the raw flow.

The editor is the interesting bit. Each "scene" is a slide. On the left, you pick an avatar from a gallery — at the free tier I counted nine stock avatars (Synthesia's full library is north of 240). On the right, you type a script into a text box that's literally captioned "Type or paste your script here." You can pick a voice (different from the avatar — you can put any voice on any face, which is a small but useful detail), set a language from a dropdown of 140+, and choose camera framing.

Hit "Generate." A progress bar appears. For a 60-second clip the rendering took roughly 3 minutes on the free tier — slow if you're used to thinking of "AI video" as instant; perfectly reasonable if you remember a real video shoot is two days of pre-production. Enterprise customers reportedly get faster queue priority, but I couldn't verify this from the trial.
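That generate-then-wait flow is the standard shape of any asynchronous render queue: submit a job, get an id back immediately, poll until it's done. A minimal sketch of that pattern — to be clear, `submit`/`status` here are invented stand-ins, not Synthesia's actual API, and the fake in-memory queue just simulates a job finishing after a few polls:

```python
import time

# Fake in-memory job queue standing in for a real render backend.
_jobs: dict[str, dict] = {}

def submit(script: str, avatar: str, language: str = "en") -> str:
    """Queue a render job; returns a job id immediately (hypothetical API)."""
    job_id = f"job-{len(_jobs) + 1}"
    _jobs[job_id] = {"polls_left": 3}  # pretend it finishes after 3 polls
    return job_id

def status(job_id: str) -> str:
    """Report 'rendering' until the simulated work is used up, then 'done'."""
    job = _jobs[job_id]
    if job["polls_left"] > 0:
        job["polls_left"] -= 1
        return "rendering"
    return "done"

def wait_for_render(job_id: str, interval: float = 0.0) -> str:
    """Poll until the job completes — this loop is the UI's progress bar."""
    while status(job_id) != "done":
        time.sleep(interval)
    return "done"

job = submit("Welcome aboard!", avatar="stock-anna", language="de")
print(wait_for_render(job))  # done
```

The point of the shape, not the code: a ~3-minute render forces the product to be built around jobs and notifications rather than a synchronous preview, which is part of why the editor feels like slides rather than a video timeline.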

The output is striking. Lip sync is genuinely good — visibly better than D-ID or Wav2Lip-tier open source. Micro-expressions (eyebrow raises, slight head tilts on emphasized words) are the giveaway that this is Synthesia 2.0's "Express" pipeline and not the 2022 version. The avatar I picked had a small breathing motion in idle frames — a tell that they're not just rendering keyframes from a mouth model but synthesizing continuous body language.

Downsides I noticed in those 30 minutes: the avatars still feel ever-so-slightly stiff in wide shots. Hand gestures are deliberately conservative — Synthesia clearly chose "safe and slightly boring" over "expressive but uncanny." The pricing wall is also aggressive — even simple features like custom avatars or 1-click translation are gated behind Creator ($89/mo) or Enterprise (Contact Sales). The free plan caps at 3 minutes/month and watermarks output, which is the right move for them but means tire-kickers don't get far.
