Captions Teardown — Gaurav Misra's $60M Pivot from Captions to AI Avatars

TL;DR

Captions started as the simplest possible product: auto-generated subtitles for vertical videos on iOS. Four years later it has raised roughly $60M from Sequoia, a16z, and Index, ships an AI Studio that generates fully synthetic talking-head videos from a script, and clears an estimated $2.5M MRR (~$30M ARR). The interesting story is not the revenue. It is that Gaurav Misra, an ex-Snap engineering lead, took the riskiest unforced pivot in his category: he moved from a working consumer captions tool with millions of installs to a generative AI avatar product, and did it before the market forced him to.

If you are trying to clone this, the literal product is already taken. The wedge that is not taken is vertical AI video for one specific creator niche — fitness coaches, therapists, real estate agents, financial advisors, dentists — with avatar libraries and script templates tuned to that profession's content patterns.

Quick Facts

Product: Captions (captions.ai)
Founded: 2021
Founders: Gaurav Misra (CEO), Dwight Churchill (CTO)
HQ: New York, NY
Total raised: ~$60M (Sequoia lead Series C, a16z, Index, Kleiner Perkins)
Estimated MRR: ~$2.5M (~$30M ARR)
Pricing: Free / Pro $9.99/mo / Pro Annual $69.99/yr / Max plan for AI Studio
Platform: iOS-first, Mac app, web companion
Wedge: Mobile-native AI video creation for solo creators

The Product

Captions is two products stitched together that share a single iOS app and a single subscription.

Product one is the original Captions: import a vertical video, get word-level animated subtitles styled to match TikTok, Reels, and Shorts conventions. There is a B-roll feature that auto-suggests stock footage, a teleprompter mode that runs on your phone screen while you record, and a noise-removal step that runs locally on device. This is the thing that got the first five million installs.

Product two is AI Studio, launched in late 2023. You type a script. You pick an "AI Creator" — a synthetic talking-head avatar trained from real captured performances. The system generates a video of that avatar reading your script with lip-sync, gesture, eye-contact, and B-roll cuts automatically inserted. There is also an "AI Edit" mode that takes raw footage of a real person a
