Tavus Teardown — Real-Time Conversational AI Video API ($8M+ ARR, Sequoia-Backed)
Researched May 16, 2026 — sources cited inline. Numbers are approximate; ARR is triangulated from team size, pricing tiers, public customer logos, and the Series B round size.
TL;DR
Tavus is the company that figured out how to make an AI avatar actually talk back to you on a live video call, as opposed to the pre-rendered, "we generated a 30-second clip from your script" product Synthesia and HeyGen sell. You sign up, point their CVI (Conversational Video Interface) endpoint at a system prompt, and within about ten minutes you have a video agent that listens, reasons, and responds with a lip-synced face at a claimed latency of under 600 milliseconds (closer to 600-800 ms in my own testing). That latency number is the whole product.
Founded in 2020 by Hassaan Raza, Quinn Favret, and Rishabh Dhar (YC S21, not W21 as commonly listed; the YC profile confirms S21), the company spent its first three years selling rendered personalized video for sales outreach. That business hit roughly $1.9M ARR by end of 2023 with 24 people, which is a respectable but unspectacular trajectory for a YC-backed startup. The pivot happened in early 2024: instead of rendering videos for marketers, they started shipping an API for developers building voice + face agents. Scale Venture Partners led an $18M Series A in March 2024 with Sequoia, YC, and HubSpot Ventures piling in. Twenty months later, in November 2025, CRV led a $40M Series B. That's about $64M total raised (the two named rounds sum to $58M, so the remaining $6M or so presumably came from earlier funding), and you don't get a Series B at that size unless your ARR is comfortably north of $5-10M with the right growth shape.
The model they're betting on is what they call "human computing": the idea that the next interface layer after chat (text) and voice (Sesame, ElevenLabs, Vapi) is face-to-face. They have a paper claiming 15x retention vs voice-only agents. I'm slightly skeptical of the number (it's their own benchmark), but it feels directionally right: people watching a face will sit through more friction than people listening to a disembodied voice. The technology is real: Phoenix-4 for rendering at 40+ fps in 1080p, Raven-1 for multimodal perception, Sparrow-1 for turn-taking. They ride on Daily.co's WebRTC stack for transport, which is a pragmatic call, since building your own SFU would have eaten a year.
The hard part to copy is the model quality. Phoenix-3 / Phoenix-4 is the result of four years of research and probably $5-10M of compute. The easier part to copy is the GTM: API-first, developer-friendly pricing ($0.32-0.37 per minute of conversation), free tier with 25 minutes, and tight YC + Sequoia distribution. Indie hackers won't outbuild Phoenix, but you can wrap Hedra, Simli, or Wav2Lip and ship a vertical (AI receptionist for dental clinics, AI tutor for SAT prep) with better positioning than Tavus's generic developer pitch.
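To make the per-minute pricing concrete, here is a minimal sketch of what a given monthly conversation volume would cost at the $0.32-0.37 rates quoted above. The function name and the 2,000-minute volume are hypothetical; the sketch ignores plan base fees and any included minutes, since those details are not stated here.

```python
from decimal import Decimal

# Per-minute rates quoted in this teardown (USD per conversation minute).
RATE_LOW = Decimal("0.32")
RATE_HIGH = Decimal("0.37")


def usage_cost_range(minutes: int) -> tuple[Decimal, Decimal]:
    """Return the (low, high) usage cost in USD for a number of minutes.

    Hypothetical helper: it covers metered usage only and ignores plan
    base fees and included minutes.
    """
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return (RATE_LOW * minutes, RATE_HIGH * minutes)


if __name__ == "__main__":
    # 2,000 minutes/month is an illustrative volume, not a reported figure.
    low, high = usage_cost_range(2000)
    print(f"2,000 min/month costs between ${low} and ${high}")
```

At that illustrative volume the usage bill lands between $640 and $740 a month, which is the number a vertical product built on a per-minute video API would have to clear in its own pricing.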
Quick Facts
| Field | Value |
|---|---|
| Product | Tavus CVI (Conversational Video Interface) + Phoenix replicas |
| Founded | 2020, San Francisco (originally Houston, TX) |
| Founders | Hassaan Raza (CEO), Quinn Favret (COO), Rishabh Dhar |
| YC Batch | S21 (Summer 2021) |
| Funding | $64M total — $18M Series A (Mar 2024, Scale VP led), $40M Series B (Nov 2025, CRV led) |
| Investors | Sequoia Capital, Scale Venture Partners, CRV, Y Combinator, HubSpot Ventures, Flex Capital |
| Headcount | 24 (end 2023) → est. 50-70 (mid-2026) |
| Revenue | $1.9M ARR (2023, per public Latka listing); ~$8-10M ARR estimate end-2024; likely $15-25M ARR by Series B close |
| MRR (est) | ~$800K-1M/month |
| Pricing | Free: 25 min/mo. Starter: $59/mo. Growth: $397/mo. Enterprise: custom. Overage $0.32-0.37/min |
| Stack signals | WebRTC via Daily.co, custom GPU inference, Phoenix-3/4 rendering model, Raven/Sparrow perception+turn-taking |
| Notable customers | Salesforce, Meta, 1-800-Flowers, Delphi, Inflection Health (mix of confirmed + commonly-cited) |
| Category | AI video personalization, conversational AI |
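
The revenue rows in the table can be cross-checked against each other with simple arithmetic. The sketch below uses only the figures from the table; the helper names are hypothetical, and every input is itself an estimate.

```python
# Figures quoted in the Quick Facts table, in whole US dollars.
ARR_2023 = 1_900_000
ARR_END_2024_RANGE = (8_000_000, 10_000_000)
MRR_RANGE = (800_000, 1_000_000)


def annualize(mrr: int) -> int:
    """Convert monthly recurring revenue to an annual run rate."""
    return mrr * 12


def growth_multiple(later: int, earlier: int) -> float:
    """Return how many times larger `later` is than `earlier`."""
    return later / earlier


if __name__ == "__main__":
    run_rate = [annualize(m) for m in MRR_RANGE]
    multiples = [growth_multiple(a, ARR_2023) for a in ARR_END_2024_RANGE]
    print(f"Annualized MRR: ${run_rate[0] / 1e6:.1f}M to ${run_rate[1] / 1e6:.1f}M")
    print(f"2023 to end-2024 growth: {multiples[0]:.1f}x to {multiples[1]:.1f}x")
```

The estimated MRR annualizes to $9.6M-12M, which sits slightly above the end-2024 ARR estimate and below the Series B range, and the end-2024 estimate implies roughly 4.2x-5.3x growth over the 2023 figure.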
The 5-Minute Product Walkthrough
I signed up with a throwaway Gmail account and landed on the dashboard inside thirty seconds, with no demo-call gating, which is rare for anything calling itself "enterprise AI." The first-time UX gives you an API key, 25 minutes of CVI credit, and a "Stock Replica" gallery with about 25 pre-trained faces you can drop into a conversation without training your own.
The interesting flow is the API one. You POST to /v2/conversations with a persona_id, a replica_id, and a system prompt. The request has basically the same shape as an OpenAI chat completion request, except the response is a WebRTC room URL instead of a JSON message. You drop the room URL into Daily.co's prebuilt iframe, or wire it up with their React SDK, and within about two seconds you're in a video call with the avatar. The avatar starts the conversation if you tell it to, listens to your mic, runs your audio through what I assume is a Whisper-class STT, sends transcripts to an LLM (you can bring your own; they support GPT, Claude, and custom endpoints), and pipes the response back through their voice + Phoenix lip-sync pipeline.
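The request described above might look roughly like the following sketch. Only the /v2/conversations path and the persona_id and replica_id fields come from the walkthrough; the host, the x-api-key header, the system_prompt field name, and the conversation_url response key are assumptions for illustration and should be checked against the official API reference.

```python
import json
import urllib.request

# ASSUMPTION: the API host is illustrative, not confirmed by this teardown.
API_BASE = "https://tavusapi.com"


def build_conversation_request(
    api_key: str, persona_id: str, replica_id: str, system_prompt: str
) -> urllib.request.Request:
    """Build (but do not send) the POST that creates a conversation."""
    payload = {
        "persona_id": persona_id,
        "replica_id": replica_id,
        # ASSUMPTION: "system_prompt" is a hypothetical field name.
        "system_prompt": system_prompt,
    }
    return urllib.request.Request(
        url=f"{API_BASE}/v2/conversations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # ASSUMPTION: the auth header name is a guess.
            "x-api-key": api_key,
        },
        method="POST",
    )


def extract_room_url(response_body: bytes) -> str:
    """Pull the WebRTC room URL out of the JSON response body."""
    data = json.loads(response_body)
    # ASSUMPTION: "conversation_url" is a hypothetical response key.
    return data["conversation_url"]


# Sending the request (needs a real key, so it is left commented out):
# req = build_conversation_request("YOUR_KEY", "p_123", "r_456", "You are ...")
# with urllib.request.urlopen(req) as resp:
#     room_url = extract_room_url(resp.read())
```

The returned room URL is what you would hand to the Daily.co iframe or the React SDK. Keeping request construction separate from sending makes the payload easy to inspect before spending any conversation minutes.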
The first time it worked, I sat there with a slightly stupid grin on my face. The latency feels like a Zoom call with someone on slow Wi-Fi: there's a tiny pause before the response, maybe 600-800 ms in practice, but it isn't long enough to break the conversation. Compare this to Synthesia or HeyGen, where you script the video, wait 2-5 minutes for it to render, and then play it back. That is a different product entirely. Tavus is closer to ElevenLabs Conversational AI or Vapi, except with a face attached.
The avatar's lip-sync is uncannily convincing when it works (about 90% of the time) and slightly off when it doesn't (mostly on plosive consonants and laughter). I noticed it stumbled when I said "puh-puh-puh-puh" rapidly; Phoenix is trained on natural speech distributions, not phonetic edge cases, and you can tell. Normal conversation? Hard to tell it's synthetic if you're not looking for the giveaways (slightly too-smooth skin texture, e