Deepgram - Automated Speech Recognition (ASR)

We raised $72M to define the future of AI speech understanding!

Language AI models to power your apps.

Power your apps with world-class speech and text AI models. Effortlessly accurate. Blazing fast. Enterprise-ready scale. Hands-down the best price. Everything developers need to build with confidence and ship faster.

Based on 124+ reviews.

Trusted by the world’s top Enterprises, Startups, & Researchers

Files or live streams. Hell yes we have SDKs.

Deepgram is a comprehensive AI transcription foundation plus the understanding features you need to make your data readable and actionable by humans…or machines.

Step 1: Input Audio

NASA: First All Female Space Walk

POST https://api.deepgram.com/v1/listen

{
  "url": "nasa_demo"
}
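As a rough illustration of the request above, here is a minimal Python sketch that builds (but does not send) the same kind of POST to the `/v1/listen` endpoint. The helper name `build_listen_request`, the example audio URL, the API key placeholder, and the `Token` authorization scheme are assumptions made for illustration, not details taken from this page; `diarize=true` is the option shown in the Audio Intelligence demo on this page.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint as shown in the demo request above.
LISTEN_ENDPOINT = "https://api.deepgram.com/v1/listen"


def build_listen_request(audio_url, api_key, **options):
    """Build (but do not send) a POST request for a hosted audio file.

    Keyword arguments become query-string options, e.g. diarize="true".
    The Authorization header format below is an assumption; confirm it
    against the official API documentation before relying on it.
    """
    query = urlencode(options)
    url = f"{LISTEN_ENDPOINT}?{query}" if query else LISTEN_ENDPOINT
    body = json.dumps({"url": audio_url}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Token {api_key}",  # assumed scheme
    }
    return Request(url, data=body, headers=headers, method="POST")


request = build_listen_request(
    "https://example.com/audio.wav",  # hypothetical audio URL
    "YOUR_API_KEY",                   # placeholder, not a real key
    diarize="true",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the resulting object to `urllib.request.urlopen` would send it; the sketch stops short of that so it can run without credentials or network access.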

Step 2: Transcription Output

The response will show here

Give it a try.

Click the mic to transcribe live in English or select another language.

Transcription

Audio Intelligence

Audio Input: NASA File

Speaker diarization: Know who’s talking. Detect and label speaker changes throughout a conversation.

Entity detection

Summarization

Topic detection

Language translation

Language detection

Sentiment analysis

diarize=true

Speaker 0 : Alright. I’m ready.

Speaker 1 : Good evening. I’m Dr. Emmett Brown. I’m standing on the parking lot at Twin Pines Mall. It’s Saturday Morning October twenty sixth nineteen eighty five one eighteen AM. And this is temporal experiment number one. Come on Einey. Hey, boy. Get in there. At a boy. In you go. Sit down. Get your seatbelt on. That’s it.

Speaker 0 : Okay.

Speaker 1 : Please note, that Einstein’s clock is in precise synchronization with my control watch. Got it?

Speaker 0 : Right. Check doc.

Speaker 1 : Good. Have a good trip einstein. Watch your head.

Speaker 0 : You got that thing hooked up to the car?

Speaker 1 : Watch this.

Speaker 0 : Yeah Ok.

Speaker 1 : Not me the car, the car. If my calculations are correct. When this baby hits eighty eight miles per hour, you’re gonna see some serious s**t. Watch this watch this. What did I tell you? Eighty eight miles per hour. The thermal displacement occurred exactly what? One O two A M and zero seconds.

Speaker 0 : Jesus Christ. Jesus Christ, doc, you disintegrated einstein.

Speaker 1 : Calm down Marty. I didn’t disintegrate anything. The molecular structure of both Einstein and the car are completely intact.

Speaker 0 : Then where hell are they?

Speaker 1 : The appropriate question is, when the hell are they? You see, Einstein has just become the world’s first time traveler. I set him into the future. One minute into the future to be exact. Now precisely one twenty one AM and zero seconds we shall catch up with him and the time machine.

Speaker 0 : Wait a minute. Wait a minute. Doc. Are you telling me that you built a time machine out of a Delorean?
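The speaker-labeled transcript above can be thought of as a stream of words, each tagged with a speaker index, collapsed into turns. The sketch below shows one way to do that grouping. The word dictionaries with `word` and `speaker` keys are a hypothetical shape chosen for illustration and are not confirmed by this page as the actual response format.

```python
from itertools import groupby


def format_speaker_turns(words):
    """Collapse consecutive words from the same speaker into one line each."""
    lines = []
    for speaker, run in groupby(words, key=lambda w: w["speaker"]):
        text = " ".join(w["word"] for w in run)
        lines.append(f"Speaker {speaker} : {text}")
    return lines


# Hypothetical word-level input; the field names are assumptions.
sample_words = [
    {"word": "Alright.", "speaker": 0},
    {"word": "Good", "speaker": 1},
    {"word": "evening.", "speaker": 1},
    {"word": "Okay.", "speaker": 0},
]

for line in format_speaker_turns(sample_words):
    print(line)
```

`groupby` only merges adjacent items, which is the desired behavior here: a speaker who talks again later starts a new turn instead of being merged into an earlier one.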

Ready to get started?

Conversational & transcription intelligence on the world’s best speech AI platform.

Unbeatable Value, Unmatched Performance.

All transcribed with Deepgram.

450X Faster

Transcribe an hour of pre-recorded audio in about 8 seconds.
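The 450X figure is consistent with the numbers quoted here if it is read as a speed relative to playback time: one hour of audio is 3,600 seconds, and 3,600 / 8 = 450. A small sketch of that arithmetic follows; the function name `real_time_factor` is just an illustrative choice.

```python
def real_time_factor(audio_seconds, processing_seconds):
    """Return how many times faster than playback the audio was processed."""
    return audio_seconds / processing_seconds


one_hour = 60 * 60  # 3600 seconds of audio
print(real_time_factor(one_hour, 8))  # 450.0
```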

<300ms latency

The fastest real-time transcription speeds for human-like conversational AI experiences, real-time analytics, and enablement.

30+ Languages

Over 30 languages and dialects to choose from, with more being added rapidly. Over 100 languages supported for translation.

40+ File Types

Over 40 audio formats and encodings are supported, including MP3, MP4, MP2, AAC, WAV, FLAC, PCM, M4A, Ogg, Opus, and WebM.

Setting New Benchmarks in ASR Performance

All ASR providers strive to have the most accurate transcripts possible, but what about other critical features you require? We advise performing side-by-side comparisons and testing with the real-world audio you'll use in production to determine the best speech solution for your needs.

Platforms (values listed for Deepgram first, then two other providers)

Batch process (1hr of audio): ~8 s | 4980 s | 1443 s

Real-time streaming lag: <300 ms | Not available | 1443 ms

Tailored speech models

Deep speech (search)

Diarization: Up to 10 | Not available | Up to 6

Noise reduction

Custom vocabulary

Redaction

Punctuation

Essential building blocks for language AI.

Transcription

Create accurate, usable transcripts. It’s speech-to-text for developers, by developers.

Punctuation, Numerals, Redaction, Profanity Filtering
Utterances, Deep Search, Find & Replace, VAD, Keywords
Paragraphs, Interim Results

Understanding

Accurately identify, extract, and summarize insights from conversational audio, built on the industry’s most accurate speech-to-text.

Speaker Diarization, Entity Detection, Summarization
Topic Detection, Language Translation
Language Detection, Sentiment Analysis

Unlock language AI at scale with an API call.

Get conversational intelligence with transcription and understanding on the world's best speech AI platform.

Every voice. Heard and understood.