Speech-to-Text

Configuration

OpenAI Whisper

Field	Input
Provider*	OpenAI Whisper
Audio/Video File*	Upload an audio or video file
Audio/Video File Reference*	Reference audio/video from previous blocks
Audio/Video URL	Or enter a publicly accessible audio/video URL
Language*	Dropdown selection
Timestamps*	Dropdown selection
API Key*	Masked text input
Model*	Dropdown selection
Translate to English	Toggle (Disabled by default)

Output

Parameter	Type	Description
`transcript`	string	Full transcribed text
`segments`	array	Timestamped segments with speaker labels
`language`	string	Detected or specified language
`duration`	number	Audio duration in seconds
`confidence`	number	Overall confidence score
`sentiment`	array	Sentiment analysis results
`entities`	array	Detected entities
`summary`	string	Auto-generated summary
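The output table above can be mirrored in code as a small typed container. The following is a minimal sketch, not the tool's own implementation: the class name `SpeechToTextOutput` and the helper `parse_output` are hypothetical, and the fallback defaults for missing keys are an assumption, since the table does not say which fields are always present.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SpeechToTextOutput:
    """Hypothetical container mirroring the documented output fields."""
    transcript: str
    segments: list[dict[str, Any]] = field(default_factory=list)
    language: str = ""
    duration: float = 0.0      # seconds, per the output table
    confidence: float = 0.0
    sentiment: list[Any] = field(default_factory=list)
    entities: list[Any] = field(default_factory=list)
    summary: str = ""


def parse_output(raw: dict[str, Any]) -> SpeechToTextOutput:
    """Build the container from a plain dict.

    Assumption: any key may be missing or None, so each one falls back
    to an empty value of the documented type.
    """
    return SpeechToTextOutput(
        transcript=str(raw.get("transcript") or ""),
        segments=list(raw.get("segments") or []),
        language=str(raw.get("language") or ""),
        duration=float(raw.get("duration") or 0.0),
        confidence=float(raw.get("confidence") or 0.0),
        sentiment=list(raw.get("sentiment") or []),
        entities=list(raw.get("entities") or []),
        summary=str(raw.get("summary") or ""),
    )
```

Calling `parse_output({"transcript": "hello", "duration": 3})` would yield an object whose `duration` is the float `3.0` and whose list fields are empty.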

Deepgram

Field	Input
Provider*	Deepgram
Audio/Video File*	Upload an audio or video file
Audio/Video File Reference*	Reference audio/video from previous blocks
Audio/Video URL	Or enter a publicly accessible audio/video URL
Language*	Dropdown selection
Timestamps*	Dropdown selection
API Key*	Masked text input
Model*	Dropdown selection
Speaker Diarization	Toggle (Disabled by default)

Output

Parameter	Type	Description
`transcript`	string	Full transcribed text
`segments`	array	Timestamped segments with speaker labels
`language`	string	Detected or specified language
`duration`	number	Audio duration in seconds
`confidence`	number	Overall confidence score
`sentiment`	array	Sentiment analysis results
`entities`	array	Detected entities
`summary`	string	Auto-generated summary
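The `segments` array is described only as "timestamped segments with speaker labels", so its exact item shape is not documented here. As a rough sketch, assuming each segment is a dict with hypothetical keys `start` (seconds), `text`, and an optional `speaker`, the segments could be rendered as readable lines like this:

```python
from typing import Any


def format_timestamp(seconds: float) -> str:
    """Format a number of seconds as HH:MM:SS, dropping fractions."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_segments(segments: list[dict[str, Any]]) -> list[str]:
    """Turn segments into '[HH:MM:SS] speaker: text' lines.

    The keys 'start', 'text', and 'speaker' are assumptions made for
    illustration; adjust them to the shape the block actually returns.
    """
    lines = []
    for seg in segments:
        prefix = f"[{format_timestamp(seg.get('start', 0.0))}]"
        speaker = seg.get("speaker")
        if speaker is not None:
            # Only add a label when the segment carries one.
            prefix += f" {speaker}:"
        lines.append(f"{prefix} {str(seg.get('text', '')).strip()}")
    return lines
```

Segments without a speaker label fall back to a timestamp-only prefix, which keeps the helper usable when diarization is disabled.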

ElevenLabs

Field	Input
Provider*	ElevenLabs
Audio/Video File*	Upload an audio or video file
Audio/Video File Reference*	Reference audio/video from previous blocks
Audio/Video URL	Or enter a publicly accessible audio/video URL
Language*	Dropdown selection
Timestamps*	Dropdown selection
API Key*	Masked text input
Model*	Dropdown selection

Output

Parameter	Type	Description
`transcript`	string	Full transcribed text
`segments`	array	Timestamped segments with speaker labels
`language`	string	Detected or specified language
`duration`	number	Audio duration in seconds
`confidence`	number	Overall confidence score
`sentiment`	array	Sentiment analysis results
`entities`	array	Detected entities
`summary`	string	Auto-generated summary

AssemblyAI

Field	Input
Provider*	AssemblyAI
Audio/Video File*	Upload an audio or video file
Audio/Video File Reference*	Reference audio/video from previous blocks
Audio/Video URL	Or enter a publicly accessible audio/video URL
Language*	Dropdown selection
Timestamps*	Dropdown selection
API Key*	Masked text input
Model*	Dropdown selection
Speaker Diarization	Toggle (Disabled by default)
Sentiment Analysis	Toggle (Disabled by default)
Entity Detection	Toggle (Disabled by default)
PII Redaction	Toggle (Disabled by default)
Auto Summarization	Toggle (Disabled by default)

Output

Parameter	Type	Description
`transcript`	string	Full transcribed text
`segments`	array	Timestamped segments with speaker labels
`language`	string	Detected or specified language
`duration`	number	Audio duration in seconds
`confidence`	number	Overall confidence score
`sentiment`	array	Sentiment analysis results
`entities`	array	Detected entities
`summary`	string	Auto-generated summary
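The AssemblyAI toggles for Sentiment Analysis, Entity Detection, and Auto Summarization line up by name with the `sentiment`, `entities`, and `summary` outputs. The page does not state that relationship explicitly, so the mapping below is an assumption, and `missing_outputs` is a hypothetical helper. It sketches how a workflow might check that every enabled analysis feature actually produced data:

```python
from typing import Any

# Assumed pairing of toggle labels to output fields, inferred from names only.
TOGGLE_TO_OUTPUT = {
    "Sentiment Analysis": "sentiment",
    "Entity Detection": "entities",
    "Auto Summarization": "summary",
}


def missing_outputs(enabled_toggles: set[str], output: dict[str, Any]) -> list[str]:
    """Return output fields that were expected but came back empty.

    A field counts as expected when its toggle is enabled, and as empty
    when it is absent, None, an empty list, or an empty string.
    """
    missing = []
    for toggle, field_name in TOGGLE_TO_OUTPUT.items():
        if toggle in enabled_toggles and not output.get(field_name):
            missing.append(field_name)
    return missing
```

A non-empty result is a hint to inspect the run, not proof of failure: short or silent audio may legitimately yield no entities or sentiment.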

Google Gemini

Field	Input
Provider*	Google Gemini
Audio/Video File*	Upload an audio or video file
Audio/Video File Reference*	Reference audio/video from previous blocks
Audio/Video URL	Or enter a publicly accessible audio/video URL
Language*	Dropdown selection
Timestamps*	Dropdown selection
API Key*	Masked text input
Model*	Dropdown selection

Output

Parameter	Type	Description
`transcript`	string	Full transcribed text
`segments`	array	Timestamped segments with speaker labels
`language`	string	Detected or specified language
`duration`	number	Audio duration in seconds
`confidence`	number	Overall confidence score
`sentiment`	array	Sentiment analysis results
`entities`	array	Detected entities
`summary`	string	Auto-generated summary

Usage Instructions

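Transcribe audio and video files to text using one of five providers: OpenAI Whisper, Deepgram, ElevenLabs, AssemblyAI, or Google Gemini. The tool supports multiple languages and timestamps, and the Deepgram and AssemblyAI configurations include a Speaker Diarization toggle.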
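Before running a transcription, it can help to sanity-check the configuration. The sketch below is illustrative only: the key names (`provider`, `audio_file`, `audio_file_reference`, `audio_url`, `api_key`, `model`) are hypothetical stand-ins for the form fields listed under each provider, and the rule that at least one of the three audio inputs must be set is an assumption drawn from the "Or enter..." wording of the URL field.

```python
from typing import Any
from urllib.parse import urlparse

# Provider names as they appear on this page.
PROVIDERS = {"OpenAI Whisper", "Deepgram", "ElevenLabs", "AssemblyAI", "Google Gemini"}
# Hypothetical keys for the three alternative audio inputs.
AUDIO_SOURCE_KEYS = ("audio_file", "audio_file_reference", "audio_url")


def validate_config(config: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; empty means the config looks usable."""
    errors = []
    if config.get("provider") not in PROVIDERS:
        errors.append("unknown provider")
    if not any(config.get(key) for key in AUDIO_SOURCE_KEYS):
        errors.append("no audio source provided")
    url = config.get("audio_url")
    if url:
        parsed = urlparse(url)
        # A publicly accessible URL needs at least an http(s) scheme and a host.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append("audio_url must be an http(s) URL")
    for key in ("api_key", "model"):
        if not config.get(key):
            errors.append(f"missing {key}")
    return errors
```

Returning a list of problems instead of raising on the first one lets a caller show every issue at once, which suits form-style input.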

Notes

Category: tools
Type: stt
