Speech-to-Text

Convert speech to text using AI

Configuration

OpenAI Whisper

Provider*: OpenAI Whisper
Audio/Video File*: Upload an audio or video file
Audio/Video File Reference*: Reference audio/video from previous blocks
Audio/Video URL: Or enter publicly accessible audio/video URL
Language*: Select...
Timestamps*: Select...
API Key*: ••••••••
Model*: Select...
Translate to English: Disabled
Output
Parameter   Type    Description
transcript  string  Full transcribed text
segments    array   Timestamped segments with speaker labels
language    string  Detected or specified language
duration    number  Audio duration in seconds
confidence  number  Overall confidence score
sentiment   array   Sentiment analysis results
entities    array   Detected entities
summary     string  Auto-generated summary
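
As a rough illustration of the output shape above, the sketch below checks that a result dictionary uses the documented parameter names with the documented types. The `SttOutput` type, the `EXPECTED_TYPES` mapping, and the `check_output` helper are hypothetical names introduced here, not part of the block. The keys inside each `segments` entry are not documented on this page, so they are left untyped.

```python
from typing import Any, TypedDict


class SttOutput(TypedDict, total=False):
    """Hypothetical type mirroring the Output table (names and types from the table)."""

    transcript: str
    segments: list[dict[str, Any]]  # per-segment keys are not documented; left open
    language: str
    duration: float
    confidence: float
    sentiment: list[Any]
    entities: list[Any]
    summary: str


# Python types assumed to correspond to the documented string/array/number types.
EXPECTED_TYPES = {
    "transcript": str,
    "segments": list,
    "language": str,
    "duration": (int, float),
    "confidence": (int, float),
    "sentiment": list,
    "entities": list,
    "summary": str,
}


def check_output(output: dict) -> list[str]:
    """Return type problems for the fields that are present; empty list if none."""
    problems = []
    for name, expected in EXPECTED_TYPES.items():
        if name in output and not isinstance(output[name], expected):
            problems.append(f"{name}: unexpected type {type(output[name]).__name__}")
    return problems
```

Missing fields are not reported as problems, on the assumption that a provider may omit results it does not produce.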

Deepgram

Provider*: Deepgram
Audio/Video File*: Upload an audio or video file
Audio/Video File Reference*: Reference audio/video from previous blocks
Audio/Video URL: Or enter publicly accessible audio/video URL
Language*: Select...
Timestamps*: Select...
API Key*: ••••••••
Model*: Select...
Speaker Diarization: Disabled
Output
Parameter   Type    Description
transcript  string  Full transcribed text
segments    array   Timestamped segments with speaker labels
language    string  Detected or specified language
duration    number  Audio duration in seconds
confidence  number  Overall confidence score
sentiment   array   Sentiment analysis results
entities    array   Detected entities
summary     string  Auto-generated summary
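
With Speaker Diarization enabled, the `segments` array is described as carrying speaker labels. The sketch below shows one way to collapse consecutive segments from the same speaker into conversational turns. It assumes each segment is a dictionary with `speaker` and `text` keys. That is a guess at the segment shape, which this page does not specify, and `merge_speaker_turns` is a hypothetical helper.

```python
def merge_speaker_turns(segments):
    """Merge consecutive segments from the same speaker into one turn.

    Assumes each segment is a dict with "speaker" and "text" keys
    (hypothetical field names; the real segment schema may differ).
    """
    turns = []
    for seg in segments:
        speaker = seg.get("speaker", "unknown")
        text = seg.get("text", "").strip()
        if turns and turns[-1]["speaker"] == speaker:
            # Same speaker as the previous turn: append to it.
            turns[-1]["text"] = (turns[-1]["text"] + " " + text).strip()
        else:
            # Speaker changed (or first segment): start a new turn.
            turns.append({"speaker": speaker, "text": text})
    return turns
```

Segments without a speaker label fall back to `"unknown"`, so the helper still works on output produced with diarization disabled.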

ElevenLabs

Provider*: ElevenLabs
Audio/Video File*: Upload an audio or video file
Audio/Video File Reference*: Reference audio/video from previous blocks
Audio/Video URL: Or enter publicly accessible audio/video URL
Language*: Select...
Timestamps*: Select...
API Key*: ••••••••
Model*: Select...
Output
Parameter   Type    Description
transcript  string  Full transcribed text
segments    array   Timestamped segments with speaker labels
language    string  Detected or specified language
duration    number  Audio duration in seconds
confidence  number  Overall confidence score
sentiment   array   Sentiment analysis results
entities    array   Detected entities
summary     string  Auto-generated summary

AssemblyAI

Provider*: AssemblyAI
Audio/Video File*: Upload an audio or video file
Audio/Video File Reference*: Reference audio/video from previous blocks
Audio/Video URL: Or enter publicly accessible audio/video URL
Language*: Select...
Timestamps*: Select...
API Key*: ••••••••
Model*: Select...
Speaker Diarization: Disabled
Sentiment Analysis: Disabled
Entity Detection: Disabled
PII Redaction: Disabled
Auto Summarization: Disabled
Output
Parameter   Type    Description
transcript  string  Full transcribed text
segments    array   Timestamped segments with speaker labels
language    string  Detected or specified language
duration    number  Audio duration in seconds
confidence  number  Overall confidence score
sentiment   array   Sentiment analysis results
entities    array   Detected entities
summary     string  Auto-generated summary
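
The Sentiment Analysis, Entity Detection, and Auto Summarization toggles plausibly correspond to the `sentiment`, `entities`, and `summary` outputs. This page does not say what those fields contain when a toggle is left disabled, so the sketch below assumes they may be missing, `None`, or empty, and reads them defensively. `describe_analysis` is a hypothetical helper name.

```python
def describe_analysis(output: dict) -> dict:
    """Report which optional analysis results are actually present.

    Assumption: a disabled feature yields a missing, None, or empty field.
    """
    sentiment = output.get("sentiment") or []
    entities = output.get("entities") or []
    summary = output.get("summary") or ""
    return {
        "has_sentiment": len(sentiment) > 0,
        "has_entities": len(entities) > 0,
        "has_summary": bool(summary.strip()),
        "entity_count": len(entities),
    }
```

Downstream blocks can then branch on these flags instead of indexing into fields that may not exist.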

Google Gemini

Provider*: Google Gemini
Audio/Video File*: Upload an audio or video file
Audio/Video File Reference*: Reference audio/video from previous blocks
Audio/Video URL: Or enter publicly accessible audio/video URL
Language*: Select...
Timestamps*: Select...
API Key*: ••••••••
Model*: Select...
Output
Parameter   Type    Description
transcript  string  Full transcribed text
segments    array   Timestamped segments with speaker labels
language    string  Detected or specified language
duration    number  Audio duration in seconds
confidence  number  Overall confidence score
sentiment   array   Sentiment analysis results
entities    array   Detected entities
summary     string  Auto-generated summary

Usage Instructions

Transcribe audio and video files to text using leading AI providers. Supports multiple languages, timestamps, and speaker diarization.
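
A common use of timestamped output is generating subtitles. The sketch below converts a `segments` array into SubRip (SRT) text. It assumes each segment is a dictionary with `start` and `end` times in seconds and a `text` string. These field names are an assumption, since the segment schema is not documented here, and both helper functions are hypothetical.

```python
def to_srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render segments as numbered SRT cues separated by blank lines.

    Assumes each segment has "start", "end" (seconds) and "text" keys
    (hypothetical field names).
    """
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = to_srt_timestamp(seg["start"])
        end = to_srt_timestamp(seg["end"])
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}")
    return ("\n\n".join(cues) + "\n") if cues else ""
```

If the provider returns times in milliseconds rather than seconds, the conversion in `to_srt_timestamp` would need to be adjusted accordingly.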

Notes

  • Category: tools
  • Type: stt