Speech to Text
Convert audio to text using multiple providers
Transcribes audio files or streams to text using one of several speech recognition providers: OpenAI Whisper, Deepgram, Google Speech-to-Text, or AssemblyAI.
Usage
- Add the block to your workflow and connect it to the upstream step.
- Configure any required credentials or tokens in the inputs.
- Fill in required inputs and optional parameters for the run.
- Run a test execution, inspect outputs, and iterate before deploying.
- Deploy the speech_to_text block with monitoring enabled in production.
Inputs (UI)
Provider
dropdown
Layout: half
Options: OpenAI Whisper, Deepgram, Google Speech, AssemblyAI
Model
dropdown
Layout: half
Condition: provider = "openai"
Options: Whisper Large, Whisper Medium
Audio Input
short-input
Placeholder: Audio file path, URL, or base64 data
Layout: full
Language
dropdown
Layout: half
Options: Auto-detect, English, Spanish, French
Word Timestamps
switch
Include word-level timestamps
Layout: half
Speaker Diarization
switch
Identify different speakers
Layout: half
Condition: provider = ["deepgram","assemblyai"]
Auto-Punctuation
switch
Automatically add punctuation
Layout: half
API Key
short-input
Placeholder: API key or $ENV_VAR
Layout: full
Inputs (API)
provider
string, optional
model
string, optional
audioInput
string, required
language
string, optional
enableWordTimestamps
boolean, optional
enableDiarization
boolean, optional
punctuate
boolean, optional
apiKey
string, optional
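As a rough illustration of how the API inputs above fit together, the following Python sketch assembles an inputs dictionary using the documented field names. The helper name `build_stt_inputs` is hypothetical, and the sketch assumes that the UI conditions (model only for provider = "openai", diarization only for "deepgram" or "assemblyai") also apply to API calls, which this page does not confirm. The model identifier and URL in the usage line are placeholders, not documented values.

```python
from typing import Any, Dict, Optional

# Providers for which the UI shows the Speaker Diarization switch.
DIARIZATION_PROVIDERS = {"deepgram", "assemblyai"}


def build_stt_inputs(
    audio_input: str,
    provider: Optional[str] = None,
    model: Optional[str] = None,
    language: Optional[str] = None,
    enable_word_timestamps: Optional[bool] = None,
    enable_diarization: Optional[bool] = None,
    punctuate: Optional[bool] = None,
    api_key: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build the inputs dict for the speech_to_text block."""
    # audioInput is the only field documented as required.
    if not audio_input:
        raise ValueError("audioInput is required")

    # Assumption: mirror the UI condition that Model applies only to OpenAI.
    if provider != "openai":
        model = None

    # Assumption: mirror the UI condition on Speaker Diarization.
    if enable_diarization and provider not in DIARIZATION_PROVIDERS:
        enable_diarization = None

    candidates = {
        "provider": provider,
        "model": model,
        "audioInput": audio_input,
        "language": language,
        "enableWordTimestamps": enable_word_timestamps,
        "enableDiarization": enable_diarization,
        "punctuate": punctuate,
        "apiKey": api_key,
    }
    # Omit optional fields that were not set.
    return {key: value for key, value in candidates.items() if value is not None}


# Usage: the model value is ignored here because the provider is not "openai".
inputs = build_stt_inputs(
    "https://example.com/meeting.wav",
    provider="deepgram",
    model="placeholder-model",
    enable_diarization=True,
)
```

Dropping unset fields keeps the payload minimal and leaves any defaults to the block itself.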
Outputs
Primary response type:
{
"transcript": "string",
"confidence": "number",
"language": "string",
"segments": "json",
"words": "json",
"duration": "number"
}
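A downstream step might load this response into a typed structure before using it. The Python sketch below is one way to do that; the class and function names are hypothetical, the sample values are invented for illustration, and it assumes all six keys are always present. Because `segments` and `words` are documented only as "json", their inner shape is left opaque.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# Keys listed in the documented primary response type.
EXPECTED_KEYS = ("transcript", "confidence", "language", "segments", "words", "duration")


@dataclass
class SpeechToTextResult:
    transcript: str
    confidence: float
    language: str
    segments: Any  # documented only as "json"; structure not specified
    words: Any     # documented only as "json"; structure not specified
    duration: float


def parse_result(payload: Mapping[str, Any]) -> SpeechToTextResult:
    """Hypothetical helper: validate and wrap the block's output."""
    # Assumption: every documented key is present in every response.
    missing = [key for key in EXPECTED_KEYS if key not in payload]
    if missing:
        raise ValueError("missing output fields: " + ", ".join(missing))
    return SpeechToTextResult(
        transcript=str(payload["transcript"]),
        confidence=float(payload["confidence"]),
        language=str(payload["language"]),
        segments=payload["segments"],
        words=payload["words"],
        duration=float(payload["duration"]),
    )


# Usage with invented sample values.
sample = {
    "transcript": "hello world",
    "confidence": 0.93,
    "language": "en",
    "segments": [],
    "words": [],
    "duration": 1.5,
}
result = parse_result(sample)
```

If some fields turn out to be omitted in practice (for example `words` when word timestamps are disabled), the presence check would need to be relaxed for those keys.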