Speech to Text

Convert audio to text using multiple providers

Transcribes audio files or streams to text using one of several speech recognition providers: OpenAI Whisper, Deepgram, Google Speech-to-Text, or AssemblyAI.

Tools: speech_to_text

Usage

1. Add the block to your workflow and connect it to the upstream step.
2. Enter the provider's API key, or a $ENV_VAR reference to it, in the API Key input.
3. Set Audio Input (required) and any optional parameters for the run.
4. Run a test execution, inspect the outputs, and iterate before deploying.
5. Deploy the speech_to_text block to production with monitoring enabled.

Inputs (UI)

Provider

dropdown

Layout: half

Options: OpenAI Whisper, Deepgram, Google Speech, AssemblyAI

Model

dropdown

Layout: half

Condition: provider = "openai"

Options: Whisper Large, Whisper Medium

Audio Input

short-input

Placeholder: Audio file path, URL, or base64 data

Layout: full

Language

dropdown

Layout: half

Options: Auto-detect, English, Spanish, French

Word Timestamps

switch

Include word-level timestamps

Layout: half

Speaker Diarization

switch

Identify different speakers

Layout: half

Condition: provider is one of ["deepgram", "assemblyai"]

Auto-Punctuation

switch

Automatically add punctuation

Layout: half

API Key

short-input

Placeholder: API key or $ENV_VAR

Layout: full
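
The Condition entries above decide when a field is shown. The following is a minimal sketch of how such conditions could be evaluated, not the product's actual implementation. The `CONDITIONS` table layout and the `is_visible` helper are hypothetical, and keying the table by the API input names (`model`, `enableDiarization`) is an assumption about how UI fields map to inputs.

```python
from typing import Any, Mapping

# Conditions taken from the field list above.
# The dict layout and the field ids used as keys are assumptions for illustration.
CONDITIONS = {
    "model": {"field": "provider", "value": "openai"},
    "enableDiarization": {"field": "provider", "value": ["deepgram", "assemblyai"]},
}


def is_visible(field_id: str, values: Mapping[str, Any]) -> bool:
    """Return True when the field has no condition or its condition matches."""
    cond = CONDITIONS.get(field_id)
    if cond is None:
        return True  # unconditional fields are always shown
    current = values.get(cond["field"])
    expected = cond["value"]
    # A list means "any of these values"; a scalar means exact match.
    if isinstance(expected, (list, tuple)):
        return current in expected
    return current == expected
```

Under this reading, `is_visible("model", {"provider": "deepgram"})` is false, so the Model dropdown would be hidden unless the provider is `"openai"`.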

Inputs (API)

provider

string

Optional

model

string

Optional

audioInput

string

Required

language

string

Optional

enableWordTimestamps

boolean

Optional

enableDiarization

boolean

Optional

punctuate

boolean

Optional

apiKey

string

Optional
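
As an illustration of the input list above, the sketch below assembles an inputs object that enforces the one required field (`audioInput`) and omits optional fields that were not set. The `build_inputs` helper is hypothetical, and how the resulting object is submitted to a workflow run is not covered by this page, so the sketch stops at building the dictionary.

```python
from typing import Any, Dict, Optional


def build_inputs(
    audio_input: str,
    *,
    provider: Optional[str] = None,
    model: Optional[str] = None,
    language: Optional[str] = None,
    enable_word_timestamps: Optional[bool] = None,
    enable_diarization: Optional[bool] = None,
    punctuate: Optional[bool] = None,
    api_key: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the block inputs; only audioInput is required."""
    if not audio_input:
        raise ValueError("audioInput is required")

    optional = {
        "provider": provider,
        "model": model,
        "language": language,
        "enableWordTimestamps": enable_word_timestamps,
        "enableDiarization": enable_diarization,
        "punctuate": punctuate,
        "apiKey": api_key,
    }
    payload: Dict[str, Any] = {"audioInput": audio_input}
    # Leave unset optional fields out entirely rather than sending nulls.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


if __name__ == "__main__":
    # The URL is a placeholder, not a real recording.
    print(build_inputs("https://example.com/audio.wav",
                       provider="deepgram", enable_diarization=True))
```

Dropping unset fields keeps the request small and lets the block apply its own defaults, whatever they are.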

Outputs

Primary response type:

{
  "transcript": "string",
  "confidence": "number",
  "language": "string",
  "segments": "json",
  "words": "json",
  "duration": "number"
}
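
A downstream step might read this response along the following lines. This is a sketch under assumptions: the `SpeechToTextResult` class and `parse_result` helper are hypothetical, `segments` and `words` are passed through untouched because their structure is documented only as "json", and treating every field except `transcript` as possibly absent is a defensive choice rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class SpeechToTextResult:
    transcript: str
    confidence: Optional[float]
    language: Optional[str]
    segments: Any  # documented only as "json"; shape not specified
    words: Any     # documented only as "json"; shape not specified
    duration: Optional[float]


def _as_float(value: Any) -> Optional[float]:
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return float(value)
    return None


def parse_result(raw: Mapping[str, Any]) -> SpeechToTextResult:
    """Validate the fields with known types and pass the rest through."""
    transcript = raw.get("transcript")
    if not isinstance(transcript, str):
        raise ValueError("response has no string 'transcript' field")
    language = raw.get("language")
    return SpeechToTextResult(
        transcript=transcript,
        confidence=_as_float(raw.get("confidence")),
        language=language if isinstance(language, str) else None,
        segments=raw.get("segments"),
        words=raw.get("words"),
        duration=_as_float(raw.get("duration")),
    )
```

Failing fast on a missing `transcript` surfaces provider errors early, while the looser handling of the other fields avoids breaking a workflow when a provider omits them.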

Tool Access

speech_to_text