Speech-to-Text
Overview
Speech-to-text models convert spoken language into written text, enabling voice-based interactions across various applications. These models leverage deep learning techniques, such as recurrent neural networks (RNNs) and transformers, to process audio signals and transcribe them with high accuracy. They are commonly used in voice assistants, transcription services, and accessibility tools, supporting multiple languages and adapting to different accents and speech patterns.
Generated audio transcriptions are stored on the server for 1 hour from the time of creation.
Quick Code Examples
Let's use the #g1_whisper-large model to transcribe an audio fragment. The two examples below show how to submit audio by public URL and by local file path.
Example #1: Processing a Speech Audio File via URL
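Below is a minimal Python sketch of the URL-based flow using the requests library. The base URL (https://api.example.com/v1/stt), the request field names (model, url), and the response key (text) are illustrative assumptions rather than the documented API; replace them with the exact values from the API reference. Only the model ID #g1_whisper-large comes from this page.

```python
import os
import requests

# Assumption: the base URL, endpoint path, and field names below are
# placeholders -- consult the API reference for the exact values.
BASE_URL = "https://api.example.com/v1/stt"
API_KEY = os.environ["STT_API_KEY"]


def transcribe_from_url(audio_url: str) -> str:
    """Submit a publicly accessible audio URL for transcription."""
    response = requests.post(
        BASE_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "#g1_whisper-large",  # model ID from this page
            "url": audio_url,              # assumed field name for remote audio
        },
        timeout=60,
    )
    response.raise_for_status()
    # Assumed response shape: {"text": "..."}; adjust the key path
    # to match the actual API response.
    return response.json()["text"]


if __name__ == "__main__":
    print(transcribe_from_url("https://example.com/audio/sample.wav"))
```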
Example #2: Processing a Speech Audio File via File Path
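A companion sketch for uploading a local audio file as multipart form data. The same caveats apply: the endpoint, field names, and response shape are assumptions to be replaced with the documented values from the API reference.

```python
import os
import requests

# Same assumptions as Example #1: endpoint and field names are placeholders.
BASE_URL = "https://api.example.com/v1/stt"
API_KEY = os.environ["STT_API_KEY"]


def transcribe_from_file(path: str) -> str:
    """Upload a local audio file for transcription via multipart/form-data."""
    with open(path, "rb") as audio_file:
        response = requests.post(
            BASE_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            data={"model": "#g1_whisper-large"},
            files={"audio": audio_file},  # assumed field name for the upload
            timeout=120,
        )
    response.raise_for_status()
    return response.json()["text"]


if __name__ == "__main__":
    print(transcribe_from_file("speech_sample.wav"))
```

Remember that generated transcriptions are only kept on the server for 1 hour, so retrieve or persist the returned text promptly.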
All Available Speech-to-Text Models
| Model ID (API Reference link) | Developer | Context | Model Card |
| --- | --- | --- | --- |