gpt-4o-transcribe

This documentation applies to the following model:

  • openai/gpt-4o-transcribe

Model Overview

A speech-to-text model based on GPT-4o for audio transcription. It provides improved word error rates and more accurate language recognition compared to the original Whisper models. Recommended for use cases that require higher transcription accuracy.

Set Up Your API Key

If you don’t have an API key for the AI/ML API yet, follow our Quickstart guide to get one.

API Schemas

Creating and sending a speech-to-text conversion task to the server

POST /v1/stt/create

Body

model (string, enum, required)
Possible values:

url (string, URI, optional)
URL of the input audio file.

language (string, optional)
The BCP-47 language tag that hints at the primary spoken language. Depending on the model and API endpoint you choose, only certain languages are available.

prompt (string, optional)
Optional text to guide the model's style or to continue a previous audio segment. The prompt should match the audio language.

temperature (number, max: 1, optional)
The sampling temperature, between 0 and 1. Higher values such as 0.8 make the output more random, while lower values such as 0.2 make it more focused and deterministic.
Default: 0

Responses

200 Success (application/json)

generation_id (string, required)
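
The request body described above can be sketched in Python using only the standard library. This is a minimal illustration, not the official client: the base URL `https://api.aimlapi.com`, the Bearer authorization scheme, and the helper name `build_create_request` are assumptions that do not appear in the schema, so check them against your account settings before relying on them.

```python
import json
import urllib.request

# Assumption: base URL of the AI/ML API; it is not stated in the schema above.
BASE_URL = "https://api.aimlapi.com"


def build_create_request(api_key, audio_url, language=None, prompt=None, temperature=0.0):
    """Build (but do not send) a POST request for /v1/stt/create.

    Hypothetical helper: only the field names come from the schema.
    """
    # The schema caps temperature at 1 and describes the range as 0 to 1.
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")

    body = {
        "model": "openai/gpt-4o-transcribe",
        "url": audio_url,
        "temperature": temperature,
    }
    # Optional fields are sent only when the caller provides them.
    if language is not None:
        body["language"] = language
    if prompt is not None:
        body["prompt"] = prompt

    return urllib.request.Request(
        BASE_URL + "/v1/stt/create",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends the task; per the schema, the JSON reply should contain a `generation_id` string to use when requesting the result.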

Requesting the result of the task from the server using the generation_id

GET /v1/stt/{generation_id}

Path parameters

generation_id (string, required)

Responses

200 Success (application/json)

id (string, required)

status (string, enum, required)
Possible values:

output (any of three alternatives, required)
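
Because the result is requested separately from task creation, a client typically polls this endpoint until the task leaves its pending state. The sketch below shows one way to structure that loop. The status names in `PENDING_STATUSES` are placeholders, since the enum values are not listed above, and `poll_result` and its `fetch` callable are hypothetical names introduced for illustration.

```python
import time

# Assumption: placeholder status names. The schema above does not list the
# enum's values, so replace these with the ones the API actually returns.
PENDING_STATUSES = {"queued", "processing"}


def poll_result(fetch, generation_id, interval=2.0, max_attempts=30, sleep=time.sleep):
    """Call fetch(generation_id) until the task is no longer pending.

    `fetch` is any callable that performs GET /v1/stt/{generation_id}
    and returns the decoded JSON object (id, status, output).
    """
    for attempt in range(max_attempts):
        result = fetch(generation_id)
        if result["status"] not in PENDING_STATUSES:
            return result
        # Wait between attempts, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(
        f"task {generation_id} still pending after {max_attempts} attempts"
    )
```

Injecting `fetch` and `sleep` keeps the loop independent of any particular HTTP library and makes it easy to exercise without network access.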

Code Example: Processing a Speech Audio File via URL

Let's use the openai/gpt-4o-transcribe model to transcribe an audio file referenced by its URL.
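
A minimal end-to-end sketch of this flow follows: it creates the task, reads the `generation_id`, and requests the result until the task is no longer pending. Several details are assumptions rather than documented facts: the base URL, the Bearer authorization header, the pending status names, and the helper names `call_api` and `transcribe`. The audio URL in the usage comment is a hypothetical placeholder.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.aimlapi.com"  # assumption: not stated in the schema


def call_api(method, path, api_key, body=None):
    """Send one JSON request and return the decoded JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={
            "Authorization": "Bearer " + api_key,  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def transcribe(audio_url, api_key, call=call_api,
               pending=("queued", "processing"),  # hypothetical status names
               interval=2.0, max_attempts=30):
    """Create a transcription task, then poll until it is no longer pending."""
    created = call("POST", "/v1/stt/create", api_key,
                   {"model": "openai/gpt-4o-transcribe", "url": audio_url})
    generation_id = created["generation_id"]

    for _ in range(max_attempts):
        result = call("GET", "/v1/stt/" + generation_id, api_key)
        if result["status"] not in pending:
            return result
        time.sleep(interval)
    raise TimeoutError("transcription task did not finish in time")


# Usage (performs real network calls, so it is left commented out):
# result = transcribe("https://example.com/sample.mp3", "<YOUR_API_KEY>")
# print(result["output"])
```

The `call` parameter exists so the HTTP layer can be swapped for another client or a stub; the shape of `output` depends on which of the response alternatives the API returns.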

Response
