gpt-4o-transcribe

This documentation is valid for the following list of our models:

openai/gpt-4o-transcribe

Model Overview

A speech-to-text model based on GPT-4o for audio transcription. Compared with the original Whisper models, it achieves lower word error rates and more accurate language recognition. It is recommended for use cases that require higher transcription accuracy.

OpenAI speech-to-text (STT) models are priced per token, like chat models. In practice, the cost depends mainly on the duration of the input audio.
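Because billing is per token, you can estimate the cost of a request from the `usage` block that comes back with each result. The sketch below is a rough estimator only. Its per-million-token prices are hypothetical placeholders, not the real rates for this model. It also bills all input tokens at one rate, although the actual price list may charge audio and text input tokens differently.

```python
# Hypothetical placeholder prices in USD per 1M tokens. They are NOT the
# real rates for openai/gpt-4o-transcribe; look those up before relying on this.
INPUT_PRICE_PER_M = 1.00
OUTPUT_PRICE_PER_M = 4.00


def estimate_cost(usage: dict,
                  input_price_per_m: float = INPUT_PRICE_PER_M,
                  output_price_per_m: float = OUTPUT_PRICE_PER_M) -> float:
    """Estimate the USD cost of one request from its token `usage` block.

    All input tokens are billed at a single rate here; the real price list
    may charge audio and text input tokens differently.
    """
    input_tokens = usage.get("input_tokens", 0)
    output_tokens = usage.get("output_tokens", 0)
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# A usage block in the same shape as the one in the sample response on this page.
sample_usage = {"total_tokens": 135, "input_tokens": 100, "output_tokens": 35}
print(f"Estimated cost: ${estimate_cost(sample_usage):.6f}")
```

Since the audio token count grows with the length of the recording, doubling the audio duration roughly doubles the input-token part of the estimate.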

Set Up Your API Key

If you don’t have an API key for the AI/ML API yet, feel free to use our Quickstart guide.
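Rather than pasting the key into source files, you can keep it in an environment variable and build the `Authorization: Bearer` header from it. This is a minimal sketch. The variable name `AIMLAPI_KEY` and the helper `auth_headers` are our own choices, not something the API requires.

```python
import os

# Hypothetical variable name; any name works as long as you set it consistently.
API_KEY_ENV_VAR = "AIMLAPI_KEY"


def auth_headers(env_var: str = API_KEY_ENV_VAR) -> dict:
    """Return the Bearer authorization header built from an environment variable."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your AI/ML API key.")
    return {"Authorization": f"Bearer {api_key}"}
```

Set the variable in your shell (for example, `export AIMLAPI_KEY=...`), then pass `auth_headers()` as the headers argument of your HTTP client.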

API Schemas

Creating and sending a speech-to-text conversion task to the server

POST /v1/stt/create

Authorizations

Authorization (string, required)

Bearer key

Body

model (enum, required). Possible values: openai/gpt-4o-transcribe

language (string, optional)

The BCP-47 language tag that hints at the primary spoken language. Depending on the model and API endpoint you choose, only certain languages are available.

prompt (string, optional)

An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.

temperature (number, max: 1, optional)

The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

Default: 0

url (string, URI, required)

URL of the input audio file.
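The body fields above can be assembled with a small helper that sends only the optional fields you actually set, so the server defaults (for example, `temperature` = 0) apply otherwise. This is a sketch under assumptions: `build_stt_payload` is a hypothetical name, and restricting `url` to `http(s)` is our own conservative check rather than a documented rule.

```python
from typing import Optional


def build_stt_payload(url: str,
                      model: str = "openai/gpt-4o-transcribe",
                      language: Optional[str] = None,
                      prompt: Optional[str] = None,
                      temperature: Optional[float] = None) -> dict:
    """Assemble a /v1/stt/create request body, leaving out unset optional fields."""
    # Assumption: only absolute http(s) URLs are accepted for the audio file.
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must be an absolute http(s) URL to the audio file")
    # The schema documents temperature as a number between 0 and 1.
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    payload = {"model": model, "url": url}
    optional = {"language": language, "prompt": prompt, "temperature": temperature}
    # Send only fields that were set; 0 is a valid temperature and is kept.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload
```

For example, `build_stt_payload(audio_url, language="en", temperature=0.2)` produces a body with `model`, `url`, `language` and `temperature`, and no `prompt` key.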

Responses

201 Success

application/json

generation_id (string, UUID, required)

Example request:

curl -L \
  --request POST \
  --url 'https://api.aimlapi.com/v1/stt/create' \
  --header 'Authorization: Bearer <YOUR_API_KEY>' \
  --header 'Content-Type: application/json' \
  --data '{
      "model": "openai/gpt-4o-transcribe",
      "url": "https://audio-samples.github.io/samples/mp3/blizzard_primed/sample-0.mp3"
    }'

Example response (201 Success):

{
  "generation_id": "123e4567-e89b-12d3-a456-426614174000"
}
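The same create call can be made with only the Python standard library. The sketch below swaps the `requests` package used later on this page for `urllib.request`. It separates building the request from sending it, so the request can be inspected without network access. The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://api.aimlapi.com/v1"


def build_create_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare, without sending, a POST /v1/stt/create request."""
    return urllib.request.Request(
        f"{BASE_URL}/stt/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_stt_task(api_key: str, payload: dict) -> str:
    """Send the task and return the generation_id from the JSON response.

    urlopen raises urllib.error.HTTPError for 4xx/5xx status codes.
    """
    request = build_create_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=30) as response:
        body = json.load(response)
    return body["generation_id"]


# Example call (performs a real network request, so it is left commented out):
# gen_id = create_stt_task("<YOUR_AIMLAPI_KEY>", {
#     "model": "openai/gpt-4o-transcribe",
#     "url": "https://audio-samples.github.io/samples/mp3/blizzard_primed/sample-0.mp3",
# })
```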

Requesting the result of the task from the server using the generation_id

GET /v1/stt/{generation_id}

Authorizations

Authorization (string, required)

Bearer key

Path parameters

generation_id (string, required)

Responses

201 Success

application/json

generation_id (string, required)

status (string, enum, required). Possible values include queued and generating, which indicate that the task is still in progress.

result (any of, optional)

Either a structured result object or any nullable value.

error (any, nullable, optional)

Example request:

GET /v1/stt/{generation_id} HTTP/1.1
Host: api.aimlapi.com
Authorization: Bearer <YOUR_API_KEY>
Accept: */*
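An equivalent status request in standard-library Python might look like the sketch below. Percent-encoding the ID is a defensive choice on our part, so that unexpected characters cannot change the URL path. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.aimlapi.com/v1"


def build_result_request(api_key: str, generation_id: str) -> urllib.request.Request:
    """Prepare, without sending, a GET /v1/stt/{generation_id} request."""
    # Percent-encode the ID, including "/", so it stays a single path segment.
    path_id = urllib.parse.quote(generation_id, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/stt/{path_id}",
        headers={"Authorization": f"Bearer {api_key}", "Accept": "*/*"},
        method="GET",
    )


def fetch_result(api_key: str, generation_id: str) -> dict:
    """Send the request and decode the JSON task object (status, result, error)."""
    request = build_result_request(api_key, generation_id)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```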

Example response (201 Success):

{
  "generation_id": "text",
  "status": "queued",
  "result": {
    "metadata": {
      "transaction_key": "text",
      "request_id": "text",
      "sha256": "text",
      "created": "2026-03-09T08:47:49.339Z",
      "duration": 1,
      "channels": 1,
      "models": [
        "text"
      ],
      "model_info": {
        "ANY_ADDITIONAL_PROPERTY": {
          "name": "text",
          "version": "text",
          "arch": "text"
        }
      }
    },
    "results": {
      "channels": {
        "alternatives": [
          {
            "transcript": "text",
            "confidence": 1,
            "words": [
              {
                "word": "text",
                "start": 1,
                "end": 1,
                "confidence": 1,
                "punctuated_word": "text"
              }
            ],
            "paragraphs": [
              {
                "transcript": "text",
                "paragraphs": {
                  "sentences": [
                    {
                      "text": "text",
                      "start": 1,
                      "end": 1
                    }
                  ],
                  "num_words": 1,
                  "start": 1,
                  "end": 1
                }
              }
            ]
          }
        ]
      }
    }
  },
  "error": null
}
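A finished result can arrive in more than one shape. The schema example above nests the text under `results.channels.alternatives`, while the sample output for this model at the end of this page is a flat object with `text` and `usage`. The defensive helper below handles both. The function name is hypothetical, and tolerating a list of channels is an extra assumption beyond the schema shown.

```python
from typing import Optional


def extract_transcript(task: dict) -> Optional[str]:
    """Return the transcript text from a task object, or None if there is none yet."""
    result = task.get("result") or {}
    # Flat shape, as in the sample output for openai/gpt-4o-transcribe.
    if isinstance(result.get("text"), str):
        return result["text"]
    # Nested shape from the schema: results.channels.alternatives[0].transcript
    channels = (result.get("results") or {}).get("channels") or {}
    if isinstance(channels, list):  # assumption: also accept a list of channels
        channels = channels[0] if channels else {}
    alternatives = channels.get("alternatives") or []
    if alternatives:
        return alternatives[0].get("transcript")
    return None
```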

Code Example: Processing a Speech Audio File via URL

Let's use the openai/gpt-4o-transcribe model to transcribe the audio fragment at https://audio-samples.github.io/samples/mp3/blizzard_primed/sample-0.mp3:

import json
import time

import requests

base_url = "https://api.aimlapi.com/v1"
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
api_key = "<YOUR_AIMLAPI_KEY>"

headers = {
    "Authorization": f"Bearer {api_key}",
}


# Create and send a speech-to-text conversion task to the server
def create_stt():
    url = f"{base_url}/stt/create"
    data = {
        "model": "openai/gpt-4o-transcribe",
        "url": "https://audio-samples.github.io/samples/mp3/blizzard_primed/sample-0.mp3",
    }

    # requests sets Content-Type: application/json automatically when json= is used
    response = requests.post(url, json=data, headers=headers, timeout=30)
    if response.status_code >= 400:
        print(f"Error: {response.status_code} - {response.text}")
        return None

    response_data = response.json()
    print(response_data)
    return response_data


# Request the result of the task from the server using the generation_id
def get_stt(gen_id):
    url = f"{base_url}/stt/{gen_id}"
    response = requests.get(url, headers=headers, timeout=30)
    if response.status_code >= 400:
        print(f"Error: {response.status_code} - {response.text}")
        return None
    return response.json()


# Start the generation, then request the result from the server every 10 seconds
def main():
    stt_response = create_stt()
    gen_id = stt_response.get("generation_id") if stt_response else None
    if not gen_id:
        print("Error: No generation_id received")
        return None

    start_time = time.time()
    timeout = 600  # give up after 10 minutes
    while time.time() - start_time < timeout:
        response_data = get_stt(gen_id)
        if response_data is None:
            print("Error: No response from API")
            return None

        # Stop if the server reports a task error
        if response_data.get("error"):
            print(f"Error: {response_data['error']}")
            return None

        status = response_data.get("status")
        if status in ("queued", "generating"):
            print(f"Status: {status}. Checking again in 10 seconds.")
            time.sleep(10)
        else:
            print("Processing complete:")
            print(json.dumps(response_data["result"], indent=2, ensure_ascii=False))
            return response_data

    print("Timeout reached. Stopping.")
    return None


if __name__ == "__main__":
    main()

const baseUrl = "https://api.aimlapi.com/v1";
// Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
const apiKey = "<YOUR_AIMLAPI_KEY>";

// Create and send a speech-to-text conversion task to the server
async function createSTT() {
  const url = `${baseUrl}/stt/create`;

  const response = await fetch(url, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "openai/gpt-4o-transcribe",
      url: "https://audio-samples.github.io/samples/mp3/blizzard_primed/sample-0.mp3",
    }),
  });

  if (!response.ok) {
    const text = await response.text();
    console.error(`Error: ${response.status} - ${text}`);
    return null;
  }

  const data = await response.json();
  console.log(data);
  return data;
}

// Request the result of the task from the server using the generation_id
async function getSTT(genId) {
  const url = `${baseUrl}/stt/${genId}`;

  const response = await fetch(url, {
    headers: {
      "Authorization": `Bearer ${apiKey}`,
    },
  });

  if (!response.ok) {
    return null;
  }

  return response.json();
}

// Start generation and poll every 10s
async function main() {
  const sttResponse = await createSTT();
  const genId = sttResponse?.generation_id;

  if (!genId) {
    console.error("No generation_id received");
    return null;
  }

  const startTime = Date.now();
  const timeoutMs = 600 * 1000; // 10 minutes

  while (Date.now() - startTime < timeoutMs) {
    const responseData = await getSTT(genId);

    if (!responseData) {
      console.error("Error: No response from API");
      return null;
    }

    const status = responseData.status;

    if (status === "queued" || status === "generating") {
      console.log(`Status: ${status}. Checking again in 10 seconds.`);
      await new Promise(resolve => setTimeout(resolve, 10_000));
    } else {
      console.log("Processing complete:");
      console.log(JSON.stringify(responseData.result, null, 2));
      return responseData;
    }
  }

  console.log("Timeout reached. Stopping.");
  return null;
}

main();
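Both examples poll at a fixed 10-second interval. A common alternative is exponential backoff: check quickly at first, then wait longer between requests. The sketch below is a generic version of that loop and is not part of the API. It takes the fetch function as a parameter, for example `lambda: get_stt(gen_id)` from the Python example. Like the examples, it treats only `queued` and `generating` as in-progress states. The helper name and delay values are our own choices, and `sleep` and `clock` are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Optional

# In-progress states, mirroring the examples on this page.
IN_PROGRESS = {"queued", "generating"}


def poll_until_done(fetch: Callable[[], dict],
                    timeout: float = 600.0,
                    initial_delay: float = 2.0,
                    max_delay: float = 30.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> Optional[dict]:
    """Call `fetch` until the task leaves the in-progress states or time runs out.

    The delay doubles after each in-progress response, capped at `max_delay`.
    Returns the final task object, or None on timeout.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while clock() < deadline:
        task = fetch()
        if task.get("status") not in IN_PROGRESS:
            return task
        # Never sleep past the deadline.
        sleep(min(delay, max(0.0, deadline - clock())))
        delay = min(delay * 2, max_delay)
    return None
```

With the defaults, the waits are 2, 4, 8, 16 and then 30 seconds. A short job therefore finishes sooner than with a fixed 10-second interval, and a long job makes fewer requests.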

Response

{'generation_id': 'RlLz0hRdAs9voL5Qi1Pzr', 'status': 'queued'}
Status: queued. Checking again in 10 seconds.
Processing complete:
{
  "text": "He doesn't belong to you, and I don't see how you have anything to do with what is be his power. He's he personally that from this stage to you.",
  "usage": {
    "type": "tokens",
    "total_tokens": 135,
    "input_tokens": 100,
    "input_token_details": {
      "text_tokens": 0,
      "audio_tokens": 100
    },
    "output_tokens": 35
  }
}


Last updated 1 month ago
