MAI-Transcribe 1.5

This documentation applies to the following model:

  • microsoft/mai-transcribe-1.5

Model Overview

MAI-Transcribe 1.5 is a speech-to-text model from Microsoft. It supports multilingual transcription, automatic language detection, and punctuation restoration.

Set up your API Key

If you don't have an API key for the AI/ML API yet, follow our Quickstart guide to get one.
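Once you have a key, a common pattern is to keep it out of source code and attach it to every request. The sketch below assumes the key is stored in an environment variable named AIMLAPI_API_KEY (a hypothetical name chosen for illustration) and that the API accepts it as a Bearer token in the Authorization header; check both against your account settings.

```python
import os


def auth_headers(api_key=None):
    """Build HTTP headers for an authenticated JSON request.

    Assumptions: the key lives in the AIMLAPI_API_KEY environment variable
    (hypothetical name) and is sent using the Bearer scheme.
    """
    key = api_key or os.environ.get("AIMLAPI_API_KEY")
    if not key:
        raise RuntimeError("No API key found: pass api_key or set AIMLAPI_API_KEY")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Passing the key explicitly, as in auth_headers("my-key"), takes precedence over the environment variable.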

API Schemas

Creating and sending a speech-to-text conversion task to the server

POST /v1/stt/create

Body

model (string, enum, required)
Possible values: microsoft/mai-transcribe-1.5

url (string, URI, optional)
URL of the input audio file. Provide either url or audio: exactly one is required, not both.
Example: https://example.com/audio/sample.mp3

audio (string, binary, optional)
The audio file to transcribe. Provide either url or audio: exactly one is required, not both.

max_tokens (integer, min: 1, optional)
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
Example: 4096

temperature (number, max: 2, optional)
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
Default: 1
Example: 1

top_p (number, max: 1, optional)
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Default: 1
Example: 1

max_completion_tokens (integer, min: 1, optional)
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
Example: 4096

Responses

200 Success (application/json)

generation_id (string, required)
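The body constraints for the create request (exactly one of url or audio, plus the numeric bounds) can be checked on the client before anything is sent. The following is a minimal sketch under those stated constraints only; build_payload is a hypothetical helper, not part of any SDK, and it does not decide how a binary audio value is encoded on the wire, since the schema does not say.

```python
MODEL = "microsoft/mai-transcribe-1.5"


def build_payload(url=None, audio=None, *, model=MODEL, max_tokens=None,
                  temperature=None, top_p=None, max_completion_tokens=None):
    """Assemble the fields of a /v1/stt/create body, validating schema bounds."""
    # Exactly one of url / audio is required, not both and not neither.
    if (url is None) == (audio is None):
        raise ValueError("provide exactly one of 'url' or 'audio'")

    # Only the bounds stated in the schema are enforced here.
    if max_tokens is not None and max_tokens < 1:
        raise ValueError("max_tokens must be >= 1")
    if max_completion_tokens is not None and max_completion_tokens < 1:
        raise ValueError("max_completion_tokens must be >= 1")
    if temperature is not None and temperature > 2:
        raise ValueError("temperature must be <= 2")
    if top_p is not None and top_p > 1:
        raise ValueError("top_p must be <= 1")

    payload = {"model": model}
    if url is not None:
        payload["url"] = url
    else:
        payload["audio"] = audio

    # Include optional fields only when the caller set them.
    optional = {
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "max_completion_tokens": max_completion_tokens,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

For example, build_payload(url="https://example.com/audio/sample.mp3", max_tokens=4096) yields a dictionary with the model, url, and max_tokens fields, while passing both url and audio raises ValueError.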

Requesting the result of the task from the server using the generation_id

GET /v1/stt/{generation_id}

Path parameters

generation_id (string, required)

Responses

200 Success (application/json)

id (string, required)

status (string, enum, required)

output (required; any of three alternative schemas)
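Because the task runs asynchronously, a client typically polls this endpoint until the status leaves its in-progress states. The sketch below is illustrative only: the base URL https://api.aimlapi.com, the Bearer authorization scheme, and the in-progress status names in PENDING_STATUSES are all assumptions, since the possible status values are not listed here. Adjust them to match the values your responses actually contain.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "https://api.aimlapi.com"  # assumption: API host
# Assumption: hypothetical names for "not finished yet" statuses.
PENDING_STATUSES = {"queued", "waiting", "active", "generating"}


def fetch_task(generation_id, api_key):
    """GET /v1/stt/{generation_id} and return the decoded JSON body."""
    path = "/v1/stt/" + urllib.parse.quote(generation_id, safe="")
    req = urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {api_key}"},  # assumption: Bearer auth
        method="GET",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_result(fetch, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Call fetch() repeatedly until the task status is no longer pending.

    fetch is any zero-argument callable returning the task as a dict,
    for example: lambda: fetch_task(generation_id, api_key).
    """
    deadline = time.monotonic() + timeout
    while True:
        task = fetch()
        if task["status"] not in PENDING_STATUSES:
            return task  # finished, whether successfully or with an error
        if time.monotonic() >= deadline:
            raise TimeoutError("transcription task did not finish in time")
        sleep(interval)
```

Taking fetch as a callable keeps the waiting logic separate from the HTTP call, which makes it easy to exercise with canned responses.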

Code Example: Processing a Speech Audio File via URL

Let's use the microsoft/mai-transcribe-1.5 model to transcribe the following audio fragment:
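A minimal end-to-end sketch of this flow is given here: create the task from an audio URL, then poll with the returned generation_id. It uses only the Python standard library. The base URL https://api.aimlapi.com, the Bearer authorization header, and the in-progress status names are assumptions made for illustration, and the sample URL from the schema stands in for the audio file.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.aimlapi.com"  # assumption: API host
MODEL = "microsoft/mai-transcribe-1.5"
PENDING = {"queued", "waiting", "active", "generating"}  # assumption: status names


def request_json(method, path, api_key, body=None):
    """Send one JSON request and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumption: Bearer auth
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def transcribe_url(audio_url, api_key, request=request_json,
                   interval=5.0, max_polls=120, sleep=time.sleep):
    """Create a transcription task for audio_url and wait for its final state."""
    # Step 1: create the task and read the generation_id from the response.
    created = request("POST", "/v1/stt/create", api_key,
                      {"model": MODEL, "url": audio_url})
    generation_id = created["generation_id"]

    # Step 2: poll the task until its status is no longer in-progress.
    for _ in range(max_polls):
        task = request("GET", f"/v1/stt/{generation_id}", api_key)
        if task["status"] not in PENDING:
            return task
        sleep(interval)
    raise TimeoutError(f"task {generation_id} still pending after {max_polls} polls")


# Usage (performs real network calls, so it is left commented out):
# task = transcribe_url("https://example.com/audio/sample.mp3", "<YOUR_API_KEY>")
# print(task["status"], task["output"])
```

The request parameter exists so the transport can be swapped out, for instance with a stub during testing or with a client that adds retries.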

Response
