universal

This documentation applies to the following model:

  • aai/universal

A new Speech-to-Text model offering exceptional accuracy by leveraging its deep understanding of context and semantics, with the broadest language support.

Set up your API key

If you don’t have an API key for the AI/ML API yet, feel free to use our Quickstart guide.

API Schemas

Creating and sending a speech-to-text conversion task to the server

POST
Body
model · string · enum · Required
url · string · uri · Optional

URL of the input audio file.

audio_start_from · integer · nullable · Optional

The point in time, in milliseconds, at which transcription of the file should start.

audio_end_at · integer · nullable · Optional

The point in time, in milliseconds, at which transcription of the file should stop.

language_code · string · Optional

The language of your audio file. Possible values are found in Supported Languages. The default value is 'en_us'.

language_confidence_threshold · number · max: 1 · nullable · Optional

The confidence threshold for the automatically detected language. An error will be returned if the language confidence is below this threshold. Defaults to 0.

language_detection · boolean · nullable · Optional

Enable Automatic language detection; either true or false. Available for the universal model only.

punctuate · boolean · nullable · Optional

Add punctuation and capitalization to the transcript.

Default: null

format_text · boolean · nullable · Optional

Enable Text Formatting; can be true or false.

Default: true

disfluencies · boolean · nullable · Optional

Transcribe Filler Words, like "umm", in your media file; can be true or false.

Default: false

multichannel · boolean · nullable · Optional

Enable Multichannel transcription; can be true or false.

Default: false

speaker_labels · boolean · nullable · Optional

Enable Speaker diarization; can be true or false.

Default: null

speakers_expected · integer · nullable · Optional

Tell the speaker label model how many speakers it should attempt to identify. See Speaker diarization for more details.

Default: null

content_safety · boolean · nullable · Optional

Enable Content Moderation; can be true or false.

Default: false

iab_categories · boolean · nullable · Optional

Enable Topic Detection; can be true or false.

Default: false

auto_highlights · boolean · nullable · Optional

Enable Key Phrases; either true or false.

Default: false

word_boost · string[] · Optional

The list of custom vocabulary to boost transcription probability for.

boost_param · string · enum · Optional

How much to boost specified words. Allowed values: low, default, high.

filter_profanity · boolean · nullable · Optional

Filter profanity from the transcribed text, can be true or false.

Default: false

redact_pii · boolean · nullable · Optional

Redact PII from the transcribed text using the Redact PII model; can be true or false.

Default: false

redact_pii_audio · boolean · nullable · Optional

Generate a copy of the original media file with spoken PII "beeped" out; can be true or false. See PII redaction for more details.

Default: false

redact_pii_audio_quality · string · enum · Optional

Controls the file type of the audio created by redact_pii_audio. Currently supports mp3 (default) and wav. See PII redaction for more details.

redact_pii_sub · string · enum · Optional

The replacement logic for detected PII; can be entity_type or hash. See PII redaction for more details.

sentiment_analysis · boolean · nullable · Optional

Enable Sentiment Analysis, can be true or false.

Default: false

entity_detection · boolean · nullable · Optional

Enable Entity Detection; can be true or false.

Default: false

summarization · boolean · nullable · Optional

Enable Summarization; can be true or false.

Default: false

summary_model · string · enum · Optional

The model to summarize the transcript. Allowed values: informative, conversational, catchy.

summary_type · string · enum · Optional

The type of summary. Allowed values: bullets, bullets_verbose, gist, headline, paragraph.

auto_chapters · boolean · nullable · Optional

Enable Auto Chapters; either true or false.

Default: false

speech_threshold · number · max: 1 · nullable · Optional

Reject audio files that contain less than this fraction of speech. Valid values are in the range [0, 1] inclusive.
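
The constraints listed for the body parameters can be checked on the client before a request is sent. The following is a minimal sketch, not part of the official API: the helper name `build_stt_payload` is hypothetical, the lower bound of 0 for `language_confidence_threshold` is an assumption (only `max: 1` and a default of 0 are documented), and `url` is treated as required here for convenience even though the schema marks it optional.

```python
from typing import Any

# Allowed values copied from the parameter descriptions above.
ENUM_OPTIONS = {
    "boost_param": {"low", "default", "high"},
    "redact_pii_audio_quality": {"mp3", "wav"},
    "redact_pii_sub": {"entity_type", "hash"},
    "summary_model": {"informative", "conversational", "catchy"},
    "summary_type": {"bullets", "bullets_verbose", "gist", "headline", "paragraph"},
}


def build_stt_payload(url: str, **options: Any) -> dict:
    """Build a request body for aai/universal (hypothetical helper).

    Options set to None are left out so the server-side defaults apply.
    """
    payload = {"model": "aai/universal", "url": url}
    payload.update({k: v for k, v in options.items() if v is not None})

    # Both thresholds are fractions with a documented maximum of 1.
    # The minimum of 0 for language_confidence_threshold is an assumption.
    for name in ("language_confidence_threshold", "speech_threshold"):
        value = payload.get(name)
        if value is not None and not 0 <= value <= 1:
            raise ValueError(f"{name} must be in [0, 1], got {value}")

    # Enum-valued options must use one of the documented values.
    for name, allowed in ENUM_OPTIONS.items():
        value = payload.get(name)
        if value is not None and value not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}, got {value!r}")

    # Both offsets are in milliseconds; an end at or before the start selects nothing.
    start = payload.get("audio_start_from")
    end = payload.get("audio_end_at")
    if start is not None and end is not None and end <= start:
        raise ValueError("audio_end_at must be greater than audio_start_from")

    return payload
```

For example, `build_stt_payload("https://example.com/audio.mp3", speaker_labels=True, speakers_expected=2)` returns a dictionary that can be serialized as the JSON body; the audio URL is a placeholder.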

Responses
200 · Success
application/json
generation_id · string · Required
POST /v1/stt/create
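
As an illustration of the task-creation call, here is a minimal sketch using only the Python standard library. The base URL `https://api.aimlapi.com` and the `Authorization: Bearer` header are assumptions, since neither is stated in this section; the function names are hypothetical. Only the path `/v1/stt/create` and the `generation_id` response field come from the schema above.

```python
import json
import urllib.request

BASE_URL = "https://api.aimlapi.com"  # assumption: base URL is not given in this section


def make_create_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare the POST request that creates a speech-to-text task."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/stt/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumption: bearer-token auth
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_task(api_key: str, payload: dict, opener=urllib.request.urlopen) -> str:
    """Send the request and return the generation_id from the JSON response.

    `opener` is injectable so the function can be exercised without network access.
    """
    with opener(make_create_request(api_key, payload)) as response:
        body = json.load(response)
    return body["generation_id"]
```

A call might look like `create_task(api_key, {"model": "aai/universal", "url": "https://example.com/audio.mp3"})`, where the audio URL is a placeholder. The returned `generation_id` is what the result endpoint expects.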

Requesting the result of the task from the server using the generation_id

GET
Path parameters
generation_id · string · Required
Responses
200 · Success
application/json
id · string · Required
status · string · enum · Required
output · any of · Required
GET /v1/stt/{generation_id}
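
Because the task is processed asynchronously, the result endpoint is typically polled until the status is no longer pending. The sketch below shows one way to do that with the standard library. Several things in it are assumptions: the base URL, the bearer-token header, and in particular the pending status names `waiting` and `active`, because the possible `status` values are not listed in this section. Adjust `PENDING_STATUSES` to the values the API actually returns.

```python
import json
import time
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.aimlapi.com"      # assumption: not given in this section
PENDING_STATUSES = {"waiting", "active"}  # assumption: enum values are not listed here


def fetch_task(api_key: str, generation_id: str, opener=urllib.request.urlopen) -> dict:
    """GET /v1/stt/{generation_id} and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{BASE_URL}/v1/stt/{quote(generation_id, safe='')}",
        headers={"Authorization": f"Bearer {api_key}"},  # assumption: bearer-token auth
        method="GET",
    )
    with opener(request) as response:
        return json.load(response)


def wait_for_result(fetch, interval_s: float = 5.0, timeout_s: float = 600.0,
                    sleep=time.sleep) -> dict:
    """Call fetch() until the status is no longer pending, or raise on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch()
        if task["status"] not in PENDING_STATUSES:
            return task  # contains id, status, and output
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {task['status']!r} after {timeout_s} s")
        sleep(interval_s)
```

Usage could look like `wait_for_result(lambda: fetch_task(api_key, generation_id))`. The returned dictionary carries the `id`, `status`, and `output` fields from the response schema; the caller should still check `status` before reading `output`, since a non-pending status may indicate failure as well as success.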

Quick Example: Processing a Speech Audio File via URL

Let's transcribe the following audio fragment:

Response
