universal

This documentation applies to the following model:

  • aai/universal

A Speech-to-Text model that draws on context and semantics to improve transcription accuracy, with broad language support.

Set up your API key

If you don’t have an API key for the AI/ML API yet, follow our Quickstart guide.

API Schema

Creating and sending a speech-to-text conversion task to the server

post
Authorizations
Authorization · string · Required

Bearer key

Body
model · enum · Required

audio_start_from · integer · Optional

The point in the file, in milliseconds, at which transcription starts.

audio_end_at · integer · Optional

The point in the file, in milliseconds, at which transcription ends.
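Because both offsets are integers in milliseconds, a small helper can convert from seconds before the request body is assembled. This is a minimal sketch: the helper name `clip_window_ms` is hypothetical, and the check that the end comes after the start is a client-side sanity check assumed here, not a documented server rule.

```python
def clip_window_ms(start_seconds=None, end_seconds=None):
    """Return audio_start_from / audio_end_at as integer milliseconds.

    Only the offsets that were actually supplied are included, so the
    result can be merged straight into a request body.
    """
    window = {}
    if start_seconds is not None:
        window["audio_start_from"] = int(round(start_seconds * 1000))
    if end_seconds is not None:
        window["audio_end_at"] = int(round(end_seconds * 1000))
    # Assumed sanity check (not stated in the schema): end must follow start.
    if (
        "audio_start_from" in window
        and "audio_end_at" in window
        and window["audio_end_at"] <= window["audio_start_from"]
    ):
        raise ValueError("audio_end_at must be greater than audio_start_from")
    return window
```

For example, `clip_window_ms(1.5, 10)` yields the two fields for a clip that runs from 1.5 s to 10 s.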

language_code · string · Optional

The language of your audio file. Possible values are found in Supported Languages. The default value is 'en_us'.

language_confidence_threshold · number · max: 1 · Optional

The confidence threshold for the automatically detected language. An error will be returned if the language confidence is below this threshold. Defaults to 0.

language_detection · boolean · Optional

Enable Automatic language detection, either true or false. Available for the universal model only.

punctuate · boolean · Optional

Adds punctuation and capitalization to the transcript.

Default: true

format_text · boolean · Optional

Enable Text Formatting, can be true or false.

Default: true

disfluencies · boolean · Optional

Transcribe Filler Words, like "umm", in your media file; can be true or false.

Default: false

multichannel · boolean · Optional

Enable Multichannel transcription, can be true or false.

Default: false

speaker_labels · boolean · Optional

Enable Speaker diarization, can be true or false.

Default: false

speakers_expected · integer · Optional

Tell the speaker label model how many speakers it should attempt to identify. See Speaker diarization for more details.

content_safety · boolean · Optional

Enable Content Moderation, can be true or false.

Default: false

iab_categories · boolean · Optional

Enable Topic Detection, can be true or false.

Default: false

auto_highlights · boolean · Optional

Enable Key Phrases, either true or false.

Default: false

word_boost · string[] · Optional

The list of custom vocabulary to boost transcription probability for.

boost_param · string · enum · Optional

How much to boost specified words. Allowed values: low, default, high.

filter_profanity · boolean · Optional

Filter profanity from the transcribed text, can be true or false.

Default: false

redact_pii · boolean · Optional

Redact PII from the transcribed text using the Redact PII model, can be true or false.

Default: false

redact_pii_audio · boolean · Optional

Generate a copy of the original media file with spoken PII "beeped" out, can be true or false. See PII redaction for more details.

Default: false

redact_pii_audio_quality · string · enum · Optional

Controls the filetype of the audio created by redact_pii_audio. Currently supports mp3 (default) and wav. See PII redaction for more details.

redact_pii_sub · string · enum · Optional

The replacement logic for detected PII, can be entity_type or hash. See PII redaction for more details.
sentiment_analysis · boolean · Optional

Enable Sentiment Analysis, can be true or false.

Default: false

entity_detection · boolean · Optional

Enable Entity Detection, can be true or false.

Default: false

summarization · boolean · Optional

Enable Summarization, can be true or false.

Default: false

summary_model · string · enum · Optional

The model to summarize the transcript. Allowed values: informative, conversational, catchy.

summary_type · string · enum · Optional

The type of summary. Allowed values: bullets, bullets_verbose, gist, headline, paragraph.

auto_chapters · boolean · Optional

Enable Auto Chapters, either true or false.

Default: false

speech_threshold · number · max: 1 · Optional

Reject audio files that contain less than this fraction of speech. Valid values are in the range [0, 1] inclusive.
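Several of the body fields above accept only a fixed set of values or a bounded number, so it can be convenient to check a set of options locally before sending them. The sketch below uses only the allowed values and ranges listed in this schema; the function name `check_options` is hypothetical, and the lower bound of 0 for language_confidence_threshold is an assumption (the schema states only a maximum of 1 and a default of 0).

```python
# Allowed values copied from the field descriptions in this schema.
ALLOWED_VALUES = {
    "boost_param": {"low", "default", "high"},
    "redact_pii_audio_quality": {"mp3", "wav"},
    "redact_pii_sub": {"entity_type", "hash"},
    "summary_model": {"informative", "conversational", "catchy"},
    "summary_type": {"bullets", "bullets_verbose", "gist", "headline", "paragraph"},
}

# Fields expected to lie in [0, 1]. The lower bound for
# language_confidence_threshold is assumed, not documented.
UNIT_INTERVAL_FIELDS = ("language_confidence_threshold", "speech_threshold")


def check_options(options):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    for name, allowed in ALLOWED_VALUES.items():
        if name in options and options[name] not in allowed:
            problems.append(f"{name} must be one of {sorted(allowed)}")
    for name in UNIT_INTERVAL_FIELDS:
        if name in options and not 0 <= options[name] <= 1:
            problems.append(f"{name} must be between 0 and 1")
    return problems
```

This is a local convenience only; the server remains the authority on which combinations of options are accepted.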

Responses
POST /v1/stt/create
201 Success
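A minimal sketch of how the creation request might be assembled with the Python standard library follows. Several details are assumptions that do not appear in this section: the base URL `https://api.aimlapi.com`, the `url` body field that carries the audio location, and the helper name `build_create_request`. The path, the Bearer authorization header, and the option names come from the schema above.

```python
import json
import urllib.request

BASE_URL = "https://api.aimlapi.com"  # assumption: base URL is not stated in this section


def build_create_request(api_key, audio_url, **options):
    """Build (but do not send) a POST /v1/stt/create request."""
    payload = {
        "model": "aai/universal",
        "url": audio_url,  # assumption: field name for the audio location
    }
    # Optional body fields from the schema, e.g. speaker_labels=True.
    payload.update(options)
    return urllib.request.Request(
        BASE_URL + "/v1/stt/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen`. The response is expected to carry the generation_id used in the next step, though its exact shape is not shown in this section.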

Requesting the result of the task from the server using the generation_id

get
Authorizations
Authorization · string · Required

Bearer key

Path parameters
generation_id · string · Required
Responses
GET /v1/stt/{generation_id}
201 Success
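Since the task is created first and its result is requested separately, a client typically polls this endpoint until the task finishes. The sketch below is one way to do that, under stated assumptions: the base URL, the `status` field, and the in-progress values `waiting` and `active` are not described in this section and may differ in practice. The names `make_fetch` and `wait_for_result` are hypothetical.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.aimlapi.com"  # assumption: base URL is not stated in this section
PENDING_STATUSES = {"waiting", "active"}  # assumption: in-progress status values


def make_fetch(api_key):
    """Return a function that performs GET /v1/stt/{generation_id}."""
    def fetch(generation_id):
        req = urllib.request.Request(
            f"{BASE_URL}/v1/stt/{generation_id}",
            headers={"Authorization": f"Bearer {api_key}"},
            method="GET",
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return fetch


def wait_for_result(fetch, generation_id, interval=5.0, max_attempts=60, sleep=time.sleep):
    """Call fetch(generation_id) until the task leaves the pending states."""
    for _ in range(max_attempts):
        body = fetch(generation_id)
        if body.get("status") not in PENDING_STATUSES:
            return body
        sleep(interval)
    raise TimeoutError(f"task {generation_id} still pending after {max_attempts} attempts")
```

Passing `fetch` and `sleep` as parameters keeps the loop easy to exercise without network access; in real use, `wait_for_result(make_fetch(api_key), generation_id)` would be the call.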

Quick Example: Processing a Speech Audio File via URL

Let's transcribe the following audio fragment:

Response
