slam-1

This documentation is valid for the following list of our models:

  • aai/slam-1

A new Speech-to-Text model offering exceptional accuracy by leveraging its deep understanding of context and semantics (English only).

Set Up Your API Key

If you don’t have an API key for the AI/ML API yet, feel free to use our Quickstart guide.

API Schemas

Creating and sending a speech-to-text conversion task to the server

post
Body
model · string · enum · Required · Possible values: aai/slam-1
url · string · uri · Optional

URL of the input audio file.

audio_start_from · integer · nullable · Optional

The point in time, in milliseconds, at which to begin transcribing the media file.

audio_end_at · integer · nullable · Optional

The point in time, in milliseconds, at which to stop transcribing the media file.

language_code · string · Optional

The language of your audio file. Possible values are found in Supported Languages. The default value is 'en_us'.

language_confidence_threshold · number · max: 1 · nullable · Optional

The confidence threshold for the automatically detected language. An error will be returned if the language confidence is below this threshold. Defaults to 0.

language_detection · boolean · nullable · Optional

Enable automatic language detection, either true or false. Available for the universal model only.

punctuate · boolean · nullable · Optional

Add punctuation and capitalization to the transcript; can be true or false.

Default: null
format_text · boolean · nullable · Optional

Enable Text Formatting, can be true or false.

Default: true
disfluencies · boolean · nullable · Optional

Transcribe Filler Words, like "umm", in your media file; can be true or false.

Default: false
multichannel · boolean · nullable · Optional

Enable Multichannel transcription, can be true or false.

Default: false
speaker_labels · boolean · nullable · Optional

Enable Speaker diarization, can be true or false.

Default: null
speakers_expected · integer · nullable · Optional

Tell the speaker label model how many speakers it should attempt to identify. See Speaker diarization for more details.

Default: null
content_safety · boolean · nullable · Optional

Enable Content Moderation, can be true or false.

Default: false
iab_categories · boolean · nullable · Optional

Enable Topic Detection, can be true or false.

Default: false
auto_highlights · boolean · nullable · Optional

Enable Key Phrases, either true or false.

Default: false
word_boost · string[] · Optional

The list of custom vocabulary to boost transcription probability for.

boost_param · string · enum · Optional

How much to boost specified words. Allowed values: low, default, high.

filter_profanity · boolean · nullable · Optional

Filter profanity from the transcribed text, can be true or false.

Default: false
redact_pii · boolean · nullable · Optional

Redact PII from the transcribed text using the Redact PII model, can be true or false.

Default: false
redact_pii_audio · boolean · nullable · Optional

Generate a copy of the original media file with spoken PII "beeped" out, can be true or false. See PII redaction for more details.

Default: false
redact_pii_audio_quality · string · enum · Optional

Controls the filetype of the audio created by redact_pii_audio. Currently supports mp3 (default) and wav. See PII redaction for more details.

redact_pii_sub · string · enum · Optional

The replacement logic for detected PII, can be entity_type or hash. See PII redaction for more details.

sentiment_analysis · boolean · nullable · Optional

Enable Sentiment Analysis, can be true or false.

Default: false
entity_detection · boolean · nullable · Optional

Enable Entity Detection, can be true or false.

Default: false
summarization · boolean · nullable · Optional

Enable Summarization, can be true or false.

Default: false
summary_model · string · enum · Optional

The model to summarize the transcript. Allowed values: informative, conversational, catchy.

summary_type · string · enum · Optional

The type of summary. Allowed values: bullets, bullets_verbose, gist, headline, paragraph.

auto_chapters · boolean · nullable · Optional

Enable Auto Chapters, either true or false.

Default: false
speech_threshold · number · max: 1 · nullable · Optional

Reject audio files that contain less than this fraction of speech. Valid values are in the range [0, 1] inclusive.
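
The body fields above can be combined into a single JSON object. The following is a minimal sketch, not an official client: the helper name `build_stt_body` and the placeholder audio URL are hypothetical, and the check that `audio_end_at` comes after `audio_start_from` is an assumed client-side sanity check rather than a documented server rule. The `[0, 1]` range for `speech_threshold` is taken from the field description.

```python
from typing import Any, Optional


def build_stt_body(
    url: str,
    model: str = "aai/slam-1",
    audio_start_from: Optional[int] = None,
    audio_end_at: Optional[int] = None,
    speech_threshold: Optional[float] = None,
    **flags: Any,
) -> dict:
    """Assemble a request body, leaving out every field that was not set."""
    # Documented range: speech_threshold must lie in [0, 1] inclusive.
    if speech_threshold is not None and not 0 <= speech_threshold <= 1:
        raise ValueError("speech_threshold must be in [0, 1]")
    # Assumed sanity check (not stated in the reference): end after start.
    if (
        audio_start_from is not None
        and audio_end_at is not None
        and audio_end_at <= audio_start_from
    ):
        raise ValueError("audio_end_at must be greater than audio_start_from")
    body = {
        "model": model,
        "url": url,
        "audio_start_from": audio_start_from,
        "audio_end_at": audio_end_at,
        "speech_threshold": speech_threshold,
        **flags,  # any other optional field, e.g. speaker_labels=True
    }
    # Drop unset fields so the server applies its own defaults.
    return {key: value for key, value in body.items() if value is not None}


body = build_stt_body(
    "https://example.com/audio.mp3",  # placeholder URL, not a real file
    audio_start_from=5_000,   # milliseconds
    audio_end_at=65_000,      # milliseconds
    speaker_labels=True,
    speakers_expected=2,
)
print(body)
```

Omitting unset fields, instead of sending them as null, keeps the request small and lets the documented defaults apply.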

Responses
200 · Success
application/json
generation_id · string · Required
POST /v1/stt/create
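
As an illustration of the task-creation call, here is a minimal sketch using only the Python standard library. It relies on assumptions not stated in this section: the base URL `https://api.aimlapi.com`, Bearer-token authorization, and the environment variable name `AIML_API_KEY` (hypothetical). The audio URL is a placeholder.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.aimlapi.com"  # assumption: not stated in this section


def build_create_request(body: dict, api_key: str) -> urllib.request.Request:
    """Prepare the POST /v1/stt/create request without sending it."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/stt/create",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_task(body: dict, api_key: str) -> str:
    """Send the request and return the generation_id from the response."""
    request = build_create_request(body, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["generation_id"]


if __name__ == "__main__":
    key = os.environ.get("AIML_API_KEY")  # hypothetical variable name
    if key:
        task_body = {
            "model": "aai/slam-1",
            "url": "https://example.com/audio.mp3",  # placeholder URL
        }
        print(create_task(task_body, key))
```

The returned `generation_id` is the value to pass to the result endpoint.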

Requesting the result of the task from the server using the generation_id

get
Path parameters
generation_id · string · Required
Responses
200 · Success
application/json
id · string · Required
status · string · enum · Required
output · any of · Required
GET /v1/stt/{generation_id}
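
Because the result is requested separately from task creation, a client typically polls this endpoint until the task leaves its in-progress state. The sketch below shows one way to do that. The base URL, the Bearer authorization header, and the in-progress status names in `PENDING` are all assumptions, since this section does not list the possible `status` values; adjust them to what the API actually returns.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "https://api.aimlapi.com"  # assumption: not stated in this section
# Assumption: names of the in-progress statuses; the reference does not list them.
PENDING = {"queued", "waiting", "active"}


def fetch_task(generation_id: str, api_key: str) -> dict:
    """Call GET /v1/stt/{generation_id} once and return the parsed JSON."""
    request = urllib.request.Request(
        f"{BASE_URL}/v1/stt/{generation_id}",
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_result(
    fetch: Callable[[], dict],
    interval: float = 5.0,
    max_attempts: int = 60,
) -> dict:
    """Call fetch() until the task is no longer pending, or give up."""
    for _ in range(max_attempts):
        task = fetch()
        if task["status"] not in PENDING:
            return task  # finished, successfully or not; inspect "status"
        time.sleep(interval)
    raise TimeoutError("task did not finish within the allowed attempts")
```

A call might look like `wait_for_result(lambda: fetch_task(generation_id, api_key))`. Passing the fetch step in as a callable keeps the polling loop testable without network access.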

Quick Example: Processing a Speech Audio File via URL

Let's transcribe the following audio fragment:

Response
