slam-1

This documentation applies to the following model:

  • aai/slam-1

A speech-to-text model that draws on a deep understanding of context and semantics to deliver highly accurate transcription. English only.

Set Up Your API Key

If you don’t have an API key for the AI/ML API yet, feel free to use our Quickstart guide.

API Schema

Creating and sending a speech-to-text conversion task to the server

POST

Authorizations

Authorization · string · Required

Bearer key. Send your API key in the Authorization header as "Bearer <YOUR_API_KEY>".
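As a minimal Python sketch, the Bearer header described above can be built as follows. The environment variable name AIMLAPI_KEY is a hypothetical choice for illustration, not something the API requires, and the JSON Content-Type header assumes the request body is sent as JSON.

```python
import os
from typing import Optional


def auth_headers(api_key: Optional[str] = None) -> dict[str, str]:
    # Fall back to a hypothetical AIMLAPI_KEY environment variable
    # when no key is passed explicitly.
    key = api_key if api_key is not None else os.environ.get("AIMLAPI_KEY", "")
    if not key:
        raise ValueError("missing API key")
    return {
        # Bearer scheme: the key follows the literal word "Bearer".
        "Authorization": f"Bearer {key}",
        # Assumes the request body is JSON.
        "Content-Type": "application/json",
    }
```

Usage: headers = auth_headers() reads the key from the environment, while auth_headers("<YOUR_API_KEY>") uses an explicit key.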

Body
model · enum · Required

Possible values: aai/slam-1

audio_start_from · integer · Optional

The point in time, in milliseconds, in the file at which transcription should start.

audio_end_at · integer · Optional

The point in time, in milliseconds, in the file at which transcription should stop.

language_code · string · Optional

The language of your audio file. Possible values are listed in Supported Languages. The default value is 'en_us'.

language_confidence_threshold · number · max: 1 · Optional

The confidence threshold for the automatically detected language. An error is returned if the language confidence is below this threshold. Defaults to 0.

language_detection · boolean · Optional

Enables automatic language detection (true or false). Available for the universal model only.

punctuate · boolean · Optional

Adds punctuation and capitalization to the transcript.

Default: true

format_text · boolean · Optional

Enables text formatting (true or false).

Default: true

disfluencies · boolean · Optional

Transcribes filler words, such as "umm", in your media file (true or false).

Default: false

multichannel · boolean · Optional

Enables multichannel transcription (true or false).

Default: false

speaker_labels · boolean · Optional

Enables speaker diarization (true or false).

Default: false

speakers_expected · integer · Optional

The number of speakers the speaker-labeling model should attempt to identify. See Speaker diarization for more details.

content_safety · boolean · Optional

Enables content moderation (true or false).

Default: false

iab_categories · boolean · Optional

Enables topic detection (true or false).

Default: false

auto_highlights · boolean · Optional

Enables key phrase detection (true or false).

Default: false

word_boost · string[] · Optional

A list of custom vocabulary terms whose transcription probability should be boosted.

boost_param · string · enum · Optional

How much to boost the words listed in word_boost. Possible values: low, default, high.

filter_profanity · boolean · Optional

Filters profanity from the transcribed text (true or false).

Default: false

redact_pii · boolean · Optional

Redacts PII from the transcribed text using the Redact PII model (true or false).

Default: false

redact_pii_audio · boolean · Optional

Generates a copy of the original media file with spoken PII "beeped" out (true or false). See PII redaction for more details.

Default: false

redact_pii_audio_quality · string · enum · Optional

Controls the file type of the audio created by redact_pii_audio. Possible values: mp3 (default), wav. See PII redaction for more details.

redact_pii_sub · string · enum · Optional

The replacement logic for detected PII. Possible values: entity_type, hash. See PII redaction for more details.

sentiment_analysis · boolean · Optional

Enables sentiment analysis (true or false).

Default: false

entity_detection · boolean · Optional

Enables entity detection (true or false).

Default: false

summarization · boolean · Optional

Enables summarization (true or false).

Default: false

summary_model · string · enum · Optional

The model used to summarize the transcript. Possible values: informative, conversational, catchy.

summary_type · string · enum · Optional

The type of summary. Possible values: bullets, bullets_verbose, gist, headline, paragraph.

auto_chapters · boolean · Optional

Enables auto chapters (true or false).

Default: false

speech_threshold · number · max: 1 · Optional

Rejects audio files that contain less than this fraction of speech. Valid values are in the range [0, 1], inclusive.
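Before sending a request, it can help to check the body locally against the constraints listed above. The sketch below is hypothetical client-side code, not part of the API: the allowed values are copied from the parameter descriptions, and the lower bound of 0 for language_confidence_threshold is an assumption (only its maximum of 1 is documented).

```python
from typing import Any, Mapping

# Enum values copied from the parameter descriptions above.
ALLOWED: dict[str, set[str]] = {
    "boost_param": {"low", "default", "high"},
    "redact_pii_audio_quality": {"mp3", "wav"},
    "redact_pii_sub": {"entity_type", "hash"},
    "summary_model": {"informative", "conversational", "catchy"},
    "summary_type": {"bullets", "bullets_verbose", "gist", "headline", "paragraph"},
}

# speech_threshold is documented as [0, 1]; for language_confidence_threshold
# only the maximum of 1 is documented, so the minimum of 0 is assumed.
UNIT_INTERVAL = ("language_confidence_threshold", "speech_threshold")


def validate_body(body: Mapping[str, Any]) -> list[str]:
    """Return human-readable problems with a request body; empty means OK."""
    problems: list[str] = []
    if "model" not in body:
        problems.append("model is required")
    for name, allowed in ALLOWED.items():
        if name in body:
            value = body[name]
            if not isinstance(value, str) or value not in allowed:
                problems.append(f"{name} must be one of {sorted(allowed)}")
    for name in UNIT_INTERVAL:
        if name in body:
            value = body[name]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)) or not 0 <= value <= 1:
                problems.append(f"{name} must be a number in [0, 1]")
    boost = body.get("word_boost")
    if boost is not None and not (isinstance(boost, list) and all(isinstance(w, str) for w in boost)):
        problems.append("word_boost must be a list of strings")
    return problems
```

An empty list means the body passed these local checks; the server may still apply rules that are not documented here.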
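audio_start_from and audio_end_at are expressed in milliseconds, so a clip described in seconds needs converting first. The helper below (clip_window is a hypothetical name) does that conversion and rejects an empty or reversed window; whether the server itself enforces start < end is not stated in this reference.

```python
def clip_window(start_s: float, end_s: float) -> dict[str, int]:
    """Convert a clip given in seconds into audio_start_from / audio_end_at (ms)."""
    if start_s < 0 or end_s <= start_s:
        raise ValueError("expected 0 <= start_s < end_s")
    return {
        # Both fields are integers in milliseconds; round to the nearest ms.
        "audio_start_from": round(start_s * 1000),
        "audio_end_at": round(end_s * 1000),
    }
```

Usage: body = {"model": "aai/slam-1", **clip_window(30, 90)} requests transcription of the span from 0:30 to 1:30.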

Responses

201: Success
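The sketch below builds the task-creation request with Python's standard library. Two details are assumptions because the schema above does not show them: the full POST endpoint URL (passed in by the caller as create_url) and the name of the field that carries the audio location (written here as url). The generation_id field read from the reply is likewise assumed, based on the retrieval step that follows.

```python
import json
import urllib.request
from typing import Any


def build_create_request(create_url: str, api_key: str, audio_url: str,
                         **options: Any) -> urllib.request.Request:
    """Build (without sending) the POST request that creates a transcription task.

    Assumptions: create_url is the POST endpoint from the API schema, and the
    audio location goes in a field named "url" (not listed in the schema).
    """
    # Extra keyword arguments become optional body parameters, e.g. speaker_labels=True.
    body = {"model": "aai/slam-1", "url": audio_url, **options}
    return urllib.request.Request(
        create_url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_create_request(req: urllib.request.Request) -> str:
    """Send the request and return generation_id (an assumed reply field)."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["generation_id"]
```

Usage: req = build_create_request(CREATE_URL, API_KEY, AUDIO_URL, speaker_labels=True, speakers_expected=2) followed by generation_id = send_create_request(req), where the three uppercase names are placeholders you supply.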

Requesting the result of the task from the server using the generation_id

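GET /v1/stt/{generation_id}

Authorizations

Authorization · string · Required

Bearer key. Send your API key in the Authorization header as "Bearer <YOUR_API_KEY>".

Path parameters

generation_id · string · Required

The ID of the speech-to-text task whose result you want to retrieve.

Responses

201: Success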
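Because the result is fetched by generation_id, a client typically polls this endpoint until the task has finished. In the sketch below, the base URL is supplied by the caller, and the completion check is passed in as a function because the response fields are not documented above; checking for a status field equal to "completed" is an assumption.

```python
import json
import time
import urllib.request
from typing import Any, Callable
from urllib.parse import quote


def result_url(base_url: str, generation_id: str) -> str:
    """Build GET {base_url}/v1/stt/{generation_id}; base_url is caller-supplied."""
    return f"{base_url.rstrip('/')}/v1/stt/{quote(generation_id, safe='')}"


def fetch_json(url: str, api_key: str) -> dict[str, Any]:
    """Perform an authenticated GET and decode the JSON body."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def poll(fetch: Callable[[], dict[str, Any]],
         is_done: Callable[[dict[str, Any]], bool],
         interval_s: float = 2.0, timeout_s: float = 300.0) -> dict[str, Any]:
    """Call fetch() until is_done(result) is true or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        time.sleep(interval_s)
```

Usage: poll(lambda: fetch_json(result_url(BASE_URL, generation_id), API_KEY), lambda r: r.get("status") == "completed"), with BASE_URL, generation_id, and API_KEY as placeholders. Injecting fetch keeps the retry loop testable without network access.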

Quick Example: Processing a Speech Audio File via URL

Let's transcribe the following audio fragment:
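A minimal end-to-end sketch of this flow in Python: create the task, then poll GET /v1/stt/{generation_id} until it completes. The create URL, the url and generation_id field names, and the "completed" status value are assumptions that the schema above does not confirm. The HTTP call is injectable, so the logic can be exercised without network access.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

# send(method, url, headers, json_body) -> decoded JSON reply
Transport = Callable[[str, str, dict, Optional[dict]], dict]


def urllib_transport(method: str, url: str, headers: dict,
                     body: Optional[dict]) -> dict:
    """Default transport using only the standard library."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def transcribe(audio_url: str, api_key: str, create_url: str, base_url: str,
               send: Transport = urllib_transport, interval_s: float = 2.0,
               max_polls: int = 150) -> dict:
    headers = {"Authorization": f"Bearer {api_key}",
               "Content-Type": "application/json"}
    # 1. Create the task. "url" and "generation_id" are assumed field names.
    created = send("POST", create_url, headers,
                   {"model": "aai/slam-1", "url": audio_url})
    gen_id = created["generation_id"]
    # 2. Poll GET /v1/stt/{generation_id}; the "completed" status is assumed.
    result_url = f"{base_url.rstrip('/')}/v1/stt/{gen_id}"
    for _ in range(max_polls):
        result = send("GET", result_url, headers, None)
        if result.get("status") == "completed":
            return result
        time.sleep(interval_s)
    raise TimeoutError(f"task {gen_id} not completed after {max_polls} polls")
```

Usage: transcribe("<URL of your audio file>", "<YOUR_API_KEY>", create_url="<POST endpoint from the API schema>", base_url="<API base URL>") returns the decoded result once the task reports completion.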

Response
