slam-1
This model uses per-second billing. The cost of audio transcription is based on the number of seconds in the input audio file, not on the processing time.
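As a rough illustration of per-second billing, the sketch below estimates a charge from the audio duration alone. The function name `estimate_cost` and the rate used in the example are hypothetical placeholders, not actual prices, and no rounding rule is assumed because none is stated here.

```python
def estimate_cost(audio_seconds: float, price_per_second: float) -> float:
    """Estimate the charge for one file under per-second billing.

    Only the length of the input audio matters; processing time is ignored.
    Both arguments are supplied by the caller (the rate is NOT a real price).
    """
    if audio_seconds < 0 or price_per_second < 0:
        raise ValueError("duration and price must be non-negative")
    return audio_seconds * price_per_second


# Hypothetical rate, for illustration only: a 90-second file.
print(estimate_cost(90, 0.001))
```

A file that takes longer to process costs the same as one that is processed quickly, as long as both contain the same number of seconds of audio.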
Model Overview
A new Speech-to-Text model offering exceptional accuracy by leveraging its deep understanding of context and semantics.
Set Up Your API Key
If you don’t have an API key for the AI/ML API yet, feel free to use our Quickstart guide.
API Schema
Creating and sending a speech-to-text conversion task to the server
audio_start_from: The point in time, in milliseconds, in the file at which the transcription was started.
audio_end_at: The point in time, in milliseconds, in the file at which the transcription was terminated.
language_code: The language of your audio file. Possible values are found in Supported Languages. The default value is 'en_us'.
language_confidence_threshold: The confidence threshold for the automatically detected language. An error will be returned if the language confidence is below this threshold. Defaults to 0.
language_detection: Enable Automatic language detection, either true or false. Available for universal model only.
punctuate: Adds punctuation and capitalization to the transcript. Default: true.
format_text: Enable Text Formatting, can be true or false. Default: true.
disfluencies: Transcribe Filler Words, like "umm", in your media file; can be true or false. Default: false.
multichannel: Enable Multichannel transcription, can be true or false. Default: false.
speaker_labels: Enable Speaker diarization, can be true or false. Default: false.
speakers_expected: Tell the speaker label model how many speakers it should attempt to identify. See Speaker diarization for more details.
content_safety: Enable Content Moderation, can be true or false. Default: false.
iab_categories: Enable Topic Detection, can be true or false. Default: false.
auto_highlights: Enable Key Phrases, either true or false. Default: false.
word_boost: The list of custom vocabulary to boost transcription probability for.
boost_param: How much to boost specified words. Allowed values: low, default, high.
filter_profanity: Filter profanity from the transcribed text, can be true or false. Default: false.
redact_pii: Redact PII from the transcribed text using the Redact PII model, can be true or false. Default: false.
redact_pii_audio: Generate a copy of the original media file with spoken PII "beeped" out, can be true or false. See PII redaction for more details. Default: false.
redact_pii_audio_quality: Controls the filetype of the audio created by redact_pii_audio. Currently supports mp3 (default) and wav. See PII redaction for more details.
redact_pii_sub: The replacement logic for detected PII, can be entity_type or hash. See PII redaction for more details.
sentiment_analysis: Enable Sentiment Analysis, can be true or false. Default: false.
entity_detection: Enable Entity Detection, can be true or false. Default: false.
summarization: Enable Summarization, can be true or false. Default: false.
summary_model: The model to summarize the transcript. Allowed values: informative, conversational, catchy.
summary_type: The type of summary. Allowed values: bullets, bullets_verbose, gist, headline, paragraph.
auto_chapters: Enable Auto Chapters, either true or false. Default: false.
speech_threshold: Reject audio files that contain less than this fraction of speech. Valid values are in the range [0, 1] inclusive.
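To make the constraints above concrete, here is a minimal sketch of a helper that assembles a request body and checks a few of the documented limits before anything is sent. The helper name `build_stt_payload` and its arguments are hypothetical; the field names are taken from the request example on this page, and `url` from the quick example further down. No network call is made.

```python
from typing import Any, Optional

ALLOWED_BOOST = {"low", "default", "high"}


def build_stt_payload(
    audio_url: str,
    start_seconds: Optional[float] = None,
    end_seconds: Optional[float] = None,
    speaker_labels: bool = False,
    speakers_expected: Optional[int] = None,
    word_boost: Optional[list] = None,
    boost_param: Optional[str] = None,
    speech_threshold: Optional[float] = None,
) -> dict:
    """Build a JSON-serializable body for POST /v1/stt/create (hypothetical helper)."""
    payload: dict = {"model": "aai/slam-1", "url": audio_url}

    # The API expects start/end offsets in milliseconds.
    if start_seconds is not None:
        payload["audio_start_from"] = round(start_seconds * 1000)
    if end_seconds is not None:
        payload["audio_end_at"] = round(end_seconds * 1000)
    if start_seconds is not None and end_seconds is not None and end_seconds <= start_seconds:
        raise ValueError("end_seconds must be greater than start_seconds")

    if speaker_labels:
        payload["speaker_labels"] = True
        if speakers_expected is not None:
            payload["speakers_expected"] = speakers_expected

    if word_boost:
        payload["word_boost"] = list(word_boost)
    if boost_param is not None:
        if boost_param not in ALLOWED_BOOST:
            raise ValueError(f"boost_param must be one of {sorted(ALLOWED_BOOST)}")
        payload["boost_param"] = boost_param

    if speech_threshold is not None:
        if not 0 <= speech_threshold <= 1:
            raise ValueError("speech_threshold must be in the range [0, 1]")
        payload["speech_threshold"] = speech_threshold

    return payload


payload = build_stt_payload(
    "https://audio-samples.github.io/samples/mp3/blizzard_primed/sample-0.mp3",
    start_seconds=1.5,
    speaker_labels=True,
    speakers_expected=2,
)
print(payload)
```

Only options that are explicitly set end up in the body, so the server-side defaults apply to everything else.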
POST /v1/stt/create HTTP/1.1
Host: api.aimlapi.com
Authorization: Bearer <YOUR_AIMLAPI_KEY>
Content-Type: application/json
Accept: */*
Content-Length: 882
{
  "model": "aai/slam-1",
  "audio": {
    "buffer": null,
    "mimetype": "text",
    "size": 1,
    "originalname": "text",
    "encoding": "text",
    "fieldname": "text"
  },
  "audio_start_from": 1,
  "audio_end_at": 1,
  "language_code": "text",
  "language_confidence_threshold": 1,
  "language_detection": true,
  "punctuate": true,
  "format_text": true,
  "disfluencies": false,
  "multichannel": false,
  "speaker_labels": false,
  "speakers_expected": 1,
  "content_safety": false,
  "iab_categories": false,
  "custom_spelling": [
    {
      "from": "text",
      "to": "text"
    }
  ],
  "auto_highlights": false,
  "word_boost": [
    "text"
  ],
  "boost_param": "low",
  "filter_profanity": false,
  "redact_pii": false,
  "redact_pii_audio": false,
  "redact_pii_audio_quality": "mp3",
  "redact_pii_policies": [
    "account_number"
  ],
  "redact_pii_sub": "entity_name",
  "sentiment_analysis": false,
  "entity_detection": false,
  "summarization": false,
  "summary_model": "informative",
  "summary_type": "bullets",
  "auto_chapters": false,
  "speech_threshold": 1
}
{
  "generation_id": "123e4567-e89b-12d3-a456-426614174000"
}
Requesting the result of the task from the server using the generation_id
GET /v1/stt/{generation_id} HTTP/1.1
Host: api.aimlapi.com
Authorization: Bearer <YOUR_AIMLAPI_KEY>
Accept: */*
{
  "status": "text",
  "result": {
    "metadata": {
      "transaction_key": "text",
      "request_id": "text",
      "sha256": "text",
      "created": "2025-08-14T13:59:55.019Z",
      "duration": 1,
      "channels": 1,
      "models": [
        "text"
      ],
      "model_info": {
        "ANY_ADDITIONAL_PROPERTY": {
          "name": "text",
          "version": "text",
          "arch": "text"
        }
      }
    }
  }
}
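A response shaped like the one above can be inspected with a couple of small helpers. This is a sketch under assumptions: the helper names are hypothetical, and the pending status values "waiting" and "active" are the ones the quick example on this page polls for, not an exhaustive list of statuses.

```python
from typing import Optional

# Statuses that the quick example on this page treats as "not finished yet".
PENDING_STATUSES = {"waiting", "active"}


def is_pending(response: dict) -> bool:
    """Return True while the task should be polled again."""
    return response.get("status") in PENDING_STATUSES


def get_duration(response: dict) -> Optional[float]:
    """Read result.metadata.duration, tolerating missing or null levels."""
    result = response.get("result") or {}
    metadata = result.get("metadata") or {}
    return metadata.get("duration")


# Sample dictionaries mirroring the schema above (values are made up).
still_running = {"status": "waiting"}
finished = {"status": "completed", "result": {"metadata": {"duration": 12.5, "channels": 1}}}

print(is_pending(still_running), get_duration(still_running))
print(is_pending(finished), get_duration(finished))
```

Reading nested fields defensively avoids a KeyError when the task is still queued and the result object has not been populated yet.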
Quick Example: Processing a Speech Audio File via URL
Let's transcribe the audio fragment at the URL passed in the code below:
import time
import requests
import json  # for getting a structured output with indentation

base_url = "https://api.aimlapi.com/v1"
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
api_key = "<YOUR_AIMLAPI_KEY>"


# Creating and sending a speech-to-text conversion task to the server
def create_stt():
    url = f"{base_url}/stt/create"
    headers = {
        "Authorization": f"Bearer {api_key}",
    }
    data = {
        "model": "aai/slam-1",
        "url": "https://audio-samples.github.io/samples/mp3/blizzard_primed/sample-0.mp3"
    }
    response = requests.post(url, json=data, headers=headers)
    if response.status_code >= 400:
        print(f"Error: {response.status_code} - {response.text}")
        return None  # signal failure so the caller can stop early
    response_data = response.json()
    print(response_data)
    return response_data


# Requesting the result of the task from the server using the generation_id
def get_stt(gen_id):
    url = f"{base_url}/stt/{gen_id}"
    headers = {
        "Authorization": f"Bearer {api_key}",
    }
    response = requests.get(url, headers=headers)
    return response.json()


# First, start the generation, then repeatedly request the result from the server every 10 seconds.
def main():
    stt_response = create_stt()
    if stt_response is None:
        return None
    gen_id = stt_response.get("generation_id")
    if gen_id:
        start_time = time.time()
        timeout = 600  # stop polling after 10 minutes
        while time.time() - start_time < timeout:
            response_data = get_stt(gen_id)
            if response_data is None:
                print("Error: No response from API")
                return None
            status = response_data.get("status")
            if status in ("waiting", "active"):
                print("Still waiting... Checking again in 10 seconds.")
                time.sleep(10)
            else:
                print("Processing complete:\n", response_data["result"]["text"])
                # Uncomment the line below to print the entire "result" object with all service data
                # print("Processing complete:\n", json.dumps(response_data["result"], indent=2, ensure_ascii=False))
                return response_data
        print("Timeout reached. Stopping.")
    return None


if __name__ == "__main__":
    main()