Speech 2.5 HD Preview
A high-definition text-to-speech model with enhanced multilingual expressiveness, more precise voice replication, and expanded support for 40 languages.
Set up your API Key
If you don’t have an API key for the AI/ML API yet, feel free to use our Quickstart guide.
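To keep the key out of your source code, you can store it in an environment variable and read it at runtime. The short sketch below assumes the variable is called AIMLAPI_API_KEY; that name is just an example, not something the API requires.

import os

# Read the AI/ML API key from an environment variable.
# AIMLAPI_API_KEY is an example name, not a requirement of the API.
api_key = os.environ.get("AIMLAPI_API_KEY")
if not api_key:
    raise RuntimeError("Set the AIMLAPI_API_KEY environment variable first.")

# Build the Authorization header expected by the requests shown below.
headers = {"Authorization": f"Bearer {api_key}"}

On most shells you can set the variable with export AIMLAPI_API_KEY=<YOUR_AIMLAPI_KEY> before running your script.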
Code Example
Python

import os
import requests


def main():
    url = "https://api.aimlapi.com/v1/tts"
    headers = {
        # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
        "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
    }
    payload = {
"model": "minimax/speech-2.5-turbo-preview",
"text": "Hi! What are you doing today?",
"voice_setting": {
"voice_id": "Wise_Woman"
}
}
response = requests.post(url, headers=headers, json=payload, stream=True)
dist = os.path.abspath("your_file_name.wav")
with open(dist, "wb") as write_stream:
for chunk in response.iter_content(chunk_size=8192):
if chunk:
write_stream.write(chunk)
print("Audio saved to:", dist)

main()


JavaScript

import fs from "fs";
import path from "path";

async function main() {
  const url = "https://api.aimlapi.com/v1/tts";
  const payload = {
    model: "minimax/speech-2.5-hd-preview",
    text: "Hi! What are you doing today?",
    voice_setting: {
      voice_id: "Wise_Woman"
    }
  };

  const response = await fetch(url, {
    method: "POST",
    headers: {
      // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
      "Authorization": `Bearer <YOUR_AIMLAPI_KEY>`,
      "Content-Type": "application/json"
    },
    body: JSON.stringify(payload)
  });

  // Read response as ArrayBuffer and convert to Buffer
  const arrayBuffer = await response.arrayBuffer();
  const buffer = Buffer.from(arrayBuffer);

  // Save audio to file in the current working directory
  const dist = path.join(process.cwd(), "your_file_name.wav");
  fs.writeFileSync(dist, buffer);

  console.log("Audio saved to:", dist);
}

main();
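Both examples write the raw response body to disk without checking the HTTP status, so a failed request (for example, a missing or invalid key) would produce an unplayable file. The Python sketch below is a minimal variant of the example above with one added check; it assumes nothing beyond what the example already uses.

import os
import requests

url = "https://api.aimlapi.com/v1/tts"
headers = {"Authorization": "Bearer <YOUR_AIMLAPI_KEY>"}
payload = {
    "model": "minimax/speech-2.5-hd-preview",
    "text": "Hi! What are you doing today?",
    "voice_setting": {"voice_id": "Wise_Woman"},
}

response = requests.post(url, headers=headers, json=payload, stream=True)

# Raise requests.HTTPError for 4xx/5xx responses instead of silently
# writing an error body into the .wav file.
response.raise_for_status()

dist = os.path.abspath("your_file_name.wav")
with open(dist, "wb") as write_stream:
    for chunk in response.iter_content(chunk_size=8192):
        if chunk:
            write_stream.write(chunk)

print("Audio saved to:", dist)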
API Schema

Authorization header: Bearer key.
Text: The text content to be converted to speech.
Streaming: Enable streaming mode for real-time audio generation. When enabled, audio is generated and delivered in chunks as it's processed. Default: false.
Language boost: Language recognition enhancement option.
Subtitles: Enable subtitle generation service. Only available for non-streaming requests. Generates timing information for the synthesized speech. Default: false.
Output format: Format of the output content for non-streaming requests. Controls how the generated audio data is encoded in the response. Default: hex.

Response example:

{
"metadata": {
"transaction_key": "text",
"request_id": "text",
"sha256": "text",
"created": "2025-11-13T00:54:29.058Z",
"duration": 1,
"channels": 1,
"models": [
"text"
],
"model_info": {
"ANY_ADDITIONAL_PROPERTY": {
"name": "text",
"version": "text",
"arch": "text"
}
}
}
}
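The schema above says the output format controls how the generated audio is encoded in the response, with hex as the default. A hex-encoded payload is a text string of hexadecimal digits, so it has to be decoded back into bytes before it can be saved as a playable file. The sketch below shows only the decoding step; the audio_hex field name is a placeholder for illustration, since this page's schema extract does not show where the string appears in the response body.

# Decode hex-encoded audio into raw bytes and write it to a file.
# "audio_hex" is a placeholder field name, not the documented one;
# check the full response schema for the actual location of the string.
sample_response = {
    "audio_hex": "52494646",  # placeholder hex data, not real audio
}

audio_bytes = bytes.fromhex(sample_response["audio_hex"])

with open("decoded_audio.wav", "wb") as f:
    f.write(audio_bytes)

print("Wrote", len(audio_bytes), "bytes")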