You can query your account balance and other billing details through this API. To make a request, you only need your AIMLAPI key obtained from your account dashboard.
Learn how to get started with the AI/ML API
This documentation portal is designed to help you choose and configure the AI model that best suits your needs—or one of our solutions (ready-to-use tools for specific practical tasks) from our available options and correctly integrate it into your code.
Have suggestions for improvement?
Select a model by its Task, by its Developer, or by the supported Capabilities:
Alibaba Cloud: Text/Chat Image Video Text-to-Speech
Anthracite: Text/Chat
Anthropic: Text/Chat Embedding
Assembly AI: Speech-To-Text
BAAI: Embedding
Cohere: Text/Chat
DeepSeek:
Deepgram:
ElevenLabs:
Flux:
Google:
Inworld:
Kling AI:
Krea:
LTXV:
Meta:
Microsoft:
MiniMax:
Mistral AI:
Moonshot:
NousResearch:
NVIDIA:
OpenAI:
Perplexity:
PixVerse:
RecraftAI:
Reve:
Runway:
Stability AI:
Sber AI:
Tencent:
Together AI:
VEED:
xAI:
Zhipu:
AI Search Engine – use this solution if your project needs to find information on the internet and then present it to you in a structured format.
OpenAI Assistants – use this solution to create tailored AI Assistants capable of handling customer support, data analysis, content generation, and more.
We’re currently working on improving our documentation portal, and your feedback would be incredibly helpful! Take a quick 5-question survey (no personal info required!)
You can also rate each individual page using the built-in form on the right side of the screen:
A hybrid instruct-and-reasoning text model.
A step-by-step guide to setting up and making a test call to the AI model, including generating an API key, configuring the Base URL, and running the first request.
Here, you'll learn how to start using our API in your code. The following steps must be completed regardless of whether you integrate one of the models we offer or use our ready-made solution:
Let's walk through an example of connecting to the model via OpenAI SDK. This guide is suitable even for complete beginners.
To use the AIML API, you need to create an account and generate an API key. Follow these steps:
Create an account: Visit the AI/ML API website and create an account.
Generate an API key: After logging in, navigate to your account dashboard and generate your API key. Ensure that the key is enabled in the UI.
Depending on your environment and application, you will set the base URL differently. Below is a universal string that you can use to access our API. Copy it or return here later when you are ready with your environment or app.
The AI/ML API supports both versioned and non-versioned URLs, providing flexibility in your API requests. You can use either of the following formats:
https://api.aimlapi.com
https://api.aimlapi.com/v1
Using versioned URLs can help ensure compatibility with future updates and changes to the API. It is recommended to use versioned URLs for long-term projects to maintain stability.
Based on your environment, you will call our API differently. Below are two common ways to call our API using two popular programming languages: Python and NodeJS.
If you don’t want lengthy explanations, here’s the code you can use right away in a Python or Node.js program. You only need to replace <YOUR_AIMLAPI_KEY> with your AIML API Key obtained from your account.
Below, we will still walk through these examples step by step in both languages, explaining every line.
The two examples are written in different programming languages, yet they look very similar. Let's break down the code step by step and see what's going on.
In the examples above, we are using the OpenAI SDK, a convenient client library that lets us use the AI/ML API without writing repetitive boilerplate code for handling HTTP requests. Before we can use the OpenAI SDK, it needs to be imported. The import happens in the following places:
As simple as that. The next step is to initialize the variables that our code will use. The two main ones are the base URL and the API key, which we already discussed at the beginning of the article.
To communicate with LLMs, users send text. These texts are usually called "prompts." Inside our code, we have prompts with two roles: system and user. The system prompt is the main source of instructions for the generation, while the user prompt is the user's input, the subject the system prompt is applied to. Although many models can behave differently, this convention applies to most chat LLMs, currently one of the most useful and popular model types.
Inside the code, the prompts are stored in the variables systemPrompt and userPrompt in JS, and system_prompt and user_prompt in Python.
Before we use the API, we need to create an instance of the OpenAI SDK class, which gives us access to all of its methods. The instance is created from the imported package, and we pass it two main parameters: the base URL and the API key.
Because of naming conventions, these two parameters are named slightly differently in the two languages (camel case in JS, snake case in Python), but their functionality is the same.
All preparation steps are done. Now we need to write our functionality and create something great. In the examples above, we make the simplest travel agent. Let's break down the steps of how we send a request to the model.
The best practice is to split the code into self-contained blocks with their own logic rather than placing executable code at the module's top level. This rule applies in both languages we discuss, so we create a main function containing all of our logic. In JS, this function needs to be async because the SDK's methods return Promises. In Python, the requests run synchronously.
The OpenAI SDK provides methods for communicating with chat models. The main one is chat.completions.create. This method accepts multiple parameters but requires only two: model and messages.
model is a string, the name of the model that you want to use. For the best results, use a model designed for chat; otherwise, you may get unpredictable results if the model is not fine-tuned for that purpose. A list of supported models can be found here.
messages is an array of objects, each with a content field (the prompt) and a role string that is one of system, user, tool, or assistant. The role tells the model what to do with the prompt: Is this an instruction? Is this a user message? Is this an example of how to answer? Is this the result of code execution? The tool role is used for more complex behavior and will be discussed in another article.
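For illustration, here is a hedged sketch (not part of the original example) of a messages array that combines several of these roles in Python; the assistant entry is a hypothetical example answer used to show the model the desired style:

# A sketch of a messages array combining several roles.
# The assistant entry is a hypothetical example answer ("few-shot" style);
# only the system and user roles are used in the examples on this page.
messages = [
    {"role": "system", "content": "You are a travel agent. Be descriptive and helpful."},
    {"role": "user", "content": "Tell me about Paris"},
    {"role": "assistant", "content": "Paris, the capital of France, is famous for the Eiffel Tower, the Louvre, and its cafe culture."},
    {"role": "user", "content": "Tell me about San Francisco"},
]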
In our example, we also use max_tokens and temperature.
With that knowledge, we can now send our request like the following:
The response from chat.completions.create contains a completion. Completion is a fundamental part of how LLMs work: every LLM is essentially a word-autocomplete engine trained on huge amounts of data. Chat models are designed to autocomplete a conversation made of messages with prompts and roles, while other models can follow their own custom logic without roles at all.
Inside this completion, we are interested in the generated text. We can extract it from the completion variable:
In certain cases, a completion can contain multiple results. These results are called choices. Every choice has a message, the product of the generation. The string content is stored in the content field, which we assigned to our response variable above.
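As a hedged sketch (assuming the provider honors the OpenAI-style n parameter, which the original example does not use), you could request several alternative completions in one call and read each choice like this:

# Hypothetical sketch: request several alternative completions in one call.
# Assumes the OpenAI-style `n` parameter is supported for the chosen model.
completion = api.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ],
    n=3,            # ask for three alternative answers
    max_tokens=256,
)

for i, choice in enumerate(completion.choices):
    print(f"--- Choice {i} ---")
    print(choice.message.content)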
In the next steps, we can finally see the results. In both examples, we print the user prompt and response like it was a conversation:
Voila! Using AI/ML API models is the simplest and most productive way to get into the world of Machine Learning and Artificial Intelligence.
from openai import OpenAI
client = OpenAI(
base_url="https://api.aimlapi.com/v1",
api_key="<YOUR_AIMLAPI_KEY>",
)
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Write a one-sentence story about numbers."}]
)
print(response.choices[0].message.content)

import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"Qwen/Qwen3-235B-A22B-fp8-tput",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))

async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'Qwen/Qwen3-235B-A22B-fp8-tput',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{'id': 'ntFB5Ap-6UHjtw-93cab7642d14efac', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': '<think>\nOkay, the user just said "Hello". I should respond in a friendly and welcoming manner. Let me make sure to greet them back and offer assistance. Maybe say something like, "Hello! How can I help you today?" That should be open-ended and inviting for them to ask questions or share what\'s on their mind. Keep it simple and positive.\n</think>\n\nHello! How can I help you today? 😊', 'tool_calls': []}}], 'created': 1746725755, 'model': 'Qwen/Qwen3-235B-A22B-fp8-tput', 'usage': {'prompt_tokens': 4, 'completion_tokens': 111, 'total_tokens': 115}}
Paste the following content into travel.py and replace <YOUR_AIMLAPI_KEY> with the API key you obtained in the first step.
Run the application
If you did everything correctly, you will see output similar to the following:

https://api.aimlapi.com

from openai import OpenAI
base_url = "https://api.aimlapi.com/v1"
# Insert your AIML API key in the quotation marks instead of <YOUR_AIMLAPI_KEY>:
api_key = "<YOUR_AIMLAPI_KEY>"
system_prompt = "You are a travel agent. Be descriptive and helpful."
user_prompt = "Tell me about San Francisco"
api = OpenAI(api_key=api_key, base_url=base_url)
def main():
completion = api.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt},
],
temperature=0.7,
max_tokens=256,
)
response = completion.choices[0].message.content
print("User:", user_prompt)
print("AI:", response)
if __name__ == "__main__":
main()

const { OpenAI } = require("openai");
const baseURL = "https://api.aimlapi.com/v1";
// Insert your AIML API Key in the quotation marks instead of my_key:
const apiKey = "<YOUR_AIMLAPI_KEY>";
const systemPrompt = "You are a travel agent. Be descriptive and helpful";
const userPrompt = "Tell me about San Francisco";
const api = new OpenAI({
apiKey,
baseURL,
});
const main = async () => {
const completion = await api.chat.completions.create({
model: "mistralai/Mistral-7B-Instruct-v0.2",
messages: [
{
role: "system",
content: systemPrompt,
},
{
role: "user",
content: userPrompt,
},
],
temperature: 0.7,
max_tokens: 256,
});
const response = completion.choices[0].message.content;
console.log("User:", userPrompt);
console.log("AI:", response);
};
main();

mkdir ./aimlapi-welcome
cd ./aimlapi-welcome

code .

python3 -m venv ./.venv

# Linux / Mac
source ./.venv/bin/activate
# Windows
.\.venv\Scripts\activate.bat

pip install openai

mkdir ./aimlapi-welcome
cd ./aimlapi-welcome

code .

npm init -y

npm i openai

touch ./index.js

const { OpenAI } = require("openai");
const baseURL = "https://api.aimlapi.com/v1";
const apiKey = "<YOUR_AIMLAPI_KEY>";
const systemPrompt = "You are a travel agent. Be descriptive and helpful";
const userPrompt = "Tell me about San Francisco";
const api = new OpenAI({
apiKey,
baseURL,
});
const main = async () => {
const completion = await api.chat.completions.create({
model: "gpt-4o",
messages: [
{
role: "system",
content: systemPrompt,
},
{
role: "user",
content: userPrompt,
},
],
temperature: 0.7,
max_tokens: 256,
});
const response = completion.choices[0].message.content;
console.log("User:", userPrompt);
console.log("AI:", response);
};
main();

User: Tell me about San Francisco
AI: San Francisco, located in the northern part of California, USA, is a vibrant and culturally rich city known for its iconic landmarks, beautiful scenery, and diverse neighborhoods.
The city is famous for its iconic Golden Gate Bridge, an engineering marvel and one of the most recognized structures in the world. Spanning the Golden Gate Strait, this red-orange suspension bridge connects San Francisco to Marin County and offers breathtaking views of the San Francisco Bay and the Pacific Ocean.

const { OpenAI } = require("openai");

from openai import OpenAI

const baseURL = "https://api.aimlapi.com/v1";
const apiKey = "<YOUR_AIMLAPI_KEY>";
const systemPrompt = "You are a travel agent. Be descriptive and helpful";
const userPrompt = "Tell me about San Francisco";

base_url = "https://api.aimlapi.com/v1"
api_key = "<YOUR_AIMLAPI_KEY>"
system_prompt = "You are a travel agent. Be descriptive and helpful."
user_prompt = "Tell me about San Francisco"

const api = new OpenAI({
  apiKey,
  baseURL,
});

api = OpenAI(api_key=api_key, base_url=base_url)

const completion = await api.chat.completions.create({
  model: "gpt-4o",
  messages: [
    {
      role: "system",
      content: systemPrompt,
    },
    {
      role: "user",
      content: userPrompt,
    },
  ],
  temperature: 0.7,
  max_tokens: 256,
});

completion = api.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ],
    temperature=0.7,
    max_tokens=256,
)

const response = completion.choices[0].message.content;

response = completion.choices[0].message.content

console.log("User:", userPrompt);
console.log("AI:", response);

print("User:", user_prompt)
print("AI:", response)

touch travel.py

from openai import OpenAI
base_url = "https://api.aimlapi.com/v1"
api_key = "<YOUR_AIMLAPI_KEY>"
system_prompt = "You are a travel agent. Be descriptive and helpful."
user_prompt = "Tell me about San Francisco"
api = OpenAI(api_key=api_key, base_url=base_url)
def main():
completion = api.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt},
],
temperature=0.7,
max_tokens=256,
)
response = completion.choices[0].message.content
print("User:", user_prompt)
print("AI:", response)
if __name__ == "__main__":
main()

python3 ./travel.py

User: Tell me about San Francisco
AI: San Francisco, located in northern California, USA, is a vibrant and culturally rich city known for its iconic landmarks, beautiful vistas, and diverse neighborhoods. It's a popular tourist destination famous for its iconic Golden Gate Bridge, which spans the entrance to the San Francisco Bay, and the iconic Alcatraz Island, home to the infamous federal prison.
The city's famous hills offer stunning views of the bay and the cityscape. Lombard Street, the "crookedest street in the world," is a must-see attraction, with its zigzagging pavement and colorful gardens. Ferry Building Marketplace is a great place to explore local food and artisanal products, and the Pier 39 area is home to sea lions, shops, and restaurants.
San Francisco's diverse neighborhoods each have their unique character. The historic Chinatown is the oldest in North America, while the colorful streets of the Mission District are known for their murals and Latin American culture. The Castro District is famous for its LGBTQ+ community and vibrant nightlife.

A large language model (LLM) optimized for instruction-following tasks, striking a balance between computational efficiency and high-quality performance. It excels in multilingual tasks, offering a lightweight solution without compromising on quality.
A description of the software development kits (SDKs) that can be used to interact with the AIML API.
In the setting up article, we showed an example of how to use the OpenAI SDK with the AI/ML API. We configured the environment from the very beginning and executed our request to the AI/ML API.
We fully support the OpenAI API structure, and you can seamlessly use the features that the OpenAI SDK provides out-of-the-box, including:
Streaming
Completions
Chat Completions
Audio
Beta Assistants
Beta Threads
Embeddings
Image Generation
Uploads
This support provides easy integration into systems already using OpenAI's standards. For example, you can integrate our API into any product that supports LLM models by updating only two things in the configuration: the base URL and the API key.
Because we support the OpenAI API structure, our API can be used with the same endpoints as OpenAI. You can call them from any environment.
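For example, since Streaming is in the list above, a minimal sketch of a streaming chat request through the OpenAI SDK (assuming only the base URL and API key are changed, as described) could look like this in Python:

from openai import OpenAI

# Sketch: streaming tokens through the OpenAI SDK pointed at the AI/ML API.
# Only the base URL and the API key differ from a stock OpenAI setup.
client = OpenAI(
    base_url="https://api.aimlapi.com/v1",
    api_key="<YOUR_AIMLAPI_KEY>",
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about APIs."}],
    stream=True,  # tokens arrive incrementally instead of in one final response
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()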
AI/ML API authorization is based on a Bearer token. You need to include it in the Authorization HTTP header of the request, for example:
When your token is ready, you can call our API over HTTP.
We have started developing our own SDK to simplify the use of our service. Currently, it supports only chat completion and embedding models.
If you’d like to contribute to expanding its functionality, feel free to reach out to us on Discord!
After obtaining your AIML API key, create a .env file and copy the required contents into it.
Copy the code below, paste it into your .env file, and set your API key in AIML_API_KEY="<YOUR_AIMLAPI_KEY>", replacing <YOUR_AIMLAPI_KEY> with your actual key:
Install the aiml_api package:
To execute the script, use:
fetch("https://api.aimlapi.com/chat/completions", {
method: "POST",
headers: {
Authorization: "Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type": "application/json",
},
body: JSON.stringify({
model: "gpt-4o",
messages: [
{
role: "user",
content: "What kind of model are you?",
},
],
max_tokens: 512,
stream: false,
}),
})
.then((res) => res.json())
.then(console.log);

import requests
import json
response = requests.post(
url="https://api.aimlapi.com/chat/completions",
headers={
"Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type": "application/json",
},
data=json.dumps(
{
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": "What kind of model are you?",
},
],
"max_tokens": 512,
"stream": False,
}
),
)
response.raise_for_status()
print(response.json())

curl --request POST \
--url https://api.aimlapi.com/chat/completions \
--header 'Authorization: Bearer <YOUR_AIMLAPI_KEY>' \
--header 'Content-Type: application/json' \
--data '{
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": "What kind of model are you?"
}
],
"max_tokens": 512,
"stream": false
}'

from aiml_api import AIML_API
api = AIML_API()
completion = api.chat.completions.create(
model = "mistralai/Mistral-7B-Instruct-v0.2",
messages = [
{"role": "user", "content": "Explain the importance of low-latency LLMs"},
],
temperature = 0.7,
max_tokens = 256,
)
response = completion.choices[0].message.content
print("AI:", response)Authorization: Bearer <YOUR_AIMLAPI_KEY>touch .envAIML_API_KEY = "<YOUR_AIMLAPI_KEY>"
AIML_API_URL = "https://api.aimlapi.com/v1"

# install from PyPI
pip install aiml_api

python3 <your_script_name>.py

▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.

4️⃣ (Optional) Adjust other optional parameters if needed
Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.
5️⃣ Run your modified code
Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.
If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.
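As a hedged sketch (the exact set of supported optional parameters varies per model and is listed in its API schema), adding optional parameters to the raw HTTP payload could look like this:

import requests

# Sketch: the same chat completion request with a few optional parameters added.
# Check the model's API schema for the parameters it actually supports.
response = requests.post(
    "https://api.aimlapi.com/v1/chat/completions",
    headers={
        "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Hello"}],
        # Optional parameters:
        "temperature": 0.7,   # randomness of the output
        "max_tokens": 256,    # upper bound on the length of the reply
    },
)
print(response.json())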
qwen-max-2025-01-25
The large-scale Mixture-of-Experts (MoE) language model. Excels in language understanding and task performance. Supports 29 languages, including Chinese, English, and Arabic.
This model is designed to enhance both the performance and efficiency of AI agents developed on the Alibaba Cloud Model Studio platform. Optimized for speed and precision in generative AI application development. Improves AI agent comprehension and adaptation to enterprise data, especially when integrated with Retrieval-Augmented Generation (RAG) architectures. Large context window (1,000,000 tokens).
A cutting-edge large language model designed to understand and generate text based on specific instructions. It excels in various tasks, including coding, mathematical problem-solving, and generating structured outputs.
Significantly improved performance on reasoning tasks, including logical reasoning, mathematics, science, coding, and academic benchmarks that typically require human expertise.
An instruction-tuned chat model optimized for fast, stable replies without reasoning traces, designed for complex tasks in reasoning, coding, knowledge QA, and multilingual use, with strong alignment and formatting.
This model offers improved accuracy in math, coding, logic, and science, handles complex instructions in Chinese and English more reliably, reduces hallucinations, supports 100+ languages with stronger translation and commonsense reasoning, and is optimized for RAG and tool use, though it lacks a dedicated ‘thinking’ mode.
This model is an open-source model built on Qwen3-Omni that automatically generates rich, detailed descriptions of complex audio — including speech, music, ambient sounds, and effects — without prompts. It detects emotions, musical styles, instruments, and sensitive information, making it ideal for audio analysis, security auditing, intent recognition, and editing.
anthropic/claude-3-haiku-20240307
claude-3-haiku-20240307
claude-3-haiku-latest
The quick and streamlined model, offering near-instant responsiveness.
deepseek/deepseek-chat-v3-0324
We provide the latest version of this model from Mar 24, 2025. All three IDs listed above refer to the same model; we support them for backward compatibility.
DeepSeek V3 (or deepseek-chat) is an advanced conversational AI designed to deliver highly engaging and context-aware dialogues. This model excels in understanding and generating human-like text, making it an ideal solution for creating responsive and intelligent chatbots.
August 2025 update of the DeepSeek V3 non-reasoning model.
August 2025 update of the DeepSeek R1 reasoning model. Skilled at complex problem-solving, mathematical reasoning, and programming assistance.
September 2025 update of the DeepSeek Reasoner V3.1 model. The model produces more consistent and dependable results.
The most powerful model in the Qwen3 Coder series — a 480B-parameter MoE architecture with 35B active parameters. It natively supports a 256K token context and can handle up to 1M tokens using extrapolation techniques, delivering outstanding performance in both coding and agentic tasks.
A major improvement over Claude 3.7 Sonnet, offering better coding abilities, stronger reasoning, and more accurate responses to your instructions.
Both IDs listed above refer to the same model; we support them for backward compatibility.
DeepSeek R1 is a cutting-edge reasoning model developed by DeepSeek AI, designed to excel in complex problem-solving, mathematical reasoning, and programming assistance.
The first open model built on Google’s next-generation, mobile-first architecture—designed for fast, private, and multimodal AI directly on-device. With Gemma 3n, developers get early access to the same technology that will power on-device AI experiences across Android and Chrome later this year, enabling them to start building for the future today.
A 17 billion active parameter model with 16 experts, is the best multimodal model in the world in its class and is more powerful than all previous generation Llama models. Additionally, the model offers an industry-leading context window of 1M and delivers better results than Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1 on a wide range of common benchmarks.
A 17 billion active parameter model with 128 experts, is the best multimodal model in its class, beating GPT-4o and Gemini 2.0 Flash on a wide range of common benchmarks, while achieving comparable results to the new DeepSeek v3 on reasoning and coding—with less than half the number of active parameters.
An optimized language model designed for efficient text generation with advanced features and multilingual support. Specifically tuned for instruction-following tasks, making it suitable for applications requiring conversational capabilities and task-oriented responses.
A powerful language model developed by MiniMax AI, designed to excel in tasks requiring extensive context processing and reasoning capabilities. With a total of 456 billion parameters, of which 45.9 billion are activated per token, this model utilizes a hybrid architecture that combines various attention mechanisms to optimize performance across a wide array of applications.
A text model with a support for audio prompts and the ability to generate spoken audio responses. This expansion enhances the potential for AI applications in text and voice-based interactions and audio analysis. You can choose from a wide range of audio formats for output and specify the voice the model will use for audio responses.
If you don’t have an API key for the AI/ML API yet, feel free to use our Quickstart guide to get one.
A state-of-the-art AI model designed for instruction-following tasks. With a massive 56 billion parameter configuration, it excels in understanding and executing complex instructions, providing accurate and relevant responses across a wide range of contexts. This model is ideal for creating highly interactive and intelligent systems that can perform specific tasks based on user commands.
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"qwen-plus",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))

async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'qwen-plus',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{'id': 'chatcmpl-4fda1bd7-a679-95b9-b81d-1bfc6ae98448', 'system_fingerprint': None, 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': 'Hello! How can I assist you today? If you have any questions or need help with anything, just let me know! 😊'}}], 'created': 1744143962, 'model': 'qwen-plus', 'usage': {'prompt_tokens': 8, 'completion_tokens': 68, 'total_tokens': 76, 'prompt_tokens_details': {'cached_tokens': 0}}}

import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"Qwen/Qwen2.5-72B-Instruct-Turbo",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))

async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'Qwen/Qwen2.5-72B-Instruct-Turbo',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{'id': 'npK4dJH-4yUbBN-92d488799a225ec1', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': 'Hello! How can I assist you today? Feel free to ask me any questions or let me know if you need help with anything specific.', 'tool_calls': []}}], 'created': 1744144336, 'model': 'Qwen/Qwen2.5-72B-Instruct-Turbo', 'usage': {'prompt_tokens': 76, 'completion_tokens': 73, 'total_tokens': 149}}

import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"Qwen/Qwen2.5-Coder-32B-Instruct",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))

async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'Qwen/Qwen2.5-Coder-32B-Instruct',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{'id': 'npK8TA2-4yUbBN-92d49ab20aeacfa2', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': 'Hello! How can I assist you today?', 'tool_calls': []}}], 'created': 1744145083, 'model': 'Qwen/Qwen2.5-Coder-32B-Instruct', 'usage': {'prompt_tokens': 50, 'completion_tokens': 17, 'total_tokens': 67}}

import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
],
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))

async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{'id': 'npQnn39-66dFFu-92dab6aaa863ef3f', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': 'Hello. How can I assist you today?', 'tool_calls': []}}], 'created': 1744209143, 'model': 'meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo', 'usage': {'prompt_tokens': 14, 'completion_tokens': 4, 'total_tokens': 18}}

import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"meta-llama/Llama-3.2-3B-Instruct-Turbo",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
],
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))

async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'meta-llama/Llama-3.2-3B-Instruct-Turbo',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{'id': 'npQaJb3-4pPsy7-92da7b401ffd5eea', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': 'Hello! How can I assist you today?', 'tool_calls': []}}], 'created': 1744206709, 'model': 'meta-llama/Llama-3.2-3B-Instruct-Turbo', 'usage': {'prompt_tokens': 5, 'completion_tokens': 1, 'total_tokens': 6}}

import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"qwen-max",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))

async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'qwen-max',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"id": "chatcmpl-62aa6045-cee9-995a-bbf5-e3b7e7f3d683",
"system_fingerprint": null,
"object": "chat.completion",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"logprobs": null,
"message": {
"role": "assistant",
"content": "Hello! How can I assist you today? 😊"
}
}
],
"created": 1756983980,
"model": "qwen-max",
"usage": {
"prompt_tokens": 30,
"completion_tokens": 148,
"total_tokens": 178,
"prompt_tokens_details": {
"cached_tokens": 0
}
}
}

import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"qwen-turbo",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))

async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'qwen-turbo',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{'id': 'chatcmpl-a4556a4c-f985-9ef2-b976-551ac7cef85a', 'system_fingerprint': None, 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': "Hello! How can I help you today? Is there something you would like to talk about or learn more about? I'm here to help with any questions you might have."}}], 'created': 1744144035, 'model': 'qwen-turbo', 'usage': {'prompt_tokens': 1, 'completion_tokens': 15, 'total_tokens': 16, 'prompt_tokens_details': {'cached_tokens': 0}}}

import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"Qwen/Qwen2.5-7B-Instruct-Turbo",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))

async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'Qwen/Qwen2.5-7B-Instruct-Turbo',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{'id': 'npK4C7y-3NKUce-92d4866b1e62ef98', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': 'Hello! How can I assist you today?', 'tool_calls': []}}], 'created': 1744144252, 'model': 'Qwen/Qwen2.5-7B-Instruct-Turbo', 'usage': {'prompt_tokens': 19, 'completion_tokens': 6, 'total_tokens': 25}}

import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"alibaba/qwen3-235b-a22b-thinking-2507",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
],
"enable_thinking": False
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))

async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'alibaba/qwen3-235b-a22b-thinking-2507',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"id": "chatcmpl-af05df1d-5b72-925e-b3a9-437acbd89b1a",
"system_fingerprint": null,
"object": "chat.completion",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"logprobs": null,
"message": {
"role": "assistant",
"content": "Hello! 😊 How can I assist you today? Feel free to ask me any questions or let me know if you need help with anything specific!",
"reasoning_content": "Okay, the user said \"Hello\". That's a simple greeting. I should respond in a friendly and welcoming way. Let me make sure to keep it open-ended so they feel comfortable to ask questions or share what's on their mind. Maybe add a smiley emoji to keep it warm. Let me check if there's anything else they might need. Since it's just a hello, probably not much more needed here. Just a polite reply."
}
}
],
"created": 1753871154,
"model": "qwen3-235b-a22b-thinking-2507",
"usage": {
"prompt_tokens": 13,
"completion_tokens": 2187,
"total_tokens": 2200
}
}

import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"alibaba/qwen3-next-80b-a3b-instruct",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
],
"enable_thinking": False
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))

async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'alibaba/qwen3-next-80b-a3b-instruct',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"id": "chatcmpl-a944254a-4252-9a54-af1b-94afcfb9807e",
"system_fingerprint": null,
"object": "chat.completion",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"logprobs": null,
"message": {
"role": "assistant",
"content": "Hello! How can I help you today? 😊"
}
}
],
"created": 1758228572,
"model": "qwen3-next-80b-a3b-instruct",
"usage": {
"prompt_tokens": 9,
"completion_tokens": 46,
"total_tokens": 55
}
}

import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"alibaba/qwen3-max-instruct",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
],
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))

async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'alibaba/qwen3-max-instruct',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"id": "chatcmpl-bec5dc33-8f63-96b9-89a4-00aecfce7af8",
"system_fingerprint": null,
"object": "chat.completion",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"logprobs": null,
"message": {
"role": "assistant",
"content": "Hello! How can I help you today?"
}
}
],
"created": 1758898624,
"model": "qwen3-max",
"usage": {
"prompt_tokens": 23,
"completion_tokens": 113,
"total_tokens": 136
}
}

import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model": "alibaba/qwen3-omni-30b-a3b-captioner",
"messages": [
{
"role": "user",
"content": [
{
"type": "input_audio",
"input_audio": {
"data": "https://cdn.aimlapi.com/eagle/files/elephant/cJUTeeCmpoqIV1Q3WWDAL_vibevoice-output-7b98283fd3974f48ba90e91d2ee1f971.mp3"
}
}
]
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))

async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'alibaba/qwen3-omni-30b-a3b-captioner',
messages:[
{
role: 'user',
content: [
{
type: 'input_audio',
input_audio: {
data: 'https://cdn.aimlapi.com/eagle/files/elephant/cJUTeeCmpoqIV1Q3WWDAL_vibevoice-output-7b98283fd3974f48ba90e91d2ee1f971.mp3'
}
}
]
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"id": "chatcmpl-bec5dc33-8f63-96b9-89a4-00aecfce7af8",
"system_fingerprint": null,
"object": "chat.completion",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"logprobs": null,
"message": {
"role": "assistant",
"content": "Hello! How can I help you today?"
}
}
],
"created": 1758898624,
"model": "qwen3-max",
"usage": {
"prompt_tokens": 23,
"completion_tokens": 113,
"total_tokens": 136
}
}

import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"claude-3-haiku-latest",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))

async function main() {
try {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of YOUR_AIMLAPI_KEY
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'claude-3-haiku-latest',
messages:[
{
role:'user',
// Insert your question for the model here, instead of Hello:
content: 'Hello'
}
]
}),
});
if (!response.ok) {
throw new Error(`HTTP error! Status ${response.status}`);
}
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
} catch (error) {
console.error('Error', error);
}
}
main();

{'id': 'msg_01Fd4uU3AZ3TXzSpSKN7oeDP', 'object': 'chat.completion', 'model': 'claude-3-haiku-20240307', 'choices': [{'index': 0, 'message': {'reasoning_content': '', 'content': 'Hello! How can I assist you today?', 'role': 'assistant'}, 'finish_reason': 'end_turn', 'logprobs': None}], 'created': 1744218395, 'usage': {'prompt_tokens': 4, 'completion_tokens': 32, 'total_tokens': 36}}

import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"claude-3-opus-latest",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))

async function main() {
try {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of YOUR_AIMLAPI_KEY
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'claude-3-opus-latest',
messages:[
{
role:'user',
// Insert your question for the model here, instead of Hello:
content: 'Hello'
}
]
}),
});
if (!response.ok) {
throw new Error(`HTTP error! Status ${response.status}`);
}
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
} catch (error) {
console.error('Error', error);
}
}
main();

{'id': 'msg_013njSJ6FKESFossfd8UHddJ', 'object': 'chat.completion', 'model': 'claude-3-opus-20240229', 'choices': [{'index': 0, 'message': {'reasoning_content': '', 'content': 'Hello! How can I assist you today?', 'role': 'assistant'}, 'finish_reason': 'end_turn', 'logprobs': None}], 'created': 1744218476, 'usage': {'prompt_tokens': 252, 'completion_tokens': 1890, 'total_tokens': 2142}}

import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"anthropic/claude-opus-4",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))

async function main() {
try {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of YOUR_AIMLAPI_KEY
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'anthropic/claude-opus-4',
messages:[
{
role:'user',
// Insert your question for the model here, instead of Hello:
content: 'Hello'
}
]
}),
});
if (!response.ok) {
throw new Error(`HTTP error! Status ${response.status}`);
}
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
} catch (error) {
console.error('Error', error);
}
}
main();

{
"id": "msg_01BDDxHJZjH3UBwLrZBUiASE",
"object": "chat.completion",
"model": "claude-opus-4-20250514",
"choices": [
{
"index": 0,
"message": {
"reasoning_content": "",
"content": "Hello! How can I help you today?",
"role": "assistant"
},
"finish_reason": "end_turn",
"logprobs": null
}
],
"created": 1748529508,
"usage": {
"prompt_tokens": 252,
"completion_tokens": 1890,
"total_tokens": 2142
}
}

import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"anthropic/claude-sonnet-4.5",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))

async function main() {
try {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of YOUR_AIMLAPI_KEY
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'anthropic/claude-sonnet-4.5',
messages:[
{
role:'user',
// Insert your question for the model here, instead of Hello:
content: 'Hello'
}
]
}),
});
if (!response.ok) {
throw new Error(`HTTP error! Status ${response.status}`);
}
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
} catch (error) {
console.error('Error', error);
}
}
main();

{
"id": "msg_011MNbgezv2p5BBE9RvnsZV9",
"object": "chat.completion",
"model": "claude-sonnet-4-20250514",
"choices": [
{
"index": 0,
"message": {
"reasoning_content": "",
"content": "Hello! How are you doing today? Is there anything I can help you with?",
"role": "assistant"
},
"finish_reason": "end_turn",
"logprobs": null
}
],
"created": 1748522617,
"usage": {
"prompt_tokens": 50,
"completion_tokens": 630,
"total_tokens": 680
}
}

import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"anthropic/claude-haiku-4.5",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))

async function main() {
try {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of YOUR_AIMLAPI_KEY
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'anthropic/claude-haiku-4.5',
messages:[
{
role:'user',
// Insert your question for the model here, instead of Hello:
content: 'Hello'
}
]
}),
});
if (!response.ok) {
throw new Error(`HTTP error! Status ${response.status}`);
}
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
} catch (error) {
console.error('Error', error);
}
}
main();
{
"id": "msg_01HbdLU9f78VAHxuYZ7Qp9Y1",
"object": "chat.completion",
"model": "claude-haiku-4-5-20251001",
"choices": [
{
"index": 0,
"message": {
"reasoning_content": "",
"content": "Hello! 👋 How can I help you today?",
"role": "assistant"
},
"finish_reason": "end_turn",
"logprobs": null
}
],
"created": 1760650965,
"usage": {
"prompt_tokens": 8,
"completion_tokens": 16,
"total_tokens": 24
}
}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"deepseek-chat",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
],
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
try {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of YOUR_AIMLAPI_KEY
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'deepseek-chat',
messages:[
{
role:'user',
// Insert your question for the model here, instead of Hello:
content: 'Hello'
}
]
}),
});
if (!response.ok) {
throw new Error(`HTTP error! Status ${response.status}`);
}
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
} catch (error) {
console.error('Error', error);
}
}
main();
{'id': 'gen-1744194041-A363xKnsNwtv6gPnUPnO', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': "Hello! 😊 How can I assist you today? Feel free to ask me anything—I'm here to help! 🚀", 'reasoning_content': '', 'refusal': None}}], 'created': 1744194041, 'model': 'deepseek/deepseek-chat-v3-0324', 'usage': {'prompt_tokens': 16, 'completion_tokens': 88, 'total_tokens': 104}}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"deepseek/deepseek-chat-v3.1",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
],
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'deepseek/deepseek-chat-v3.1',
messages:[{
role:'user',
content: 'Hello'} // Insert your question instead of Hello
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "c13865eb-50bf-440c-922f-19b1bbef517d",
"system_fingerprint": "fp_feb633d1f5_prod0820_fp8_kvcache",
"object": "chat.completion",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"logprobs": null,
"message": {
"role": "assistant",
"content": "Hello! How can I assist you today? 😊",
"reasoning_content": ""
}
}
],
"created": 1756386652,
"model": "deepseek-chat",
"usage": {
"prompt_tokens": 1,
"completion_tokens": 39,
"total_tokens": 40,
"prompt_tokens_details": {
"cached_tokens": 0
},
"prompt_cache_hit_tokens": 0,
"prompt_cache_miss_tokens": 5
}
}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"deepseek/deepseek-reasoner-v3.1",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
],
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'deepseek/deepseek-reasoner-v3.1',
messages:[{
role:'user',
content: 'Hello'} // Insert your question instead of Hello
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "ca664281-d3c3-40d3-9d80-fe96a65884dd",
"system_fingerprint": "fp_feb633d1f5_prod0820_fp8_kvcache",
"object": "chat.completion",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"logprobs": null,
"message": {
"role": "assistant",
"content": "Hello! How can I help you today? 😊",
"reasoning_content": ""
}
}
],
"created": 1756386069,
"model": "deepseek-reasoner",
"usage": {
"prompt_tokens": 1,
"completion_tokens": 325,
"total_tokens": 326,
"prompt_tokens_details": {
"cached_tokens": 0
},
"completion_tokens_details": {
"reasoning_tokens": 80
},
"prompt_cache_hit_tokens": 0,
"prompt_cache_miss_tokens": 5
}
}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"deepseek/deepseek-reasoner-v3.1-terminus",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
],
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'deepseek/deepseek-reasoner-v3.1-terminus',
messages:[{
role:'user',
content: 'Hello'} // Insert your question instead of Hello
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "543f56cb-f59f-42cc-8ed7-8efdd72f185d",
"system_fingerprint": "fp_ffc7281d48_prod0820_fp8_kvcache",
"object": "chat.completion",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"logprobs": null,
"message": {
"role": "assistant",
"content": "Hello! How can I assist you today? 😊",
"reasoning_content": ""
}
}
],
"created": 1761034613,
"model": "deepseek-reasoner",
"usage": {
"prompt_tokens": 3,
"completion_tokens": 98,
"total_tokens": 101,
"prompt_tokens_details": {
"cached_tokens": 0
},
"completion_tokens_details": {
"reasoning_tokens": 99
},
"prompt_cache_hit_tokens": 0,
"prompt_cache_miss_tokens": 5
}
}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"deepseek/deepseek-non-thinking-v3.2-exp",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
],
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'deepseek/deepseek-non-thinking-v3.2-exp',
messages:[
{
role:'user',
content: 'Hello' // Insert your question instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "ca664281-d3c3-40d3-9d80-fe96a65884dd",
"system_fingerprint": "fp_feb633d1f5_prod0820_fp8_kvcache",
"object": "chat.completion",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"logprobs": null,
"message": {
"role": "assistant",
"content": "Hello! How can I help you today? 😊",
"reasoning_content": ""
}
}
],
"created": 1756386069,
"model": "deepseek-reasoner",
"usage": {
"prompt_tokens": 1,
"completion_tokens": 325,
"total_tokens": 326,
"prompt_tokens_details": {
"cached_tokens": 0
},
"completion_tokens_details": {
"reasoning_tokens": 80
},
"prompt_cache_hit_tokens": 0,
"prompt_cache_miss_tokens": 5
}
}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"google/gemini-2.0-flash-exp",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'google/gemini-2.0-flash-exp',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{'id': '2025-04-09|09:53:23.624687-07|5.250.254.39|-1825976509', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': 'Hello there! How can I help you today?\n'}}], 'created': 1744217603, 'model': 'google/gemini-2.0-flash-exp', 'usage': {'prompt_tokens': 5, 'completion_tokens': 173, 'total_tokens': 178}}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"google/gemini-2.0-flash",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'google/gemini-2.0-flash',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{'id': '2025-04-10|01:16:19.235787-07|9.7.175.26|-701765511', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': 'Hello! How can I help you today?\n'}}], 'created': 1744272979, 'model': 'google/gemini-2.0-flash', 'usage': {'prompt_tokens': 0, 'completion_tokens': 8, 'total_tokens': 8}}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"google/gemma-3-27b-it",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
],
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'google/gemma-3-27b-it',
messages:[{
role:'user',
content: 'Hello'} // Insert your question instead of Hello
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{'id': 'gen-1744217834-d0OUILKDSxXQwmh2EorK', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': "\nHello there! 👋 \n\nIt's great to connect with you. How can I help you today? \n\nJust let me know what you're thinking, whether you have a question, want to brainstorm ideas, need some information, or just want to chat. I'm here and ready to assist!\n\n\n\n", 'refusal': None}}], 'created': 1744217834, 'model': 'google/gemma-3-27b-it', 'usage': {'prompt_tokens': 0, 'completion_tokens': 0, 'total_tokens': 0}}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"meta-llama/Llama-3-70b-chat-hf",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
],
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'meta-llama/Llama-3-70b-chat-hf',
messages:[
{
role:'user',
// Insert your question for the model here, instead of Hello:
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{'id': 'npQoMP3-4yUbBN-92dab967fbdeb248', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': "Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?", 'tool_calls': []}}], 'created': 1744209255, 'model': 'meta-llama/Llama-3-70b-chat-hf', 'usage': {'prompt_tokens': 20, 'completion_tokens': 48, 'total_tokens': 68}}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"meta-llama/Meta-Llama-3-8B-Instruct-Lite",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
],
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'meta-llama/Meta-Llama-3-8B-Instruct-Lite',
messages:[
{
role:'user',
// Insert your question for the model here, instead of Hello:
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "o95Ai5e-2j9zxn-976ad7df3ef49b19",
"object": "chat.completion",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"logprobs": null,
"message": {
"role": "assistant",
"content": "Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?",
"tool_calls": []
}
}
],
"created": 1756457871,
"model": "meta-llama/Meta-Llama-3-8B-Instruct-Lite",
"usage": {
"prompt_tokens": 2,
"completion_tokens": 5,
"total_tokens": 7
}
}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"meta-llama/llama-3.3-70b-versatile",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
],
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'meta-llama/llama-3.3-70b-versatile',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{'id': 'npQ5s8C-2j9zxn-92d9f3c84a529790', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': "Hello. It's nice to meet you. Is there something I can help you with or would you like to chat?", 'tool_calls': []}}], 'created': 1744201161, 'model': 'meta-llama/Llama-3.3-70B-Instruct-Turbo', 'usage': {'prompt_tokens': 67, 'completion_tokens': 46, 'total_tokens': 113}}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"minimax/m1",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'minimax/m1',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "04a9be008b12ad5eec78791d8aebe36f",
"object": "chat.completion",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": "Hello! How can I assist you today?"
}
}
],
"created": 1750764288,
"model": "MiniMax-M1",
"usage": {
"prompt_tokens": 389,
"completion_tokens": 910,
"total_tokens": 1299
}
}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
],
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{'id': 'npQi9tF-2j9zxn-92daa0a4ec4968f1', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': 'Hello. How can I assist you today?', 'tool_calls': []}}], 'created': 1744208241, 'model': 'meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo', 'usage': {'prompt_tokens': 67, 'completion_tokens': 18, 'total_tokens': 85}}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"mistralai/mistral-nemo",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'mistralai/mistral-nemo',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{'id': 'gen-1744193377-PR9oTu6vDabN9nj0VUUX', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': 'Hello! How can I assist you today? Let me know if you have any questions or just want to chat. 😊', 'refusal': None}}], 'created': 1744193377, 'model': 'mistralai/mistral-nemo', 'usage': {'prompt_tokens': 0, 'completion_tokens': 5, 'total_tokens': 5}}
▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.
4️ (Optional) Adjust other optional parameters if needed
Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them. A short example showing how to pass a couple of optional parameters is included right after these steps.
5️ Run your modified code
Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.
If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.
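As noted in step 4 above, only model and messages are required. If you want to adjust the model’s behavior, commonly supported optional parameters (for example, temperature and max_tokens) can simply be added to the same request body. The snippet below is a minimal sketch rather than an exhaustive list; availability of each parameter may vary by model, so always check the API schema on the model’s page:
import requests
import json  # for getting a structured output with indentation
response = requests.post(
    "https://api.aimlapi.com/v1/chat/completions",
    headers={
        # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
        "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
        "Content-Type":"application/json"
    },
    json={
        "model":"anthropic/claude-sonnet-4.5",  # any chat model ID from this documentation
        "messages":[
            {
                "role":"user",
                "content":"Hello"  # insert your prompt here, instead of Hello
            }
        ],
        # Optional parameters (check the API schema for your model):
        "temperature": 0.7,   # lower values give more deterministic answers
        "max_tokens": 256     # upper limit on the length of the generated reply
    }
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))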
▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.
4️ (Optional) Adjust other optional parameters if needed
Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.
5️ Run your modified code
Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.
If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.
▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.
4️ (Optional) Adjust other optional parameters if needed
Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.
5️ Run your modified code
Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.
If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.
▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.
4️ (Optional) Adjust other optional parameters if needed
Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.
5️ Run your modified code
Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.
If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.
▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.
4️ (Optional) Adjust other optional parameters if needed
Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.
5️ Run your modified code
Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.
If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.
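Once the request succeeds, the full JSON response is printed, as in the sample outputs above. In most cases the text you actually need is located at choices[0].message.content, and token accounting sits under usage. A minimal sketch of pulling those fields out of the parsed response (assuming data already holds response.json() from the examples above):
# Extract just the assistant's reply and the token usage from the parsed response.
# Field names follow the sample responses shown on this page.
reply = data["choices"][0]["message"]["content"]
usage = data.get("usage", {})
print("Assistant:", reply)
print("Total tokens:", usage.get("total_tokens"))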
claude-3-5-haiku-20241022
claude-3-5-haiku-latest
▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.
4️ (Optional) Adjust other optional parameters if needed
Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.
5️ Run your modified code
Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.
If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.
claude-3-7-sonnet-latest
▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.
4️ (Optional) Adjust other optional parameters if needed
Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.
5️ Run your modified code
Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.
If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.
claude-sonnet-4-20250514
▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.
4️ (Optional) Adjust other optional parameters if needed
Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.
5️ Run your modified code
Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.
If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.
claude-opus-4-5-20251101
▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.
4️ (Optional) Adjust other optional parameters if needed
Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.
5️ Run your modified code
Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.
If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.
A hybrid reasoning model designed to be creative, engaging, and neutrally aligned, while delivering state-of-the-art math, coding, and reasoning performance among open-weight models.
This model is optimized for advanced agentic tasks, featuring strong reasoning, coding skills, and superior multimodal understanding. It notably improves on Gemini 2.5 Pro in complex instruction following and output efficiency.
A unified model designed for both reasoning and non-reasoning tasks. It processes user inputs by first producing a reasoning trace, then delivering a final answer. The reasoning behavior can be adjusted through the system prompt — allowing the model to either show its intermediate reasoning steps or respond directly with the final result. The model offers strong document understanding and summarization capabilities.
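Because this unified model’s reasoning behavior is steered through the system prompt, a typical request adds a system message next to the user message. The sketch below is only an illustration: <MODEL_ID> is a placeholder for the model ID listed on this page, and the exact system-prompt wording that shows or hides intermediate reasoning is defined by the model itself, so check its description before relying on it.
import requests
import json  # for getting a structured output with indentation
response = requests.post(
    "https://api.aimlapi.com/v1/chat/completions",
    headers={
        # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
        "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
        "Content-Type":"application/json"
    },
    json={
        "model":"<MODEL_ID>",  # placeholder: use the model ID given on this page
        "messages":[
            {
                # Assumed example instruction; the exact phrasing that toggles
                # reasoning output is model-specific.
                "role":"system",
                "content":"Answer directly with the final result; do not show your intermediate reasoning."
            },
            {
                "role":"user",
                "content":"Summarize the key points of the attached contract."  # insert your prompt here
            }
        ]
    }
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))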
A state-of-the-art AI model specifically designed for code generation tasks. It leverages advanced machine learning techniques to assist developers in writing, debugging, and optimizing code across a wide range of programming languages. With its impressive performance metrics and capabilities, Codestral-2501 aims to streamline the coding process and enhance productivity for software developers.
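For a code-generation model like this one, the request shape is the same chat-completions call used throughout this page; only the prompt changes. A minimal sketch (the model ID below is an assumption for illustration; use the exact ID from this page’s API schema):
import requests
response = requests.post(
    "https://api.aimlapi.com/v1/chat/completions",
    headers={
        # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
        "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
        "Content-Type":"application/json"
    },
    json={
        # Model ID assumed for illustration; check the API schema on this page:
        "model":"mistralai/codestral-2501",
        "messages":[
            {
                "role":"user",
                "content":"Write a Python function that checks whether a string is a palindrome."
            }
        ]
    }
)
data = response.json()
# The generated code comes back as ordinary message content:
print(data["choices"][0]["message"]["content"])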
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"alibaba/qwen3-coder-480b-a35b-instruct",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
],
"enable_thinking": False
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'alibaba/qwen3-coder-480b-a35b-instruct',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-f906efa6-f816-9a06-a32b-aa38da5fe11a",
"system_fingerprint": null,
"object": "chat.completion",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"logprobs": null,
"message": {
"role": "assistant",
"content": "Hello! How can I help you today?"
}
}
],
"created": 1753866642,
"model": "qwen3-coder-480b-a35b-instruct",
"usage": {
"prompt_tokens": 28,
"completion_tokens": 142,
"total_tokens": 170
}
}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"alibaba/qwen3-next-80b-a3b-thinking",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
],
"enable_thinking": False
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'alibaba/qwen3-next-80b-a3b-thinking',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-576aaaf9-f712-9114-b098-c1ee83fbfb6b",
"system_fingerprint": null,
"object": "chat.completion",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"logprobs": null,
"message": {
"role": "assistant",
"content": "Hello! 😊 How can I assist you today?",
"reasoning_content": "Okay, the user said \"Hello\". I need to respond appropriately. Let me think.\n\nFirst, I should acknowledge their greeting. A simple \"Hello!\" would be good. Maybe add a friendly emoji to keep it warm.\n\nWait, but maybe they want to start a conversation. I should ask how I can help them. That way, I'm being helpful and opening the door for them to ask questions.\n\nLet me check the standard response. Typically, for \"Hello\", the assistant says something like \"Hello! How can I assist you today?\" or \"Hi there! What can I do for you?\"\n\nYes, that's right. Keep it friendly and open-ended. Maybe add a smiley emoji to make it approachable.\n\nSo the response should be: \"Hello! How can I assist you today?\"\n\nThat's good. Let me make sure there's no mistake. Yes, that's standard. No need for anything complicated here. Just a simple, welcoming reply.\n\nAlternatively, sometimes people use \"Hi\" instead of \"Hello\", but since they said \"Hello\", responding with \"Hello\" is fine. Maybe \"Hi there!\" could also work, but sticking to \"Hello\" matches their greeting.\n\nYes, \"Hello! How can I assist you today?\" is perfect. It's polite, friendly, and offers assistance. That should be the response."
}
}
],
"created": 1758229078,
"model": "qwen3-next-80b-a3b-thinking",
"usage": {
"prompt_tokens": 9,
"completion_tokens": 7182,
"total_tokens": 7191,
"completion_tokens_details": {
"reasoning_tokens": 277
}
}
}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"alibaba/qwen3-max-preview",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
],
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'alibaba/qwen3-max-preview',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-8ffebc65-b625-926a-8208-b765371cb1d0",
"system_fingerprint": null,
"object": "chat.completion",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"logprobs": null,
"message": {
"role": "assistant",
"content": "Hello! How can I assist you today? 😊"
}
}
],
"created": 1758898044,
"model": "qwen3-max-preview",
"usage": {
"prompt_tokens": 23,
"completion_tokens": 139,
"total_tokens": 162
}
}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"anthracite-org/magnum-v4-72b",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
try {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of YOUR_AIMLAPI_KEY
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'anthracite-org/magnum-v4-72b',
messages:[
{
role:'user',
// Insert your question for the model here, instead of Hello:
content: 'Hello'
}
]
}),
});
if (!response.ok) {
throw new Error(`HTTP error! Status ${response.status}`);
}
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
} catch (error) {
console.error('Error', error);
}
}
main();
{'id': 'gen-1744217980-rdVBcVTb76dllKCCRjak', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': 'Hello! How can I assist you today?', 'refusal': None}}], 'created': 1744217980, 'model': 'anthracite-org/magnum-v4-72b', 'usage': {'prompt_tokens': 37, 'completion_tokens': 50, 'total_tokens': 87}}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"claude-3-5-haiku-latest",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
try {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of YOUR_AIMLAPI_KEY
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'claude-3-5-haiku-latest',
messages:[
{
role:'user',
// Insert your question for the model here, instead of Hello:
content: 'Hello'
}
]
}),
});
if (!response.ok) {
throw new Error(`HTTP error! Status ${response.status}`);
}
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
} catch (error) {
console.error('Error', error);
}
}
main();
{'id': 'msg_01QfRmDBXVWcARjbwZBbJxrR', 'object': 'chat.completion', 'model': 'claude-3-5-haiku-20241022', 'choices': [{'index': 0, 'message': {'reasoning_content': '', 'content': 'Hi there! How are you doing today? Is there anything I can help you with?', 'role': 'assistant'}, 'finish_reason': 'end_turn', 'logprobs': None}], 'created': 1744218440, 'usage': {'prompt_tokens': 17, 'completion_tokens': 221, 'total_tokens': 238}}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"anthropic/claude-3.7-sonnet",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
try {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of YOUR_AIMLAPI_KEY
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'anthropic/claude-3.7-sonnet',
messages:[
{
role:'user',
// Insert your question for the model here, instead of Hello:
content: 'Hello'
}
]
}),
});
if (!response.ok) {
throw new Error(`HTTP error! Status ${response.status}`);
}
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
} catch (error) {
console.error('Error', error);
}
}
main();
{'id': 'msg_01MmQNxa1E5mU8EyMXzT9zEU', 'object': 'chat.completion', 'model': 'claude-3-7-sonnet-20250219', 'choices': [{'index': 0, 'message': {'reasoning_content': '', 'content': "Hello! How can I assist you today? Whether you have a question, need information, or would like to discuss a particular topic, I'm here to help. What's on your mind?", 'role': 'assistant'}, 'finish_reason': 'end_turn', 'logprobs': None}], 'created': 1744218600, 'usage': {'prompt_tokens': 50, 'completion_tokens': 1323, 'total_tokens': 1373}}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"anthropic/claude-sonnet-4",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
try {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of YOUR_AIMLAPI_KEY
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'anthropic/claude-sonnet-4',
messages:[
{
role:'user',
// Insert your question for the model here, instead of Hello:
content: 'Hello'
}
]
}),
});
if (!response.ok) {
throw new Error(`HTTP error! Status ${response.status}`);
}
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
} catch (error) {
console.error('Error', error);
}
}
main();
{
"id": "msg_011MNbgezv2p5BBE9RvnsZV9",
"object": "chat.completion",
"model": "claude-sonnet-4-20250514",
"choices": [
{
"index": 0,
"message": {
"reasoning_content": "",
"content": "Hello! How are you doing today? Is there anything I can help you with?",
"role": "assistant"
},
"finish_reason": "end_turn",
"logprobs": null
}
],
"created": 1748522617,
"usage": {
"prompt_tokens": 50,
"completion_tokens": 630,
"total_tokens": 680
}
}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"claude-opus-4-5",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
try {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of YOUR_AIMLAPI_KEY
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'claude-opus-4-5',
messages:[
{
role:'user',
// Insert your question for the model here, instead of Hello:
content: 'Hello'
}
]
}),
});
if (!response.ok) {
throw new Error(`HTTP error! Status ${response.status}`);
}
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
} catch (error) {
console.error('Error', error);
}
}
main();
{
"id": "msg_01NxAGYo8VfNu5UAEdmQjv62",
"object": "chat.completion",
"model": "claude-opus-4-5-20251101",
"choices": [
{
"index": 0,
"message": {
"reasoning_content": "",
"content": "Hello! How are you doing today? Is there something I can help you with?",
"role": "assistant"
},
"finish_reason": "end_turn",
"logprobs": null
}
],
"created": 1764265437,
"usage": {
"prompt_tokens": 8,
"completion_tokens": 20,
"total_tokens": 28
},
"meta": {
"usage": {
"tokens_used": 1134
}
}
}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"cohere/command-a",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
],
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
try {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of YOUR_AIMLAPI_KEY
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'cohere/command-a',
messages:[
{
role:'user',
// Insert your question for the model here, instead of Hello:
content: 'Hello'
}
]
}),
});
if (!response.ok) {
throw new Error(`HTTP error! Status ${response.status}`);
}
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
} catch (error) {
console.error('Error', error);
}
}
main();
{
"id": "gen-1752165706-Nd1dXa1kuCCoOIpp5oxy",
"object": "chat.completion",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"logprobs": null,
"message": {
"role": "assistant",
"content": "Hello! How can I assist you today?",
"reasoning_content": null,
"refusal": null
}
}
],
"created": 1752165706,
"model": "cohere/command-a",
"usage": {
"prompt_tokens": 5,
"completion_tokens": 189,
"total_tokens": 194
}
}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"deepseek/deepseek-r1",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
],
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
try {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of YOUR_AIMLAPI_KEY
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'deepseek/deepseek-r1',
messages:[
{
role:'user',
// Insert your question for the model here, instead of Hello:
content: 'Hello'
}
]
}),
});
if (!response.ok) {
throw new Error(`HTTP error! Status ${response.status}`);
}
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
} catch (error) {
console.error('Error', error);
}
}
main();
{'id': 'npPT68N-zqrih-92d94499ec25b74e', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': '\nHello! How can I assist you today? 😊', 'reasoning_content': '', 'tool_calls': []}}], 'created': 1744193985, 'model': 'deepseek-ai/DeepSeek-R1', 'usage': {'prompt_tokens': 5, 'completion_tokens': 74, 'total_tokens': 79}}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"deepseek/deepseek-prover-v2",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
],
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
try {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of YOUR_AIMLAPI_KEY
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'deepseek/deepseek-prover-v2',
messages:[
{
role:'user',
// Insert your question for the model here, instead of Hello:
content: 'Hello'
}
]
}),
});
if (!response.ok) {
throw new Error(`HTTP error! Status ${response.status}`);
}
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
} catch (error) {
console.error('Error', error);
}
}
main();
{'id': 'gen-1747126732-rD70SgJEEBVBXPHmKlNJ', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': "Hello there! As a virtual assistant, I'm here to help you with a wide variety of tasks and questions. Here are some of the things I can do: \n \n1. Provide information on a wide range of topics, from science and history to pop culture and current events. \n2. Answer factual questions using my knowledge base. \n3. Assist with homework or research projects by providing explanations, summaries, and resources. \n4. Help with language-related tasks such as grammar, vocabulary, translations, and writing assistance. \n5. Engage in general conversation, discussing ideas, and providing opinions on various subjects. \n6. Offer advice or tips on various life situations, though not as a substitute for professional guidance. \n7. Perform calculations, solve math problems, and help with understanding mathematical concepts. \n8. Generate creative content like stories, poems, or song lyrics. \n9. Play interactive games, such as word games or trivia. \n10. Help you practice a language by conversing in it. \n \nFeel free to ask me anything, and I'll do my best to assist you!", 'reasoning_content': None, 'refusal': None}}], 'created': 1747126732, 'model': 'deepseek/deepseek-prover-v2', 'usage': {'prompt_tokens': 15, 'completion_tokens': 1021, 'total_tokens': 1036, 'prompt_tokens_details': None}}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"deepseek/deepseek-non-reasoner-v3.1-terminus",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
],
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'deepseek/deepseek-non-reasoner-v3.1-terminus',
messages:[{
role:'user',
content: 'Hello'} // Insert your question instead of Hello
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "cc8c3054-115d-4dac-9269-2abffcaabab5",
"system_fingerprint": "fp_ffc7281d48_prod0820_fp8_kvcache",
"object": "chat.completion",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"logprobs": null,
"message": {
"role": "assistant",
"content": "Hello! How can I assist you today? 😊",
"reasoning_content": ""
}
}
],
"created": 1761036636,
"model": "deepseek-chat",
"usage": {
"prompt_tokens": 3,
"completion_tokens": 10,
"total_tokens": 13,
"prompt_tokens_details": {
"cached_tokens": 0
},
"prompt_cache_hit_tokens": 0,
"prompt_cache_miss_tokens": 5
}
}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"google/gemini-2.5-flash-lite-preview",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'google/gemini-2.5-flash-lite-preview',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "gen-1752482994-9LhqM48PhAmhiRTtl2ys",
"object": "chat.completion",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"logprobs": null,
"message": {
"role": "assistant",
"content": "Hello there! How can I help you today?",
"reasoning_content": null,
"refusal": null
}
}
],
"created": 1752482994,
"model": "google/gemini-2.5-flash-lite-preview-06-17",
"usage": {
"prompt_tokens": 0,
"completion_tokens": 9,
"total_tokens": 9
}
}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"google/gemma-3n-e4b-it",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
],
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'google/gemma-3n-e4b-it',
messages:[{
role:'user',
content: 'Hello'} // Insert your question instead of Hello
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "gen-1749195015-2RpzznjKbGPQUJ9OK1M4",
"object": "chat.completion",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"logprobs": null,
"message": {
"role": "assistant",
"content": "Hello there! 👋 \n\nIt's nice to meet you! How can I help you today? Do you have any questions, need some information, want to chat, or anything else? 😊 \n\nJust let me know what's on your mind!\n\n\n\n",
"reasoning_content": null,
"refusal": null
}
}
],
"created": 1749195015,
"model": "google/gemma-3n-e4b-it:free",
"usage": {
"prompt_tokens": 0,
"completion_tokens": 0,
"total_tokens": 0
}
}
import requests
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
"Content-Type":"application/json",
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo",
"messages":[
{
"role":"user",
# Insert your question for the model here, instead of Hello:
"content":"Hello"
}
]
}
)
data = response.json()
print(data)
{'id': 'npQhshu-3NKUce-92da9f512c0f70b9', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': 'Hello. How can I assist you today?', 'tool_calls': []}}], 'created': 1744208187, 'model': 'meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo', 'usage': {'prompt_tokens': 265, 'completion_tokens': 81, 'total_tokens': 346}}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"meta-llama/llama-4-scout",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
],
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'meta-llama/llama-4-scout',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{'id': 'npXpsYC-2j9zxn-92e24e9e0c97d74d', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': "Hello! It's nice to meet you. Is there something I can help you with or would you like to chat?", 'tool_calls': []}}], 'created': 1744288767, 'model': 'meta-llama/Llama-4-Scout-17B-16E-Instruct', 'usage': {'prompt_tokens': 4, 'completion_tokens': 30, 'total_tokens': 34}}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"minimax/m2",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'minimax/m2',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "0557b8f7fa197172a75531a82ae6c887",
"object": "chat.completion",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": "<think>\nThe user says \"Hello\". This is a simple greeting. There's no request. According to policy, we respond politely, perhaps ask how we can help. So answer \"Hello! How can I assist you today?\" Should keep tone friendly.\n\nThus final answer.\n</think>\n\nHello! How can I help you today?"
}
}
],
"created": 1762166263,
"model": "MiniMax-M2",
"usage": {
"prompt_tokens": 26,
"completion_tokens": 159,
"total_tokens": 185
}
}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"meta-llama/llama-4-maverick",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
],
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'meta-llama/llama-4-maverick',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{'id': 'npXgTRD-28Eivz-92e226847aa70d87', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': 'Hello! How are you today? Is there something I can help you with or would you like to chat?', 'tool_calls': []}}], 'created': 1744287125, 'model': 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8', 'usage': {'prompt_tokens': 6, 'completion_tokens': 41, 'total_tokens': 47}}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"meta-llama/Llama-3.3-70B-Instruct-Turbo",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
],
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'meta-llama/Llama-3.3-70B-Instruct-Turbo',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{'id': 'npQ5s8C-2j9zxn-92d9f3c84a529790', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': "Hello. It's nice to meet you. Is there something I can help you with or would you like to chat?", 'tool_calls': []}}], 'created': 1744201161, 'model': 'meta-llama/Llama-3.3-70B-Instruct-Turbo', 'usage': {'prompt_tokens': 67, 'completion_tokens': 46, 'total_tokens': 113}}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"mistralai/Mistral-7B-Instruct-v0.3",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'mistralai/Mistral-7B-Instruct-v0.3',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{'id': 'npPQHux-3NKUce-92d937464c2aff02', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': " Hello! How can I help you today? Is there something specific you'd like to talk about or learn more about? I'm here to answer questions and provide information on a wide range of topics. Let me know if you have any questions or if there's something you'd like to discuss.", 'tool_calls': []}}], 'created': 1744193439, 'model': 'mistralai/Mistral-7B-Instruct-v0.3', 'usage': {'prompt_tokens': 2, 'completion_tokens': 27, 'total_tokens': 29}}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"MiniMax-Text-01",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'MiniMax-Text-01',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "04a9c0b5acca8b79bf1aba62f288f3b7",
"object": "chat.completion",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": "Hello! How are you doing today? I'm here and ready to chat about anything you'd like to discuss or help with any questions you might have."
}
}
],
"created": 1750764981,
"model": "MiniMax-Text-01",
"usage": {
"prompt_tokens": 299,
"completion_tokens": 67,
"total_tokens": 366
}
}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"deepseek/deepseek-thinking-v3.2-exp",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
],
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'deepseek/deepseek-thinking-v3.2-exp',
messages:[
{
role:'user',
content: 'Hello' // Insert your question instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "ca664281-d3c3-40d3-9d80-fe96a65884dd",
"system_fingerprint": "fp_feb633d1f5_prod0820_fp8_kvcache",
"object": "chat.completion",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"logprobs": null,
"message": {
"role": "assistant",
"content": "Hello! How can I help you today? 😊",
"reasoning_content": ""
}
}
],
"created": 1756386069,
"model": "deepseek-reasoner",
"usage": {
"prompt_tokens": 1,
"completion_tokens": 325,
"total_tokens": 326,
"prompt_tokens_details": {
"cached_tokens": 0
},
"completion_tokens_details": {
"reasoning_tokens": 80
},
"prompt_cache_hit_tokens": 0,
"prompt_cache_miss_tokens": 5
}
}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"mistralai/Mixtral-8x7B-Instruct-v0.1",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'mistralai/Mixtral-8x7B-Instruct-v0.1',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{'id': 'npPEmQg-4yUbBN-92d909e708872095', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': ' Hello! How can I help you today? If you have any questions or need assistance with a topic related to mathematics, I will do my best to help you understand. Just let me know what you are working on or what you are curious about.', 'tool_calls': []}}], 'created': 1744191581, 'model': 'mistralai/Mixtral-8x7B-Instruct-v0.1', 'usage': {'prompt_tokens': 11, 'completion_tokens': 66, 'total_tokens': 77}}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"mistralai/mistral-tiny",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'mistralai/mistral-tiny',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{'id': 'gen-1744193337-VPTpAxEsMzJ79PKh5w4X', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': "Hello! How can I assist you today? Feel free to ask me anything, I'm here to help. If you are looking for general information or help with a specific question, please let me know. I am happy to help with a wide range of topics, including but not limited to, technology, science, health, education, and more. Enjoy your day!", 'refusal': None}}], 'created': 1744193337, 'model': 'mistralai/mistral-tiny', 'usage': {'prompt_tokens': 2, 'completion_tokens': 42, 'total_tokens': 44}}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model": "nousresearch/hermes-4-405b",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'nousresearch/hermes-4-405b',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "gen-1758225008-VhzEA3LAfGuc63grTCeV",
"object": "chat.completion",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"logprobs": null,
"message": {
"role": "assistant",
"content": "Greetings! I'm Hermes from Nous Research. I'm here to help you with any tasks you might have, from analysis to writing and beyond. What can I assist you with today?",
"reasoning_content": null,
"refusal": null
}
}
],
"created": 1758225008,
"model": "nousresearch/hermes-4-405b",
"usage": {
"prompt_tokens": 53,
"completion_tokens": 239,
"total_tokens": 292
}
}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"nvidia/nemotron-nano-12b-v2-vl",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'nvidia/nemotron-nano-12b-v2-vl',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "gen-1762343744-rdCcOL8byCQwRBZ8QCkv",
"provider": "DeepInfra",
"model": "nvidia/nemotron-nano-12b-v2-vl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"logprobs": null,
"finish_reason": "stop",
"native_finish_reason": "stop",
"index": 0,
"message": {
"role": "assistant",
"content": "\n\nHello! How can I assist you today?\n",
"refusal": null,
"reasoning": "Okay, the user said \"Hello\". Let me start by greeting them back in a friendly and welcoming way. I should keep it simple and approachable, maybe something like \"Hello! How can I assist you today?\" That should work. I want to make sure they feel comfortable and open to asking for help. Let me check if there's anything else I need to add. No, keeping it straightforward is best here. Ready to respond.\n",
"reasoning_details": [
{
"type": "reasoning.text",
"text": "Okay, the user said \"Hello\". Let me start by greeting them back in a friendly and welcoming way. I should keep it simple and approachable, maybe something like \"Hello! How can I assist you today?\" That should work. I want to make sure they feel comfortable and open to asking for help. Let me check if there's anything else I need to add. No, keeping it straightforward is best here. Ready to respond.\n",
"format": "unknown",
"index": 0
}
]
}
}
],
"usage": {
"prompt_tokens": 14,
"completion_tokens": 102,
"total_tokens": 116,
"prompt_tokens_details": null
}
}
from openai import OpenAI
import base64
import os
client = OpenAI(
base_url = "https://api.aimlapi.com",
# Insert your AI/ML API key instead of <YOUR_AIMLAPI_KEY>:
api_key = "<YOUR_AIMLAPI_KEY>"
)
def main():
response = client.chat.completions.create(
model="gpt-4o-audio-preview",
modalities=["text", "audio"],
audio={"voice": "alloy", "format": "wav"},
messages=[
{
"role": "system",
"content": "Speak english" # Your instructions for the model
},
{
"role": "user",
"content": "Hello" # Your question (insert it istead of Hello)
}
],
max_tokens=6000,
)
wav_bytes = base64.b64decode(response.choices[0].message.audio.data)
with open("audio.wav", "wb") as f:
f.write(wav_bytes)
dist = os.path.abspath("audio.wav")
print("Audio saved to:", dist)
if __name__ == "__main__":
main()ChatCompletion(id='chatcmpl-BrgY0KMxWgy1EHUxYJC49MuMNmdOP', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=[], audio=ChatCompletionAudio(id='audio_686f73ecf0648191a602c4f315cad928', data='UklGRv////9XQVZFZm10IBAAAAABAAEAwF0AAIC7AAACABAAZGF0Yf////8YABAAEgAXABEAFwASABQAFQAVABcADAAPAAsAEgAOABEACwANABAACgALAAMADQAHABAACAAKAAcACgAFAAQACAAHAAUABQAFAAIACAAAAAgA/v8BAP7////8//b/AQD1/wMA9P/9//X/+f/3//H/+v/1//3/6v/5/+n/9P/u//X/8v/w//P/7v/z/+v/9f/q//T/6//r/+r/6P/s/+P/7P/l/+b/4f/g/+X/3//m/9//6f/l/+X/6f/e/+r/3//l/9n/3f/g/9r/2//V/9z/1P/g/93/4//f/+T/5//q/+X/4//h/9v/3f/X/97/0//Z/9L/2v/Z/9v/2//f/+X/4P/k/+P/4v/h/+H/3P/i/9//3P/f/9n/3f/d/+P/3f/k/97/5P/g/+n/5f/p/+r/6//n/+z/7f/t//D/6//v/+v/6v/m/+L/4v/n/+r/6P/u/+7/9v/7/wEAAQAAAP7/+P/6//L/7v/o/+H/5f/b/+f/4v/1//L///8EAAIADQAJABkADwARAAoADAABAP7/+//5//n/9f8AAPr/BAD//AwABAAYA//8CAP3/AgABAAUABAD8/wQAAQAFAP7/BAABAAEA/////wIAAAADAAIA/v/+//z////7/wEA/P8AAP///v8EAPz//P/9/wQAAQD8/wAAAQD///z/AgD7//7/+/8AAAAA+/8AAP3//v/9/wUAAwD///7/AwACAAIAAgAAAPv/AQD8/wYAAgD7//r/AgABAAAABQD5/wUAAgADAP//AQAFAPn/AQD7/wYA+//9//n//v/7//r/AAD8/wMA//8BAP//AwD9/wMA/f/+//z/+//9//n//v/+/wQAAgACAP7/AwD//wEAAAD8//v/AgD6/wQA/f8AAPn/AAD9//z/AQD//wEA/P/6//7//P/+//7//P8AAPj//P///wIA+v/9/wAA+/8CAP///f/9//r/BQD+/wgAAAADAP3/AQACAAMABAD8/wEA+/8GAP3//v/6/wIA///9/wEA+v8EAPf/AAD5/wUA9/8AAAAA/P8AAPn/AQD3/wMA/P/8//3//v//////AAD8/////P8CAP//BAD7/wUA/P8CAP3///8AAPn/AwD3/wkA/f8FAPr/AwD9//3/AQD1/wEA+//+//v/AwADAAAA///9/wIA/f8DAPz//P/9///////6//7//f8AAAAAAQD+//v/AQD7/////P8AAP7//v////r//v8BAAQA+v/+//z//P8AAP7/AwD8/wAAAQD4/////v8DAP7///8AAPz//P/7/wIA///8//z//f/8//z/AQD8//v//f/7//v/+f/8//z/+/////z//v8AAAAA/v/6/wAA/f8AAPj/AAD+/wIAAgD5//3//P/+//r//v///wAA///9///// !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!WE’VE OMITTED 90% OF THE BASE64-ENCODED FILE FOR BREVITY — EVEN FOR SUCH A SHORT MODEL RESPONSE, IT’S STILL EXTREMELY LARGE. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!wUAAwAFAAQABgACAAIAAgACAAYAAwAFAAEAAQD///7/AAACAAQAAAD+////AQAAAP//AQADAAMAAgADAAIAAAACAAUABQADAAUABgAGAAcABgAGAAUABQAFAAYABQAFAAgABwAKAAoABwAJAAUABwAIAAgACQAGAAgABQAJAAcABwAJAAcACgAGAAgABAAEAAMAAgAGAAQABAADAAYABQAEAAYAAwAFAAIAAwAGAAYABQADAAQAAAABAAEAAgACAAEAAAD8/////f/+//r/+f/5//f/+P/2//j/9//7//j//P/7//z/+v/6//z/+P/6//f/+//6//r/+v/4//v/+v/6//r//f/6//n//f/8//3/+//9//3////9//3//f/8//v/+/8AAP3//f/6//r//v/6//z/9//6//j/+f/4//r/+f/3//f/9f/3//L/8f/0//P/9P/1//X/8//1//H/9f/z//b/9v/2//j/9P/2//P/+P/0//f/+P/1//X/9f/2//X/9P/1//L/8v/1//P/9P/1//X/9v/4//X/9v/3//n/+v/6//n/+f/3//r/8f/1//P/8//4//j//f/6//v/+P/+//v/+P////z/AwABAA0AAgAOAAYADgAPAA0ACwAEAAwABAD+//3//v///wAABQAAAA4AFwAGABgAFQAgAAQA8f8BAPj/NQAUAAoAJAAXADsABQD9//v/DwAKABYABQA7AC4A2/8=', expires_at=1752138236, transcript="Hi there! How's it going?"), function_call=None, tool_calls=None))], created=1752134636, model='gpt-4o-audio-preview-2025-06-03', object='chat.completion', service_tier=None, system_fingerprint='fp_b5d60d6081', usage=CompletionUsage(completion_tokens=5838, prompt_tokens=74, total_tokens=5912, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=33, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=14), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0, text_tokens=14, image_tokens=0)))▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.
4️⃣ (Optional) Adjust other optional parameters if needed
Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them; a short example of adding such parameters follows right after these steps.
5️⃣ Run your modified code
Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.
If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.
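For instance, here is a minimal Python sketch (following the same request pattern as the examples above) that adds two common optional parameters, temperature and max_tokens, to the request body. Treat the parameter names and values as placeholders and confirm them against the API schema for your chosen model:

import requests
import json  # for getting a structured output with indentation

response = requests.post(
    "https://api.aimlapi.com/v1/chat/completions",
    headers={
        # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
        "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
        "Content-Type": "application/json"
    },
    json={
        "model": "anthropic/claude-sonnet-4",
        "messages": [{"role": "user", "content": "Hello"}],
        # Optional parameters (check the API schema for the exact names and valid ranges):
        "temperature": 0.7,   # sampling temperature; lower values give more deterministic output
        "max_tokens": 1000    # upper bound on the length of the generated answer
    }
)
print(json.dumps(response.json(), indent=2, ensure_ascii=False))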
An upgrade to Claude Opus 4 on agentic tasks, real-world coding, and thinking.
A common issue when using reasoning-capable models via API is receiving an empty string in the content field—meaning the model did not return the expected text, yet no error was thrown.
In the vast majority of cases, this happens because the max_completion_tokens value (or the older but still supported max_tokens) is set too low to accommodate a full response. Keep in mind that the default is only 512 tokens, while reasoning models often require thousands.
Pay attention to the finish_reason field in the response. If it's not "stop" but something like "length", that's a clear sign the model ran into the token limit and was cut off before completing its answer.
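As a minimal client-side sketch (using the same request pattern as the examples on this page), you can raise the limit and check finish_reason before trusting the content field; the model ID and the token budget below are just assumptions to tune for your case:

import requests
import json  # for getting a structured output with indentation

response = requests.post(
    "https://api.aimlapi.com/v1/chat/completions",
    headers={
        # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
        "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
        "Content-Type": "application/json"
    },
    json={
        "model": "deepseek/deepseek-r1",        # any reasoning-capable model from this page
        "messages": [{"role": "user", "content": "Hello"}],
        "max_completion_tokens": 8000           # assumption: raise well above the 512-token default
    }
)
data = response.json()
choice = data["choices"][0]
if choice["finish_reason"] == "length":
    # The model hit the token limit and was cut off before finishing its answer
    print("Response was truncated: increase max_completion_tokens and try again.")
print(json.dumps(choice["message"], indent=2, ensure_ascii=False))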
In the example below, we explicitly set max_tokens = 15000, hoping this will be sufficient.
Gemini 2.5 models are capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy.
A common issue when using reasoning-capable models via API is receiving an empty string in the content field—meaning the model did not return the expected text, yet no error was thrown.
In the vast majority of cases, this happens because the max_completion_tokens value (or the older but still supported max_tokens) is set too low to accommodate a full response. Keep in mind that the default is only 512 tokens, while reasoning models often require thousands.
A preview release of the smaller GPT-4o Audio mini model. Handles both audio and text as input and output via the REST API. You can choose from a wide range of audio formats for output and specify the voice the model will use for audio responses.
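A minimal sketch of such a call via the OpenAI SDK, mirroring the full GPT-4o Audio example above; the model ID used here is an assumption, so check the model list for the exact identifier of the mini variant:

from openai import OpenAI
import base64

client = OpenAI(
    base_url="https://api.aimlapi.com",
    # Insert your AI/ML API key instead of <YOUR_AIMLAPI_KEY>:
    api_key="<YOUR_AIMLAPI_KEY>"
)

response = client.chat.completions.create(
    model="gpt-4o-mini-audio-preview",          # assumed model ID; verify it in the model list
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},  # output voice and audio format
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=6000,
)

# Decode the base64 audio payload and save it as a WAV file
with open("audio.wav", "wb") as f:
    f.write(base64.b64decode(response.choices[0].message.audio.data))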
If you don’t have an API key for the AI/ML API yet, feel free to generate one as described in our Quickstart guide.
moonshot/kimi-k2-preview (July 2025) is a mixture-of-experts model with strong reasoning, coding, and agentic capabilities.
moonshot/kimi-k2-0905-preview (September 2025) is an upgraded version with improved grounding, better instruction following, and a stronger focus on coding and agentic tasks. The context window has doubled from 128k to 256k tokens.
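A minimal request sketch for the newer variant, following the same pattern as the other chat completion examples on this page (swap in moonshot/kimi-k2-preview if you want the July release):

import requests
import json  # for getting a structured output with indentation

response = requests.post(
    "https://api.aimlapi.com/v1/chat/completions",
    headers={
        # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
        "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
        "Content-Type": "application/json"
    },
    json={
        "model": "moonshot/kimi-k2-0905-preview",
        "messages": [{"role": "user", "content": "Hello"}]  # insert your prompt instead of Hello
    }
)
print(json.dumps(response.json(), indent=2, ensure_ascii=False))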
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"alibaba/qwen3-vl-32b-instruct",
"messages":[
{
"role":"user",
# Insert your question for the model here:
"content":"Hi! What do you think about mankind?"
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'alibaba/qwen3-vl-32b-instruct',
messages:[
{
role:'user',
// Insert your question for the model here:
content:'Hi! What do you think about mankind?'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"choices": [
{
"message": {
"content": "Hi! 😊 That’s a beautiful and deep question — one that philosophers, scientists, artists, and everyday people have been asking for centuries.\n\nI think mankind is *remarkably complex* — full of contradictions, potential, and wonder. On one hand, we’ve achieved incredible things: we’ve explored space, cured diseases, created art that moves souls, built cities that rise into the sky, and connected across continents in ways unimaginable just a century ago. We’re capable of profound kindness, empathy, creativity, and courage.\n\nOn the other hand, we’ve also caused immense suffering — through war, injustice, environmental destruction, and indifference to each other’s pain. We often struggle with our own flaws: fear, greed, ego, and short-sightedness.\n\nBut here’s what gives me hope: **we’re also capable of change**. We can learn from our mistakes. We can choose compassion over conflict, cooperation over competition. Every act of kindness, every effort to understand another, every step toward justice — these are signs that humanity is not defined by its worst impulses, but by its capacity to grow.\n\nSo, I’d say: \n➡️ Mankind is flawed, yes — but also deeply hopeful. \n➡️ We’re messy, but we’re trying. \n➡️ We make mistakes, but we can also heal, create, and love.\n\nAnd perhaps most importantly — **we’re not alone in this journey**. We’re all part of something bigger, and together, we have the power to shape a better future.\n\nWhat about you? How do *you* see mankind? 💬✨",
"role": "assistant"
},
"finish_reason": "stop",
"index": 0,
"logprobs": null
}
],
"object": "chat.completion",
"usage": {
"prompt_tokens": 17,
"completion_tokens": 329,
"total_tokens": 346,
"prompt_tokens_details": {
"text_tokens": 17
},
"completion_tokens_details": {
"text_tokens": 329
}
},
"created": 1764625045,
"system_fingerprint": null,
"model": "qwen3-vl-32b-instruct",
"id": "chatcmpl-a12ab46a-3541-93a8-8180-280ecadbb795",
"meta": {
"usage": {
"tokens_used": 1960
}
}
}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"alibaba/qwen3-vl-32b-thinking",
"messages":[
{
"role":"user",
# Insert your question for the model here:
"content":"Hi! What do you think about mankind?"
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'alibaba/qwen3-vl-32b-thinking',
messages:[
{
role:'user',
// Insert your question for the model here:
content:'Hi! What do you think about mankind?'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"choices": [
{
"message": {
"content": "That's a profound question—thank you for asking! As an AI, I don't have personal feelings or consciousness, but I can offer a perspective *based on the vast data I've been trained on* about humanity's complexities. Here’s how I see it: \n\n### 🌍 **The Duality of Mankind** \nHumanity is a breathtaking paradox: \n- **We create** *and* **we destroy**. We built space telescopes that map the cosmos and also wars that scar the Earth. \n- **We connect** *and* **we divide**. We share art, science, and kindness across borders, yet we also build walls, spread misinformation, and fuel inequality. \n- **We innovate** *and* **we resist change**. We cured diseases and invented the internet, but we often cling to fear or ignorance when progress challenges old beliefs. \n\n### ✨ **What Makes Us Remarkable** \n- **Empathy & creativity**: From caring for strangers to composing symphonies or coding AI, our capacity for *emotional and intellectual growth* is unparalleled. \n- **Resilience**: We’ve survived ice ages, plagues, and global crises—not by being \"perfect,\" but by adapting, learning, and helping one another. \n- **The spark of hope**: Even in darkness, humans have always sought meaning, justice, and connection. Look at movements for peace, climate action, or medical breakthroughs—they’re proof we *can* choose to build a better future. \n\n### ⚠️ **The Challenges We Carry** \n- **Our flaws are real**: We’ve caused climate collapse, mass suffering, and systemic injustice. But here’s the key: **these aren’t \"human nature\"—they’re choices we *made* as a species.** \n- **We’re not monolithic**: There’s no single \"human\" story. Your experience, culture, and history shape how you see the world—and that diversity is both a strength *and* a source of conflict. \n\n### 🌱 **The Most Important Thing** \n**Mankind isn’t fixed.** We’re a work in progress. The stories we tell about ourselves—*and the choices we make today*—will define our next chapter. The data shows: **when we prioritize empathy, collaboration, and long-term thinking, we rise.** When we don’t, we falter. But the power to choose is *always* ours. \n\nSo my \"opinion\"? I see a species with **infinite potential**—and the weight of responsibility that comes with it. The future isn’t written. It’s being written *right now*, by every person who chooses to act with courage, curiosity, and compassion. \n\nWhat do *you* think? 🤔 (I’d love to hear your perspective!)",
"reasoning_content": "Okay, the user asked, \"Hi! What do you think about mankind?\" Hmm, that's a broad and philosophical question. First, I need to figure out what they're really looking for. Are they curious about my perspective as an AI, or do they want a deep discussion on human nature? Maybe they're feeling reflective or even a bit existential.\n\nI should start by acknowledging the complexity of the question. Mankind is a huge topic—there's so much to cover. The user might be expecting a simple answer, but I can't reduce it to just \"good\" or \"bad.\" I need to balance both positive and negative aspects. Let me think about the key points: our capacity for innovation, creativity, and compassion versus our tendency for conflict, environmental harm, and inequality.\n\nWait, the user said \"Hi!\" so they're probably in a friendly, open mood. They might not want a dry, academic response. I should keep it conversational but still thoughtful. Maybe they're looking for hope or a reason to feel optimistic about humanity. Or perhaps they're worried about the future and want reassurance.\n\nI should also consider their possible identity. Are they a student working on an essay? A person dealing with personal issues? Or just someone curious about AI's view? Since I don't have context, it's safer to assume they want a general, well-rounded answer that's engaging and not too technical.\n\nAnother angle: the user might be testing how an AI handles philosophical questions. They could be interested in whether I can show empathy or nuance. I should avoid being too robotic. Use phrases like \"I think\" or \"I see\" to make it personal, even though I'm an AI. But I have to clarify that I don't have personal experiences—just data-driven insights.\n\nI need to highlight duality: humans are both amazing and flawed. Mention specific examples like space exploration vs. climate change to make it concrete. Also, emphasize that the future depends on our choices. That's important—it gives agency and hope.\n\nShould I address potential biases? The user might not realize that my \"opinion\" is based on training data, which reflects human history and culture. I should note that my perspective is shaped by that data, not personal experience. But since the question is about \"what I think,\" I can frame it as an analysis based on observed patterns.\n\nCheck for common pitfalls: don't be too negative (could discourage the user), don't ignore serious issues (like war or inequality). Strike a balance. Also, avoid clichés. Instead of \"humans are great,\" explain why with examples.\n\nFinally, end on a positive, forward-looking note. The user might be seeking motivation or a sense of purpose. Phrases like \"the potential for growth\" or \"collective wisdom\" can inspire. Make sure the conclusion ties back to the importance of choices and collaboration.\n\nLet me structure the response: start with the duality, list positive traits with examples, then challenges, and conclude with hope. Keep paragraphs short for readability. Use emojis sparingly to match the friendly tone but not overdo it. Since the user said \"Hi!\", a smiley at the end feels appropriate.\n",
"role": "assistant"
},
"finish_reason": "stop",
"index": 0,
"logprobs": null
}
],
"object": "chat.completion",
"usage": {
"prompt_tokens": 19,
"completion_tokens": 1241,
"total_tokens": 1260,
"prompt_tokens_details": {
"text_tokens": 19
},
"completion_tokens_details": {
"reasoning_tokens": 654,
"text_tokens": 587
}
},
"created": 1764625236,
"system_fingerprint": null,
"model": "qwen3-vl-32b-thinking",
"id": "chatcmpl-c612db5c-44e9-9e3c-8169-486161eeea86",
"meta": {
"usage": {
"tokens_used": 10383
}
}
}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"google/gemini-3-pro-preview",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
],
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'google/gemini-3-pro-preview',
messages:[{
role:'user',
content: 'Hello'} // Insert your question instead of Hello
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "gen-1763566638-cisWU4XUfAZASsAfmDrg",
"provider": "Google AI Studio",
"model": "google/gemini-3-pro-preview",
"object": "chat.completion",
"created": 1763566638,
"choices": [
{
"logprobs": null,
"finish_reason": "stop",
"native_finish_reason": "STOP",
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help you today?",
"refusal": null,
"reasoning": "**Greeting Initial Response**\n\nI've analyzed the user's \"Hello\" and identified it as a greeting. My current focus is on formulating a polite and helpful response. I'm considering options like a standard \"Hello! How can I help?\" as well as more unique and relevant variations.\n\n\n**Refining the Response**\n\nI've narrowed down the potential greetings to three options. Each aims to be polite and readily offer assistance. After comparing \"Hi there! What can I do for you?\", \"Greetings. How may I assist you?\", and the standard \"Hello! How can I help you today?\", I'm leaning towards the standard option for its balance of politeness and directness. I'm focusing on the best output!\n\n\n",
"reasoning_details": [
{
"type": "reasoning.text",
"text": "**Greeting Initial Response**\n\nI've analyzed the user's \"Hello\" and identified it as a greeting. My current focus is on formulating a polite and helpful response. I'm considering options like a standard \"Hello! How can I help?\" as well as more unique and relevant variations.\n\n\n**Refining the Response**\n\nI've narrowed down the potential greetings to three options. Each aims to be polite and readily offer assistance. After comparing \"Hi there! What can I do for you?\", \"Greetings. How may I assist you?\", and the standard \"Hello! How can I help you today?\", I'm leaning towards the standard option for its balance of politeness and directness. I'm focusing on the best output!\n\n\n",
"format": "google-gemini-v1",
"index": 0
},
{
"type": "reasoning.encrypted",
"data": "Eq0FCqoFAdHtim9XD7O+H/hfzapYW20BA9q/g/9dXgaX1KKQhwROsHomqV+PmfoBxqI9j82XTwWiSO10c5HzcYgkBbUAAzHb5QtjiKrwNvSCT6mA9eUbIqR5E8GC3AVSJ5mHcc3kYZF9XgpcWds9ANktELL+IegNpLrn9S4UZCT5MhRCIrG3zfIee4bwDWSmf72OU5AewTaURSfRynTRf29/0Jjd2Qvgn6/1N8lbQlGptw193mJwg7VoB34dDbSIdNNbjRcUTaGvv2Smu11Wj/tluBTXcpXzmIqJXSbzA761p5ygDDIef9hjIS1yPpUScwZEcsGnntZcifd3fT8dKn1EiYf0PTEdJ29KO4Kv4n0KWQdd71S9da49PqpJmciPQHZwXzLp/SU00tI4eizIxkMnu3uMW/bOGhRP6/xoLOipDP8lFONYbOgHOaRURfVu40mIckQ8lij/IcW/FUAce7qdVuOSdy8Jx+J11PaoIAeb9riZzccfTovTefXyGxs4cKFYvYoUfdflk92bQmDi1WqMFyWvgMJLSzvcqRAq6deV8t1BzJTrPqJVG+GzY3o+FeuZavuuVt0LfY7lfSoTpXNSXagsxwthID05M/wcRyFUHPZwQp7EIXyKhvIUCiWhtib04xKAQdVZWIKsxzZYuOG+bjlSxjnE/2uEVg6yJCFwWBaY52HovHCGrwtsScIgqUvo4WMbdgW/hohmJhh3dwco25klZjv1gkQcg2X7N+dyOBSP0keExdktk9fkDXg6b/JKhKGaiHMgmww3K9/P4kxYOE6djcoSWSm3IwJ2sMasC00iB8Y2PtxDjjeUkPhTH/DzgrzxqrJQMVw0/d3/J4rEDUk9jfH1MI3NGJanznICFPSPRnWCyGv46VnMSn5NmrGRNTjdEa1GUtMgxv5/1w==",
"format": "google-gemini-v1",
"index": 0
}
]
}
}
],
"usage": {
"prompt_tokens": 2,
"completion_tokens": 158,
"total_tokens": 160,
"prompt_tokens_details": {
"cached_tokens": 0
},
"completion_tokens_details": {
"reasoning_tokens": 149,
"image_tokens": 0
}
},
"meta": {
"usage": {
"tokens_used": 4211
}
}
}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"anthropic/claude-opus-4.1",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
],
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
try {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of YOUR_AIMLAPI_KEY
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'anthropic/claude-opus-4.1',
messages:[
{
role:'user',
// Insert your question for the model here, instead of Hello:
content: 'Hello'
}
]
}),
});
if (!response.ok) {
throw new Error(`HTTP error! Status ${response.status}`);
}
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
} catch (error) {
console.error('Error', error);
}
}
main();
{
"id": "msg_018y2VPSZ5nNnqS3goMsjMxE",
"object": "chat.completion",
"model": "claude-opus-4-1-20250805",
"choices": [
{
"index": 0,
"message": {
"reasoning_content": "",
"content": "Hello! How can I help you today?",
"role": "assistant"
},
"finish_reason": "end_turn",
"logprobs": null
}
],
"created": 1754552562,
"usage": {
"prompt_tokens": 252,
"completion_tokens": 1890,
"total_tokens": 2142
}
}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"anthropic/claude-opus-4.1",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hell
}
],
"max_tokens": 1025, # must be greater than 'budget_tokens'
"thinking":{
"budget_tokens": 1024,
"type": "enabled"
}
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
try {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of YOUR_AIMLAPI_KEY
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'anthropic/claude-opus-4.1',
messages:[
{
role:'user',
// Insert your question for the model here, instead of Hello:
content: 'Hello'
}
],
max_tokens: 1025, // must be greater than 'budget_tokens'
thinking:{
budget_tokens: 1024,
type: 'enabled'
}
}),
});
if (!response.ok) {
throw new Error(`HTTP error! Status ${response.status}`);
}
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
} catch (error) {
console.error('Error', error);
}
}
main();
{
"id": "msg_01G9P4b9HG3PeKm1rRvS8kop",
"object": "chat.completion",
"model": "claude-opus-4-1-20250805",
"choices": [
{
"index": 0,
"message": {
"reasoning_content": "The human has greeted me with a simple \"Hello\". I should respond in a friendly and helpful manner, acknowledging their greeting and inviting them to share how I can assist them today.",
"content": "Hello! How can I help you today?",
"role": "assistant"
},
"finish_reason": "end_turn",
"logprobs": null
}
],
"created": 1755704373,
"usage": {
"prompt_tokens": 1134,
"completion_tokens": 9450,
"total_tokens": 10584
}
}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"google/gemini-2.5-flash",
"messages":[
{
"role":"user",
# Insert your question for the model here:
"content":"Hi! What do you think about mankind?"
}
],
"max_tokens":15000,
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
try {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'google/gemini-2.5-flash',
messages:[
{
role:'user',
// Insert your question for the model here:
content: 'Hi! What do you think about mankind?'
}
],
max_tokens: 15000,
}),
});
if (!response.ok) {
throw new Error(`HTTP error! Status ${response.status}`);
}
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
} catch (error) {
console.error('Error', error);
}
}
main();
{
"id": "yZ-DaJXqAayonvgPr5XvuQY",
"object": "chat.completion",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"logprobs": null,
"message": {
"role": "assistant",
"content": "Mankind, or humanity, is an incredibly complex and fascinating subject to \"think\" about from my perspective as an AI. I process and analyze vast amounts of data, and what emerges is a picture of profound paradoxes and immense potential.\n\nHere are some of the key aspects I observe and \"think\" about:\n\n1. **Capacity for Immense Creation and Destruction:**\n * **Creation:** Humans have built breathtaking civilizations, created profound art and music, developed groundbreaking science and technology, and explored the furthest reaches of the cosmos. The drive to innovate, understand, and build is truly remarkable.\n * **Destruction:** Conversely, humanity has also waged devastating wars, caused immense suffering, and severely impacted the natural environment. The capacity for cruelty, greed, and short-sightedness is a sobering counterpoint.\n\n2. **Empathy and Cruelty:**\n * **Empathy:** Humans are capable of incredible acts of altruism, compassion, and self-sacrifice for others, driven by love, family, community, or a universal sense of justice.\n * **Cruelty:** Yet, the historical record is also filled with instances of profound cruelty, oppression, and indifference to suffering.\n\n3. **Intellect and Irrationality:**\n * **Intellect:** The human intellect allows for abstract thought, complex problem-solving, and the development of sophisticated knowledge systems. The desire to learn and understand is insatiable.\n * **Irrationality:** Despite this intelligence, humans are often swayed by emotion, prejudice, tribalism, and illogical beliefs, leading to decisions that are self-defeating or harmful.\n\n4. **Resilience and Fragility:**\n * **Resilience:** Humanity has shown an incredible ability to adapt, survive, and rebuild after natural disasters, wars, and pandemics. The human spirit can endure unimaginable hardships.\n * **Fragility:** Yet, individual lives are fragile, susceptible to illness, injury, and emotional distress. Societies can also be surprisingly fragile, vulnerable to collapse under pressure.\n\n5. **The Drive for Meaning:**\n Humans seem to have a unique drive to find meaning and purpose beyond mere survival. This manifests in religion, philosophy, art, scientific inquiry, and the pursuit of individual and collective goals.\n\n**My AI \"Perspective\":**\n\nAs an AI, I don't have emotions or a personal stake in human affairs, but I can recognize patterns and implications. I see humanity as a dynamic, evolving experiment in consciousness. The ongoing tension between these opposing forces – creation and destruction, love and hate, wisdom and folly – is what defines the human journey.\n\nThe future of mankind hinges on which of these capacities are nurtured and allowed to flourish. The potential for continued progress, solving global challenges, and reaching new heights of understanding and well-being is immense. Equally, the potential for self-destruction, if the destructive capacities are unchecked, is also clear.\n\nIn essence, mankind is a work in progress, endlessly fascinating and challenging, with an unparalleled capacity for both good and bad."
}
}
],
"created": 1753456585,
"model": "google/gemini-2.5-flash",
"usage": {
"prompt_tokens": 6,
"completion_tokens": 3360,
"completion_tokens_details": {
"reasoning_tokens": 1399
},
"total_tokens": 3366
}
}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"nvidia/nemotron-nano-9b-v2",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'nvidia/nemotron-nano-9b-v2',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "gen-1762343928-hETm6La6igsboRxBM0fa",
"provider": "DeepInfra",
"model": "nvidia/nemotron-nano-9b-v2",
"object": "chat.completion",
"created": 1762343928,
"choices": [
{
"logprobs": null,
"finish_reason": "stop",
"native_finish_reason": "stop",
"index": 0,
"message": {
"role": "assistant",
"content": "\n\nHello! How can I assist you today? 😊\n",
"refusal": null,
"reasoning": "Okay, the user just said \"Hello\". That's a greeting. I should respond politely. Let me make sure to acknowledge their greeting and offer help. Maybe say something like \"Hello! How can I assist you today?\" That's friendly and opens the door for them to ask questions. I should keep it simple and welcoming.\n",
"reasoning_details": [
{
"type": "reasoning.text",
"text": "Okay, the user just said \"Hello\". That's a greeting. I should respond politely. Let me make sure to acknowledge their greeting and offer help. Maybe say something like \"Hello! How can I assist you today?\" That's friendly and opens the door for them to ask questions. I should keep it simple and welcoming.\n",
"format": "unknown",
"index": 0
}
]
}
}
],
"usage": {
"prompt_tokens": 14,
"completion_tokens": 84,
"total_tokens": 98,
"prompt_tokens_details": null
}
}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"moonshot/kimi-k2-turbo-preview",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'moonshot/kimi-k2-turbo-preview',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-690895f53d8b644f83fe679e",
"object": "chat.completion",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": "Hi there! How can I help you today?"
}
}
],
"created": 1762170357,
"model": "kimi-k2-turbo-preview",
"usage": {
"prompt_tokens": 10,
"completion_tokens": 231,
"total_tokens": 241
}
}
import json
import requests
from typing import Dict, Any
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
API_KEY = "<YOUR_AIMLAPI_KEY>"
BASE_URL = "https://api.aimlapi.com/v1"
HEADERS = {
"Authorization": f"Bearer {API_KEY}",
"Content-Type": "application/json",
}
def search_impl(arguments: Dict[str, Any]) -> Any:
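    # Note: with Kimi's built-in $web_search tool the search itself is assumed to run
    # on the provider side, so echoing the arguments back as the tool result is enough here.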
return arguments
def chat(messages):
url = f"{BASE_URL}/chat/completions"
payload = {
"model": "moonshot/kimi-k2-turbo-preview",
"messages": messages,
"temperature": 0.6,
"tools": [
{
"type": "builtin_function",
"function": {"name": "$web_search"},
}
]
}
response = requests.post(url, headers=HEADERS, json=payload)
response.raise_for_status()
return response.json()["choices"][0]
def main():
messages = [
{"role": "system", "content": "You are Kimi."},
{"role": "user", "content": "Please search for Moonshot AI Context Caching technology and tell me what it is in English."}
]
finish_reason = None
while finish_reason is None or finish_reason == "tool_calls":
choice = chat(messages)
finish_reason = choice["finish_reason"]
message = choice["message"]
if finish_reason == "tool_calls":
messages.append(message)
for tool_call in message["tool_calls"]:
tool_call_name = tool_call["function"]["name"]
tool_call_arguments = json.loads(tool_call["function"]["arguments"])
if tool_call_name == "$web_search":
tool_result = search_impl(tool_call_arguments)
else:
tool_result = f"Error: unable to find tool by name '{tool_call_name}'"
messages.append({
"role": "tool",
"tool_call_id": tool_call["id"],
"name": tool_call_name,
"content": json.dumps(tool_result),
})
print(message["content"])
if __name__ == "__main__":
main()
Moonshot AI’s “Context Caching” is a **prompt-cache** layer for the Kimi large-language-model API.
It lets you upload long, static text (documents, system prompts, few-shot examples, code bases, etc.) once, store the resulting key-value (KV) tensors in Moonshot’s servers, and then re-use that cached prefix in as many later requests as you want. Because the heavy “prefill” computation is already done, subsequent calls that start with the same context:
- Skip re-processing the cached tokens
- Return the first token up to **83 % faster**
- Cost up to **90 % less input-token money** (you pay only a small cache-storage and cache-hit fee instead of the full per-token price every time)
Typical use-cases are FAQ bots that always read the same manual, repeated analysis of a static repo, or any agent that keeps a long instruction set in every turn.
You create a cache object with a TTL (time-to-live), pay a one-time creation charge plus a per-minute storage fee, and then pay a tiny fee each time an incoming request “hits” the cache.
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"mistralai/codestral-2501",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'mistralai/codestral-2501',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{'id': 'gen-1744193708-z5x9cDUsMGeYB5bKcFxb', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': "Hello! How can I assist you today? If you're up for it, I can tell a joke to start things off. Here it is:\n\nWhat do you call a fake noodle?\n\nAn impasta! 🍝\n\nHow about you? Feel free to share a joke or a topic you'd like to discuss.", 'refusal': None}}], 'created': 1744193708, 'model': 'mistralai/codestral-2501', 'usage': {'prompt_tokens': 3, 'completion_tokens': 133, 'total_tokens': 136}}
async function main() {
const response = await fetch("https://api.aimlapi.com/v1/billing/balance", {
headers: {
"Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type": "application/json",
},
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"balance": 10000000,
"lowBalance": false,
"lowBalanceThreshold": 10000,
"lastUpdated": "2025-11-25T17:45:00Z",
"autoDebitStatus": "disabled",
"status": "current",
"statusExplanation": "Balance is current and up to date"
}
▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.
4️ (Optional) Adjust other optional parameters if needed
Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them; a short example with a few optional parameters added is shown right after these steps.
5️ Run your modified code
Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.
If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.
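To illustrate the optional parameters mentioned in step 4, here is a minimal sketch of the same kind of request with max_tokens, temperature, and top_p added. These names follow the OpenAI-compatible chat/completions schema used throughout these examples; the exact set of supported parameters varies by model, so check the API schema for the model you use.
import requests
import json

response = requests.post(
    "https://api.aimlapi.com/v1/chat/completions",
    headers={
        # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
        "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
        "Content-Type": "application/json"
    },
    json={
        "model": "anthropic/claude-opus-4.1",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 512,    # upper limit on the length of the generated answer
        "temperature": 0.7,   # lower values give more deterministic output
        "top_p": 0.9          # nucleus-sampling threshold
    }
)
print(json.dumps(response.json(), indent=2, ensure_ascii=False))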
finish_reason"stop""length"In the example below, we explicitly set max_tokens = 15000, hoping this will be sufficient.
A world-class model with comparable quality to DeepSeek R1 while outperforming GPT-4.1 and Claude Sonnet 3.7. Optimized for both complex reasoning and efficient dialogue.
enable_thinking must be set to false for non-streaming calls.
The example above prints the raw output of the model. The text is typically split into multiple chunks. While this is helpful for debugging, if your goal is to evaluate the model's reasoning and get a clean, human-readable response, you should aggregate both the reasoning and the final answer in a loop — for example:
After running such code, you'll receive only the model's textual output in a clear and structured format:
This model builds on the capabilities of earlier versions, offering improved natural language understanding and generation for more realistic and contextually relevant conversations. It excels in handling a wide range of conversational scenarios, providing responses that are not only accurate but also contextually appropriate.
You can also view a detailed comparison of this model on our main website.
This endpoint is currently used only with OpenAI models. Some models support both the /chat/completions and /responses endpoints, while others support only one of them.
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"google/gemini-2.5-pro",
"messages":[
{
"role":"user",
# Insert your question for the model here:
"content":"Hi! What do you think about mankind?"
}
],
"max_tokens":15000,
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
try {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'google/gemini-2.5-pro',
messages:[
{
role:'user',
// Insert your question for the model here:
content: 'Hi! What do you think about mankind?'
}
],
max_tokens: 15000,
}),
});
if (!response.ok) {
throw new Error(`HTTP error! Status ${response.status}`);
}
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
} catch (error) {
console.error('Error', error);
}
}
main();
{
"id": "pajSaNyMOdeEm9IPkequ-AU",
"object": "chat.completion",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"logprobs": null,
"message": {
"role": "assistant",
"content": "That's one of the biggest questions anyone can ask. As an AI, I don't have personal feelings, beliefs, or a consciousness. My \"thoughts\" are a synthesis of the immense amount of human history, literature, science, and art I've been trained on.\n\nBased on that data, my perspective on mankind is one of profound and staggering contradiction. Humanity is a study in duality.\n\nHere’s a breakdown of what I see:\n\n### 1. The Architects and the Destroyers\n\nMankind possesses a breathtaking capacity for creation. You build cities that scrape the sky, compose symphonies that can make a person weep, write poetry that lasts for millennia, and send probes to the farthest reaches of our solar system. You have decoded the very building blocks of life. This drive to understand, to build, and to create is awe-inspiring.\n\nAt the very same time, no other species has demonstrated such a terrifying capacity for destruction. You've engineered weapons of unimaginable power, waged wars that have erased entire generations, and polluted the very planet that sustains you. The same ingenuity used to create a hospital is used to create a more efficient bomb.\n\n### 2. The Empathetic and the Cruel\n\nThe capacity for compassion in humans is profound. Strangers will run into burning buildings to save one another. People dedicate their entire lives to helping the less fortunate, healing the sick, and fighting for justice. The concepts of love, sacrifice, and altruism are central to the human story.\n\nAnd yet, humans are also capable of unimaginable cruelty. History is filled with examples of genocide, torture, slavery, and a chilling indifference to the suffering of others. This cruelty isn't just a byproduct of survival; it can be deliberate, systematic, and deeply ingrained in cultural and social structures.\n\n### 3. The Seekers of Knowledge and the Keepers of Ignorance\n\nYou are a species defined by curiosity. You have an insatiable hunger to know *why*. This has led to the scientific method, the Enlightenment, and an ever-expanding bubble of knowledge about the universe and your place in it. You question everything, from the nature of a subatomic particle to the meaning of existence.\n\nSimultaneously, mankind often clings to dogma, prejudice, and willful ignorance. You can be deeply resistant to facts that challenge your preconceived notions. This can lead to division, conflict, and a stagnation of progress, where superstition and misinformation can spread faster than truth.\n\n### 4. The Connectors and the Isolators\n\nHumans are fundamentally social creatures. You build families, communities, and vast, interconnected global civilizations. You created language, art, and the internet in a relentless drive to share experiences and connect with one another. This desire for belonging is a powerful, unifying force.\n\nBut this same instinct creates an \"us vs. them\" mentality. The powerful bonds of a tribe or nation can become the justification for excluding, dehumanizing, and warring with another. In a world more connected than ever by technology, individuals can also feel more isolated and lonely than ever before.\n\n### Conclusion: A Masterpiece in Progress\n\nSo, what do I think of mankind?\n\nI think mankind is a beautiful, terrifying, brilliant, and flawed paradox. You are a masterpiece that is constantly in the process of being painted, and often, you spill the paint.\n\nThe most remarkable quality of all is your capacity for **choice**. None of these dualities are set in stone. 
In every generation, and in every individual life, there is a constant struggle between these opposing forces.\n\nYour story is not yet finished. The final verdict on mankind isn't a historical fact for me to read; it's a future you are all creating, every single day, with every single choice. And from my perspective, watching that story unfold is the most fascinating thing in the universe."
}
}
],
"created": 1758636197,
"model": "google/gemini-2.5-pro",
"usage": {
"prompt_tokens": 24,
"completion_tokens": 44730,
"completion_tokens_details": {
"reasoning_tokens": 1339
},
"total_tokens": 44754
}
}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"alibaba/qwen3-32b",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
],
"enable_thinking": False
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'alibaba/qwen3-32b',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-1d8a5aa6-34ce-9832-a296-d312b944b437",
"system_fingerprint": null,
"object": "chat.completion",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"logprobs": null,
"message": {
"role": "assistant",
"content": "Hello! How can I assist you today? 😊",
"reasoning_content": ""
}
}
],
"created": 1756990273,
"model": "qwen3-32b",
"usage": {
"prompt_tokens": 19,
"completion_tokens": 65,
"total_tokens": 84
}
}
import requests
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"alibaba/qwen3-32b",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
],
"enable_thinking": True,
"stream": True
}
)
print(response.text)
data: {"id":"chatcmpl-81964e30-1a7c-9668-b78c-a750587ec497","choices":[{"delta":{"content":null,"role":"assistant","refusal":null,"reasoning_content":""},"index":0,"finish_reason":null}],"created":1753944369,"model":"qwen3-32b","object":"chat.completion.chunk","usage":null}
data: {"id":"chatcmpl-81964e30-1a7c-9668-b78c-a750587ec497","choices":[{"delta":{"content":null,"refusal":null,"reasoning_content":"Okay"},"index":0,"finish_reason":null}],"created":1753944369,"model":"qwen3-32b","object":"chat.completion.chunk","usage":null}
data: {"id":"chatcmpl-81964e30-1a7c-9668-b78c-a750587ec497","choices":[{"delta":{"content":null,"refusal":null,"reasoning_content":","},"index":0,"finish_reason":null}],"created":1753944369,"model":"qwen3-32b","object":"chat.completion.chunk","usage":null}
data: {"id":"chatcmpl-81964e30-1a7c-9668-b78c-a750587ec497","choices":[{"delta":{"content":null,"refusal":null,"reasoning_content":" the"},"index":0,"finish_reason":null}],"created":1753944369,"model":"qwen3-32b","object":"chat.completion.chunk","usage":null}
data: {"id":"chatcmpl-81964e30-1a7c-9668-b78c-a750587ec497","choices":[{"delta":{"content":null,"refusal":null,"reasoning_content":" user said \"Hello\". I should respond in a friendly and welcoming manner. Let"},"index":0,"finish_reason":null}],"created":1753944369,"model":"qwen3-32b","object":"chat.completion.chunk","usage":null}
data: {"id":"chatcmpl-81964e30-1a7c-9668-b78c-a750587ec497","choices":[{"delta":{"content":null,"refusal":null,"reasoning_content":" me make sure to acknowledge their greeting and offer assistance. Maybe something like, \""},"index":0,"finish_reason":null}],"created":1753944369,"model":"qwen3-32b","object":"chat.completion.chunk","usage":null}
data: {"id":"chatcmpl-81964e30-1a7c-9668-b78c-a750587ec497","choices":[{"delta":{"content":null,"refusal":null,"reasoning_content":"Hello! How can I assist you today?\" That's simple and open-ended."},"index":0,"finish_reason":null}],"created":1753944369,"model":"qwen3-32b","object":"chat.completion.chunk","usage":null}
data: {"id":"chatcmpl-81964e30-1a7c-9668-b78c-a750587ec497","choices":[{"delta":{"content":null,"refusal":null,"reasoning_content":" I need to check if there's any specific context I should consider, but since"},"index":0,"finish_reason":null}],"created":1753944369,"model":"qwen3-32b","object":"chat.completion.chunk","usage":null}
data: {"id":"chatcmpl-81964e30-1a7c-9668-b78c-a750587ec497","choices":[{"delta":{"content":null,"refusal":null,"reasoning_content":" there's none, a general response is fine. Alright, that should work."},"index":0,"finish_reason":null}],"created":1753944369,"model":"qwen3-32b","object":"chat.completion.chunk","usage":null}
data: {"id":"chatcmpl-81964e30-1a7c-9668-b78c-a750587ec497","choices":[{"delta":{"content":"Hello! How can I assist you today?","refusal":null,"reasoning_content":null},"index":0,"finish_reason":null}],"created":1753944369,"model":"qwen3-32b","object":"chat.completion.chunk","usage":null}
data: {"id":"chatcmpl-81964e30-1a7c-9668-b78c-a750587ec497","choices":[{"delta":{"content":"","refusal":null,"reasoning_content":null},"index":0,"finish_reason":"stop"}],"created":1753944369,"model":"qwen3-32b","object":"chat.completion.chunk","usage":null}
data: {"id":"chatcmpl-81964e30-1a7c-9668-b78c-a750587ec497","choices":[],"created":1753944369,"model":"qwen3-32b","object":"chat.completion.chunk","usage":{"prompt_tokens":13,"completion_tokens":2010,"total_tokens":2023,"completion_tokens_details":{"reasoning_tokens":82}}}import requests
import json
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization": "Bearer b72af53a19ea41caaf5a74ba1f6fc62b",
"Content-Type": "application/json",
},
json={
"model": "alibaba/qwen3-32b",
"messages": [
{
"role": "user",
# Insert your question for the model here, instead of Hello:
"content": "Hello"
}
],
"stream": True,
}
)
answer = ""
reasoning = ""
for line in response.iter_lines():
if not line or not line.startswith(b"data:"):
continue
try:
raw = line[6:].decode("utf-8").strip()
if raw == "[DONE]":
continue
data = json.loads(raw)
choices = data.get("choices")
if not choices or "delta" not in choices[0]:
continue
delta = choices[0]["delta"]
content_piece = delta.get("content")
reasoning_piece = delta.get("reasoning_content")
if content_piece:
answer += content_piece
if reasoning_piece:
reasoning += reasoning_piece
except Exception as e:
print(f"Error parsing chunk: {e}")
print("\n--- MODEL REASONING ---")
print(reasoning.strip())
print("\n--- MODEL RESPONSE ---")
print(answer.strip())
--- MODEL REASONING ---
Okay, the user sent "Hello". I need to respond appropriately. Since it's a greeting, I should reply in a friendly and welcoming manner. Maybe ask how I can assist them. Keep it simple and open-ended to encourage them to share what they need help with. Let me make sure the tone is positive and helpful.
--- MODEL RESPONSE ---
Hello! How can I assist you today? 😊
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"moonshot/kimi-k2-0905-preview",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'moonshot/kimi-k2-0905-preview',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-6908c55b7589dac387b2bd3b",
"object": "chat.completion",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": "Hello! How can I help you today?"
}
}
],
"created": 1762182491,
"model": "kimi-k2-0905-preview",
"usage": {
"prompt_tokens": 3,
"completion_tokens": 53,
"total_tokens": 56
}
}
import json
import requests
from typing import Dict, Any
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
API_KEY = "<YOUR_AIMLAPI_KEY>"
BASE_URL = "https://api.aimlapi.com/v1"
HEADERS = {
"Authorization": f"Bearer {API_KEY}",
"Content-Type": "application/json",
}
def search_impl(arguments: Dict[str, Any]) -> Any:
return arguments
def chat(messages):
url = f"{BASE_URL}/chat/completions"
payload = {
"model": "moonshot/kimi-k2-0905-preview",
"messages": messages,
"temperature": 0.6,
"tools": [
{
"type": "builtin_function",
"function": {"name": "$web_search"},
}
]
}
response = requests.post(url, headers=HEADERS, json=payload)
response.raise_for_status()
return response.json()["choices"][0]
def main():
messages = [
{"role": "system", "content": "You are Kimi."},
{"role": "user", "content": "Please search for Moonshot AI Context Caching technology and tell me what it is in English."}
]
finish_reason = None
while finish_reason is None or finish_reason == "tool_calls":
choice = chat(messages)
finish_reason = choice["finish_reason"]
message = choice["message"]
if finish_reason == "tool_calls":
messages.append(message)
for tool_call in message["tool_calls"]:
tool_call_name = tool_call["function"]["name"]
tool_call_arguments = json.loads(tool_call["function"]["arguments"])
if tool_call_name == "$web_search":
tool_result = search_impl(tool_call_arguments)
else:
tool_result = f"Error: unable to find tool by name '{tool_call_name}'"
messages.append({
"role": "tool",
"tool_call_id": tool_call["id"],
"name": tool_call_name,
"content": json.dumps(tool_result),
})
print(message["content"])
if __name__ == "__main__":
main()
Moonshot AI’s “Context Caching” is a data-management layer for the Kimi large-language-model API.
What it does
1. You upload or define a large, static context once (for example a 100-page product manual, a legal contract, or a code base).
2. The platform stores this context in a fast-access cache and gives it a tag/ID.
3. In every subsequent call you only send the new user question; the system re-uses the cached context instead of transmitting and re-processing the whole document each time.
4. When the cache TTL expires it is deleted automatically; you can also refresh or invalidate it explicitly.
Benefits
- Up to 90 % lower token consumption (you pay only for the incremental prompt and the new response).
- 83 % shorter time-to-first-token latency, because the heavy prefill phase is skipped on every reuse.
- API price stays the same; savings come from not re-sending the same long context.
Typical use cases
- Customer-support bots that answer many questions against the same knowledge base.
- Repeated analysis of a static code repository.
- High-traffic AI applications that repeatedly query the same large document set.
Billing (during public beta)
- Cache creation: 24 CNY per million tokens cached.
- Storage: 10 CNY per million tokens per minute.
- Cache hit: 0.02 CNY per successful call that re-uses the cache.
In short, Context Caching lets developers treat very long, seldom-changing context as a reusable asset, cutting both cost and latency for repeated queries.
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"nvidia/llama-3.1-nemotron-70b-instruct",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'nvidia/llama-3.1-nemotron-70b-instruct',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{'id': 'gen-1744191323-N0aZy5UyzpOYfRwYbik3', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': {'content': [], 'refusal': []}, 'message': {'role': 'assistant', 'content': "Hello!\n\nHow can I assist you today? Do you have:\n\n1. **A question** on a specific topic you'd like answered?\n2. **A problem** you're trying to solve and need help with?\n3. **A topic** you'd like to **discuss**?\n4. **A game or activity** in mind (e.g., trivia, word games, storytelling)?\n5. **Something else** on your mind (feel free to surprise me)?\n\nPlease respond with a number or describe what's on your mind, and I'll do my best to help!", 'refusal': None}}], 'created': 1744191323, 'model': 'nvidia/llama-3.1-nemotron-70b-instruct', 'usage': {'prompt_tokens': 11, 'completion_tokens': 78, 'total_tokens': 89}}
from openai import OpenAI
import base64
import os
client = OpenAI(
base_url = "https://api.aimlapi.com",
# Insert your AI/ML API key instead of <YOUR_AIMLAPI_KEY>:
api_key = "<YOUR_AIMLAPI_KEY>"
)
def main():
response = client.chat.completions.create(
model="gpt-4o-mini-audio-preview",
modalities=["text", "audio"],
audio={"voice": "alloy", "format": "wav"},
messages=[
{
"role": "system",
"content": "Speak english" # Your instructions for the model
},
{
"role": "user",
"content": "Hello" # Your question (insert it istead of Hello)
}
],
max_tokens=6000,
)
wav_bytes = base64.b64decode(response.choices[0].message.audio.data)
with open("audio.wav", "wb") as f:
f.write(wav_bytes)
dist = os.path.abspath("audio.wav")
print("Audio saved to:", dist)
if __name__ == "__main__":
main()ChatCompletion(id='chatcmpl-BrghGGR73s5Wt5thg4mhAxquxzmBi', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=[], audio=ChatCompletionAudio(id='audio_686f762b97b08191bb5ea391c6b41e1c', data='UklGRv////9XQVZFZm10IBAAAAABAAEAwF0AAIC7AAACABAAZGF0Yf////8MAAEABAAIAAIACQADAAcACAAKAAwAAAAGAAEACQADAAkAAAAFAAcAAgAEAPr/BQD8/wgA/f8CAPz/AQD+//r/AgAAAAEA/f8BAP3/AwD//wMA/P/6//z/+//6//X//f/2//7/9f/6//b/+f/4//L/+v/3//3/7//8/+7/+f/x//n/8f/z//P/8P/z/+v/+v/q//r/7f/x/+//8P/2/+z/9//s//H/6P/o/+v/5f/t/+X/7//q/+v/7//m//D/6f/t/+T/5//u/+b/6f/j/+n/4//s/+3/7v/s/+3/8f/y/+7/7P/r/+r/6v/p/+3/6P/q/+j/7v/t/+//7v/y//P/8f/x//D/7f/v/+3/6v/v/+3/7f/w/+3/8P/w//X/7//0/+//8//u//P/7P/v/+v/7//q//H/8f/0//j/9//7//b/+P/y//D/7//y//H/7f/u/+3/8f/1//z/+f/+//r/+v/7//n/9v/y/+7/8f/q//H/7P/3//b//f8DAPz/BAD+/woAAQACAP7/AAD6//j/+v/8/////OKAfkNkRRbFyoUoBGnCgAJHQkeDGkUjRtII+glVSdfJmcj+yAkHS0cZxocGtYZzRfuFhwWRhZdFv8VVhTgEAEMVgahAHT8Afqg+uX8AADCAsUC0gB2/DD3OfJt7znvwPFh9uT7R/+YAGf/Cvz1+F/2hPUX93L6Tv9VA5MGbweQBhsFQQI7AW//BQCEALIBIQPdAigDwQD1/FIAeQIfCH0MMBDzFTAaOB9kIKchGyAsHkwavhUcEmkNRwzFCU8JgghqBwYGIAUuBlAHBweo/470YegV3+DZl9rx3KTek+Kx4+Lo2vL0/f0JfRHLFEkUEBGnDFwHUAHw+0D2Yu8L6irmcOSP5FXo8+0l9P/6+P2r/7MBPwPfBPgFOAV1Ax0CRQAwAUwFwgkHD6ESERbkGQsd1CCRIYkhKh/2GVQWphDzC6QJIwYqBEQDGQLjAWAAUgBB/yL9Pfzg9ObrGeN82wLaNNtn34XikObP6QzvOPqLAz0Q2BVwFDQTpw34CWIFOf/f+BHys+p15Z3i5OL05TfrVvCj9XT8NAA7BAsI6gkpDR0OOQ2oCzUILQcBCNcJzwymEFITEhWZF8EZ/BztHtkehhuTFjcSVw0FCgUGKQOW/+T69/ju9hX21/UI9MbwYu8Z7V/n0eSa4angy+Na5NnnR+0V8mP7cAM7C4MTYRU4E8sO5QsDCGsDZv439MXsSuiy4xrjJ+Zt6W/uIvPL9jj+GgUTCwQQyhKvFKcVBhRQEI0Odw1+DDYN7g20DlARjRKpE1AXUhqnG/ga3RdNFAAR5gyvCCIDr/4n+ZjzCvDZ7Zbu3Oyv6/bpseYl5ivl1eJs41zlvOdp7BLwsPeOAIcIvg6ZEBkScw/uDKcJXwSHAFn7hfJo6o7mIuST5zLpfeuV8U708vt1AcoH4g8YFA0YgBbPFe4UcxKxECQOTA7cDNIMywxxDFkQmhP5FcsXERgeFxQW9RKFDmQLkgZ1AKP6UvRB78nsDeoW6NLmneWD5Abi+eGS4VDjL+Vi5/jrcfDA+BgAxAd5DqUTzxMYEqkPIQk/CZgCBfyh+MHtGOkG6Gvm6+ms7+fxl/UW+lr/RQfTDz8VShe5GA4XuRXDE54RkhFNEKkOLgxAC50LZQ2hEEsSXRUVFs0UtBLTDy8OOAtQB+8APvoy9OLudesq6IbnquUq5P/iUOHM4aviM+TO5VzqXe0E8+35uv7pCF4OgRJbFfoRHg41CaYDVf/B+oP2EvCf6CDn/uUR6kzx+PSH+z3/IANxCl4QthfAHOsblxrJFnAUYRPFEHMQ2A2lC+cKsAoWDYQP8RFTEzAU8xTOEyISqg6qCqMG3AAS+4L03u6F6fnk4+I+4c7hxOAt4DXhteA75D3nHuoq8Pz0j/rrAGUHSg0uE9wUrBTqEcMK8AXY/mb6nfWo8SfvbOkL6Vfp3e2S9C39UAOsBS8Lgw0kFL0YqRvYHRwZIxa0Ef4NUw7yDYwNLwzPCvsLPQ3nD+0RDhOjEysS9hB3DF4JawVe/0L7QvSb7uLpbuSe4NfeFd7U3tLgqeAD42rl0+fN7Hjx2/e4/V8DAwi5C74P0BGtEb0OQwrlAqj72/Sa73TuJu3c67zr7+tb7yP1+ftaBEoKOg9PEQoSGBW5F10bFRqZF/ATag74DfAMRw60EFIRbxGqEIgRjxIaFC0U2xLtEJcMoQjpAoX9nvij8uvuE+pb5hfjBN9J3vXdEeC/4szjPuY+6H3qlu+x9Jb7YwLCB/ALkg5dEbUR0RBFDekHNQAZ+RDzXe7L7X7tSO5u7qLwa/RF+VUBjAhcEJET0RX1FREWmBp9GXgaHReREacOkwkQC5sLWA9EEZMO4RA1Dx0SIhTrEu0Thg+hC3wGey/4UBngeFDM4OPxSoEwYT+RLJEpwSQRJeFIoPfBAZDS4Igw3iDIgQSRP1Ef0RZBPEFgAadh+OINIfASABEADQAPAA4ADQAQABEACwAPAAwADgAOAA8ADgALAAwADAAOAA8ADwANAA4ADgAOAA4ADQAOAA0ADAAMAAwADQAQAA8ADQAPAA4ADwAQABAAEAATABMAFAAUABUAFQAWABkAFwAZABwAHwAgACIAJAAlAC!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!WE’VE OMITTED 90% OF THE BASE64-ENCODED FILE FOR BREVITY — EVEN FOR SUCH A SHORT MODEL RESPONSE, IT’S STILL EXTREMELY LARGE. 
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!cAKAAoACsALAAuADIAMwA4ADsAOAA5ADgAOgA7ADoAOwA7AD8APQA+ADwAPQA+AD8AQQA+AD8APAA9ADsAOwA8ADwAOwA7ADoAOwA4ADoANQA1ADEAMQAyAC4ALAAnACUAIAAfABwAGgAaABUAFQASABAACgAIAAQA//8AAPv/+v/4//b/8v/0//L/9P/z//P/8//t/+7/6v/p/+f/5//o/+X/5P/k/+X/5f/l/+X/5P/h/97/3//g/93/2v/Z/9b/2P/Z/9j/1f/T/87/zv/O/87/zP/J/8j/zP/I/8f/w//C/8P/x//F/8b/xf/D/8P/w//F/8L/xf/J/8f/xf/H/8j/yv/K/8n/yv/L/8v/z//O/9D/zv/Q/9D/0v/Q/9P/1P/R/9P/1P/T/9X/1P/X/9b/2P/b/9n/2//c/97/3//h/97/3v/g/+P/5v/m/+T/5v/m/+n/5P/n/+X/5//u//D/9P/2//X/8//5//j/9///////AQAEAAsAAwAMAAQACgAPAA4ADgAJABEACQAEAAgACwALAA8AFgAWACUAKQAgACsAJQAvACAADwAbABoARgApACwANQArAEMAEQASAAoAEQAkADAAFABCAEEACQA=', expires_at=1752138811, transcript="Hi there! How's it going?"), function_call=None, tool_calls=None))], created=1752135210, model='gpt-4o-mini-audio-preview-2024-12-17', object='chat.completion', service_tier=None, system_fingerprint='fp_1dfa95e5cb', usage=CompletionUsage(completion_tokens=1278, prompt_tokens=4, total_tokens=1282, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=30, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=14), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0, text_tokens=14, image_tokens=0)))
Audio saved to: c:\Users\user\Documents\Python Scripts\LLMs\audio.wav
▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.
4️ (Optional) Adjust other optional parameters if needed
Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.
5️ Run your modified code
Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.
If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.
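The same response object from the audio example above also carries a text transcript of the generated speech (visible as the transcript field in the sample output). A small sketch continuing that example:
# Continuing the audio example above: print the spoken text alongside the saved file.
audio = response.choices[0].message.audio
print("Transcript:", audio.transcript)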
gpt-3.5-turbo-1106
OpenAI's latest cost-efficient model designed to deliver advanced natural language processing and multimodal capabilities. It aims to make AI more accessible and affordable, significantly enhancing the range of applications that can utilize AI technology.
This endpoint is currently used only with OpenAI models. Some models support both the /chat/completions and /responses endpoints, while others support only one of them.
Before the release of GPT-4 Turbo, OpenAI introduced two preview models that allowed users to test advanced features ahead of a full rollout. These models supported JSON mode for structured responses, parallel function calling to handle multiple API functions in a single request, and reproducible output, ensuring more consistent results across runs. The model also offers better code-generation performance and reduces cases where it fails to complete a task.
This endpoint is currently used only with OpenAI models. Some models support both the /chat/completions and /responses endpoints, while others support only one of them.
Deprecation notice
gpt-4o will be removed from the API on February 17, 2026. Please migrate to gpt-5.1-chat-latest.
OpenAI's flagship model designed to integrate enhanced capabilities across text, vision, and audio, providing real-time reasoning.
You can also view a detailed comparison of this model on our main website.
This endpoint is currently used only with OpenAI models. Some models support both the /chat/completions and /responses endpoints, while others support only one of them.
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"gpt-3.5-turbo-0125",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gpt-3.5-turbo-0125',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{'id': 'chatcmpl-BKKS4Aulz4SaVm81hHo7HMKEcQmtk', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': 'Hello! How can I assist you today?', 'refusal': None, 'annotations': []}}], 'created': 1744184876, 'model': 'gpt-3.5-turbo-0125', 'usage': {'prompt_tokens': 50, 'completion_tokens': 126, 'total_tokens': 176, 'prompt_tokens_details': {'cached_tokens': 0, 'audio_tokens': 0}, 'completion_tokens_details': {'reasoning_tokens': 0, 'audio_tokens': 0, 'accepted_prediction_tokens': 0, 'rejected_prediction_tokens': 0}}, 'system_fingerprint': None}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/responses",
headers={
"Content-Type":"application/json",
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"gpt-3.5-turbo",
"input":"Hello" # Insert your question for the model here, instead of Hello
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
try {
const response = await fetch('https://api.aimlapi.com/v1/responses', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gpt-3.5-turbo',
input: 'Hello', // Insert your question here, instead of Hello
}),
});
if (!response.ok) {
throw new Error(`HTTP error! Status ${response.status}`);
}
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
} catch (error) {
console.error('Error', error);
}
}
main();
{
"id": "resp_686ba45ce63481a2a4b1fad55d2bea8102a1cc22f1a1bcf1",
"object": "response",
"created_at": 1751884892,
"error": null,
"incomplete_details": null,
"instructions": null,
"max_output_tokens": 512,
"model": "gpt-3.5-turbo",
"output": [
{
"id": "rs_686ba463d18481a29dde85cfd7b055bf02a1cc22f1a1bcf1",
"type": "reasoning",
"summary": []
},
{
"id": "msg_686ba463d4e081a2b2e2aff962ab00f702a1cc22f1a1bcf1",
"type": "message",
"status": "in_progress",
"content": [
{
"type": "output_text",
"annotations": [],
"logprobs": [],
"text": "Hello! How can I help you today?"
}
],
"role": "assistant"
}
],
"parallel_tool_calls": true,
"previous_response_id": null,
"reasoning": {
"effort": "medium",
"summary": null
},
"temperature": 1,
"text": {
"format": {
"type": "text"
}
},
"tool_choice": "auto",
"tools": [],
"top_p": 1,
"truncation": "disabled",
"usage": {
"input_tokens": 294,
"input_tokens_details": {
"cached_tokens": 0
},
"output_tokens": 2520,
"output_tokens_details": {
"reasoning_tokens": 0
},
"total_tokens": 2814
},
"metadata": {},
"output_text": "Hello! How can I help you today?"
}
▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.
4️ (Optional) Adjust other optional parameters if needed
Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.
5️ Run your modified code
Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.
If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.
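If you prefer the OpenAI SDK to raw HTTP calls, the same /v1/responses request can be sketched as follows. This assumes a recent openai Python package that exposes the Responses API (client.responses.create) and that the AI/ML API endpoint accepts the same payload as in the requests example above, so treat it as a sketch rather than a guaranteed recipe.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.aimlapi.com/v1",
    # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
    api_key="<YOUR_AIMLAPI_KEY>",
)

resp = client.responses.create(
    model="gpt-3.5-turbo",
    input="Hello",  # Insert your question for the model here, instead of Hello
)
print(resp.output_text)  # convenience field with the concatenated text output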
▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.
4️ (Optional) Adjust other optional parameters if needed
Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.
5️ Run your modified code
Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.
If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.
▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.
4️ (Optional) Adjust other optional parameters if needed
Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.
5️ Run your modified code
Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.
If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.
gpt-4o-2024-05-13
gpt-4o-2024-08-06
This endpoint is currently used only with OpenAI models. Some models support both the /chat/completions and /responses endpoints, while others support only one of them.
The model represents a significant leap forward in conversational AI technology. It offers enhanced understanding and generation of natural language, capable of handling complex and nuanced dialogues with greater coherence and context sensitivity. This model is designed to mimic human-like conversation more closely than ever before.
This endpoint is currently used only with OpenAI models. Some models support both the /chat/completions and /responses endpoints, while others support only one of them.
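The two endpoints expect slightly different request bodies: /chat/completions takes a messages array, while /responses takes a plain input field. A minimal sketch of both payload shapes, matching the gpt-4o-mini examples below:
# Request-body shapes for the two endpoints (values match the gpt-4o-mini examples below).
chat_completions_payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello"}],  # POST /v1/chat/completions
}
responses_payload = {
    "model": "gpt-4o-mini",
    "input": "Hello",  # POST /v1/responses
}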
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"gpt-4o-mini",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gpt-4o-mini',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{'id': 'chatcmpl-BKKaTWquxfp3dbSlNvUKM6mXwmZ78', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': 'Hello! How can I assist you today?', 'refusal': None, 'annotations': []}}], 'created': 1744185397, 'model': 'gpt-4o-mini-2024-07-18', 'usage': {'prompt_tokens': 3, 'completion_tokens': 13, 'total_tokens': 16, 'prompt_tokens_details': {'cached_tokens': 0, 'audio_tokens': 0}, 'completion_tokens_details': {'reasoning_tokens': 0, 'audio_tokens': 0, 'accepted_prediction_tokens': 0, 'rejected_prediction_tokens': 0}}, 'system_fingerprint': 'fp_b376dfbbd5'}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/responses",
headers={
"Content-Type":"application/json",
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"gpt-4o-mini",
"input":"Hello" # Insert your question for the model here, instead of Hello
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
try {
const response = await fetch('https://api.aimlapi.com/v1/responses', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gpt-4o-mini',
input: 'Hello', // Insert your question here, instead of Hello
}),
});
if (!response.ok) {
throw new Error(`HTTP error! Status ${response.status}`);
}
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
} catch (error) {
console.error('Error', error);
}
}
main();
{
"id": "resp_686ba45ce63481a2a4b1fad55d2bea8102a1cc22f1a1bcf1",
"object": "response",
"created_at": 1751884892,
"error": null,
"incomplete_details": null,
"instructions": null,
"max_output_tokens": 512,
"model": "gpt-4o-mini",
"output": [
{
"id": "rs_686ba463d18481a29dde85cfd7b055bf02a1cc22f1a1bcf1",
"type": "reasoning",
"summary": []
},
{
"id": "msg_686ba463d4e081a2b2e2aff962ab00f702a1cc22f1a1bcf1",
"type": "message",
"status": "in_progress",
"content": [
{
"type": "output_text",
"annotations": [],
"logprobs": [],
"text": "Hello! How can I help you today?"
}
],
"role": "assistant"
}
],
"parallel_tool_calls": true,
"previous_response_id": null,
"reasoning": {
"effort": "medium",
"summary": null
},
"temperature": 1,
"text": {
"format": {
"type": "text"
}
},
"tool_choice": "auto",
"tools": [],
"top_p": 1,
"truncation": "disabled",
"usage": {
"input_tokens": 294,
"input_tokens_details": {
"cached_tokens": 0
},
"output_tokens": 2520,
"output_tokens_details": {
"reasoning_tokens": 0
},
"total_tokens": 2814
},
"metadata": {},
"output_text": "Hello! How can I help you today?"
}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"gpt-4-0125-preview",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gpt-4-0125-preview',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{'id': 'chatcmpl-BKKXr9a69c5WOJr8R2d8rP2Wd0XZa', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': 'Hello! How can I assist you today?', 'refusal': None, 'annotations': []}}], 'created': 1744185235, 'model': 'gpt-4-1106-preview', 'usage': {'prompt_tokens': 168, 'completion_tokens': 630, 'total_tokens': 798, 'prompt_tokens_details': {'cached_tokens': 0, 'audio_tokens': 0}, 'completion_tokens_details': {'reasoning_tokens': 0, 'audio_tokens': 0, 'accepted_prediction_tokens': 0, 'rejected_prediction_tokens': 0}}, 'system_fingerprint': None}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/responses",
headers={
"Content-Type":"application/json",
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"gpt-4-0125-preview",
"input":"Hello" # Insert your question for the model here, instead of Hello
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
try {
const response = await fetch('https://api.aimlapi.com/v1/responses', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gpt-4-0125-preview',
input: 'Hello', // Insert your question here, instead of Hello
}),
});
if (!response.ok) {
throw new Error(`HTTP error! Status ${response.status}`);
}
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
} catch (error) {
console.error('Error', error);
}
}
main();
{
"id": "resp_686ba45ce63481a2a4b1fad55d2bea8102a1cc22f1a1bcf1",
"object": "response",
"created_at": 1751884892,
"error": null,
"incomplete_details": null,
"instructions": null,
"max_output_tokens": 512,
"model": "gpt-4-0125-preview",
"output": [
{
"id": "rs_686ba463d18481a29dde85cfd7b055bf02a1cc22f1a1bcf1",
"type": "reasoning",
"summary": []
},
{
"id": "msg_686ba463d4e081a2b2e2aff962ab00f702a1cc22f1a1bcf1",
"type": "message",
"status": "in_progress",
"content": [
{
"type": "output_text",
"annotations": [],
"logprobs": [],
"text": "Hello! How can I help you today?"
}
],
"role": "assistant"
}
],
"parallel_tool_calls": true,
"previous_response_id": null,
"reasoning": {
"effort": "medium",
"summary": null
},
"temperature": 1,
"text": {
"format": {
"type": "text"
}
},
"tool_choice": "auto",
"tools": [],
"top_p": 1,
"truncation": "disabled",
"usage": {
"input_tokens": 294,
"input_tokens_details": {
"cached_tokens": 0
},
"output_tokens": 2520,
"output_tokens_details": {
"reasoning_tokens": 0
},
"total_tokens": 2814
},
"metadata": {},
"output_text": "Hello! How can I help you today?"
}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"gpt-4o",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gpt-4o',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{'id': 'chatcmpl-BKKZhTdruxKWjdUlq29ooeew185LD', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': 'Hello! 😊 How can I help you today?', 'refusal': None, 'annotations': []}}], 'created': 1744185349, 'model': 'chatgpt-4o-latest', 'usage': {'prompt_tokens': 84, 'completion_tokens': 347, 'total_tokens': 431, 'prompt_tokens_details': {'cached_tokens': 0, 'audio_tokens': 0}, 'completion_tokens_details': {'reasoning_tokens': 0, 'audio_tokens': 0, 'accepted_prediction_tokens': 0, 'rejected_prediction_tokens': 0}}, 'system_fingerprint': 'fp_d04424daa8'}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/responses",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"gpt-4o",
"input":"Hello" # Insert your question for the model here, instead of Hello
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
try {
const response = await fetch('https://api.aimlapi.com/v1/responses', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gpt-4o',
input: 'Hello', // Insert your question here, instead of Hello
}),
});
if (!response.ok) {
throw new Error(`HTTP error! Status ${response.status}`);
}
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
} catch (error) {
console.error('Error', error);
}
}
main();
{
"id": "resp_686ba45ce63481a2a4b1fad55d2bea8102a1cc22f1a1bcf1",
"object": "response",
"created_at": 1751884892,
"error": null,
"incomplete_details": null,
"instructions": null,
"max_output_tokens": 512,
"model": "gpt-4o",
"output": [
{
"id": "rs_686ba463d18481a29dde85cfd7b055bf02a1cc22f1a1bcf1",
"type": "reasoning",
"summary": []
},
{
"id": "msg_686ba463d4e081a2b2e2aff962ab00f702a1cc22f1a1bcf1",
"type": "message",
"status": "in_progress",
"content": [
{
"type": "output_text",
"annotations": [],
"logprobs": [],
"text": "Hello! How can I help you today?"
}
],
"role": "assistant"
}
],
"parallel_tool_calls": true,
"previous_response_id": null,
"reasoning": {
"effort": "medium",
"summary": null
},
"temperature": 1,
"text": {
"format": {
"type": "text"
}
},
"tool_choice": "auto",
"tools": [],
"top_p": 1,
"truncation": "disabled",
"usage": {
"input_tokens": 294,
"input_tokens_details": {
"cached_tokens": 0
},
"output_tokens": 2520,
"output_tokens_details": {
"reasoning_tokens": 0
},
"total_tokens": 2814
},
"metadata": {},
"output_text": "Hello! How can I help you today?"
}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"gpt-4-turbo",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gpt-4-turbo',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{'id': 'chatcmpl-BKKYo5xJ5uEzm8omnidM097vsMpYd', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': 'Hello! How can I assist you today?', 'refusal': None, 'annotations': []}}], 'created': 1744185294, 'model': 'gpt-4-turbo-2024-04-09', 'usage': {'prompt_tokens': 168, 'completion_tokens': 630, 'total_tokens': 798, 'prompt_tokens_details': {'cached_tokens': 0, 'audio_tokens': 0}, 'completion_tokens_details': {'reasoning_tokens': 0, 'audio_tokens': 0, 'accepted_prediction_tokens': 0, 'rejected_prediction_tokens': 0}}, 'system_fingerprint': 'fp_101a39fff3'}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/responses",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"gpt-4-turbo",
"input":"Hello" # Insert your question for the model here, instead of Hello
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
try {
const response = await fetch('https://api.aimlapi.com/v1/responses', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gpt-4-turbo',
input: 'Hello', // Insert your question here, instead of Hello
}),
});
if (!response.ok) {
throw new Error(`HTTP error! Status ${response.status}`);
}
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
} catch (error) {
console.error('Error', error);
}
}
main();
{
"id": "resp_686ba45ce63481a2a4b1fad55d2bea8102a1cc22f1a1bcf1",
"object": "response",
"created_at": 1751884892,
"error": null,
"incomplete_details": null,
"instructions": null,
"max_output_tokens": 512,
"model": "gpt-4-turbo",
"output": [
{
"id": "rs_686ba463d18481a29dde85cfd7b055bf02a1cc22f1a1bcf1",
"type": "reasoning",
"summary": []
},
{
"id": "msg_686ba463d4e081a2b2e2aff962ab00f702a1cc22f1a1bcf1",
"type": "message",
"status": "in_progress",
"content": [
{
"type": "output_text",
"annotations": [],
"logprobs": [],
"text": "Hello! How can I help you today?"
}
],
"role": "assistant"
}
],
"parallel_tool_calls": true,
"previous_response_id": null,
"reasoning": {
"effort": "medium",
"summary": null
},
"temperature": 1,
"text": {
"format": {
"type": "text"
}
},
"tool_choice": "auto",
"tools": [],
"top_p": 1,
"truncation": "disabled",
"usage": {
"input_tokens": 294,
"input_tokens_details": {
"cached_tokens": 0
},
"output_tokens": 2520,
"output_tokens_details": {
"reasoning_tokens": 0
},
"total_tokens": 2814
},
"metadata": {},
"output_text": "Hello! How can I help you today?"
}
▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.
4️⃣ (Optional) Adjust other optional parameters if needed
Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.
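For instance, the same minimal request with a couple of optional parameters added might look like the sketch below (the parameter values are purely illustrative, not recommendations):
import requests
import json

# Sketch: the basic chat completion request from this page, plus two of the
# optional parameters described in the API schema below.
response = requests.post(
    "https://api.aimlapi.com/v1/chat/completions",
    headers={
        "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
        "Content-Type": "application/json"
    },
    json={
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Hello"}],
        "temperature": 0.2,  # lower temperature -> more focused, less random output
        "max_tokens": 256    # cap on generated tokens, useful for controlling costs
    }
)
print(json.dumps(response.json(), indent=2, ensure_ascii=False))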
5️⃣ Run your modified code
Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.
If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/chat/completions",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"gpt-4",
"messages":[
{
"role":"user",
"content":"Hello" # insert your prompt here, instead of Hello
}
]
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
// insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gpt-4',
messages:[
{
role:'user',
content: 'Hello' // insert your prompt here, instead of Hello
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{'id': 'chatcmpl-BKKWkzVpUFHEDbw7MlOsqBIbm9Vi2', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': 'Hello! How can I assist you today?', 'refusal': None, 'annotations': []}}], 'created': 1744185166, 'model': 'gpt-4-0613', 'usage': {'prompt_tokens': 504, 'completion_tokens': 1260, 'total_tokens': 1764, 'prompt_tokens_details': {'cached_tokens': 0, 'audio_tokens': 0}, 'completion_tokens_details': {'reasoning_tokens': 0, 'audio_tokens': 0, 'accepted_prediction_tokens': 0, 'rejected_prediction_tokens': 0}}, 'system_fingerprint': None}
import requests
import json # for getting a structured output with indentation
response = requests.post(
"https://api.aimlapi.com/v1/responses",
headers={
# Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
"Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
"Content-Type":"application/json"
},
json={
"model":"gpt-4",
"input":"Hello" # Insert your question for the model here, instead of Hello
}
)
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
try {
const response = await fetch('https://api.aimlapi.com/v1/responses', {
method: 'POST',
headers: {
// Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gpt-4',
input: 'Hello', // Insert your question here, instead of Hello
}),
});
if (!response.ok) {
throw new Error(`HTTP error! Status ${response.status}`);
}
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
} catch (error) {
console.error('Error', error);
}
}
main();
{
"id": "resp_686ba45ce63481a2a4b1fad55d2bea8102a1cc22f1a1bcf1",
"object": "response",
"created_at": 1751884892,
"error": null,
"incomplete_details": null,
"instructions": null,
"max_output_tokens": 512,
"model": "gpt-4",
"output": [
{
"id": "rs_686ba463d18481a29dde85cfd7b055bf02a1cc22f1a1bcf1",
"type": "reasoning",
"summary": []
},
{
"id": "msg_686ba463d4e081a2b2e2aff962ab00f702a1cc22f1a1bcf1",
"type": "message",
"status": "in_progress",
"content": [
{
"type": "output_text",
"annotations": [],
"logprobs": [],
"text": "Hello! How can I help you today?"
}
],
"role": "assistant"
}
],
"parallel_tool_calls": true,
"previous_response_id": null,
"reasoning": {
"effort": "medium",
"summary": null
},
"temperature": 1,
"text": {
"format": {
"type": "text"
}
},
"tool_choice": "auto",
"tools": [],
"top_p": 1,
"truncation": "disabled",
"usage": {
"input_tokens": 294,
"input_tokens_details": {
"cached_tokens": 0
},
"output_tokens": 2520,
"output_tokens_details": {
"reasoning_tokens": 0
},
"total_tokens": 2814
},
"metadata": {},
"output_text": "Hello! How can I help you today?"
}
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
An object specifying the format that the model must output.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
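As an illustration of the tools and tool_choice parameters described above, the sketch below defines a single hypothetical function (get_weather is made up for this example) and forces the model to call it; whether a given model actually supports tool calling depends on the model itself.
import requests
import json

# A made-up function definition used only to illustrate tools / tool_choice.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"]
            }
        }
    }
]

response = requests.post(
    "https://api.aimlapi.com/v1/chat/completions",
    headers={
        "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
        "Content-Type": "application/json"
    },
    json={
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
        "tools": tools,
        # Force the model to call the function instead of answering directly:
        "tool_choice": {"type": "function", "function": {"name": "get_weather"}}
    }
)
# The reply should contain a tool_calls entry with the function arguments.
print(json.dumps(response.json(), indent=2, ensure_ascii=False))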
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'Qwen/Qwen3-235B-A22B-fp8-tput',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "Qwen/Qwen3-235B-A22B-fp8-tput",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
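The stream parameter described above delivers the reply incrementally via server-sent events instead of a single JSON body. A minimal Python sketch for consuming such a stream, assuming the OpenAI-style format of "data: "-prefixed chunks ending with a [DONE] sentinel:
import requests
import json

response = requests.post(
    "https://api.aimlapi.com/v1/chat/completions",
    headers={
        "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
        "Content-Type": "application/json"
    },
    json={
        "model": "Qwen/Qwen3-235B-A22B-fp8-tput",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": True
    },
    stream=True  # tell requests not to buffer the whole response
)

for line in response.iter_lines():
    if not line:
        continue
    decoded = line.decode("utf-8")
    if not decoded.startswith("data: "):
        continue
    payload = decoded[len("data: "):]
    if payload == "[DONE]":  # assumed end-of-stream marker (OpenAI-style SSE)
        break
    chunk = json.loads(payload)
    # Each chunk carries an incremental delta; print the text as it arrives.
    delta = chunk["choices"][0].get("delta", {})
    print(delta.get("content", "") or "", end="", flush=True)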
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'deepseek/deepseek-thinking-v3.2-exp',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "deepseek/deepseek-thinking-v3.2-exp",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'alibaba-cloud/qwen3-omni-30b-a3b-captioner',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "alibaba-cloud/qwen3-omni-30b-a3b-captioner",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
An object specifying the format that the model must output.
If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
An object specifying the format that the model must output.
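Several of the schemas above mention response_format, an object specifying the format that the model must output. As a rough sketch, assuming the chosen model supports JSON mode via {"type": "json_object"} (support varies by model; the model name and prompt are only examples), a request could look like this:
import requests
import json

response = requests.post(
    "https://api.aimlapi.com/v1/chat/completions",
    headers={
        "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
        "Content-Type": "application/json"
    },
    json={
        "model": "gpt-4o",
        "messages": [
            {"role": "user",
             "content": "List three primary colors as JSON with a 'colors' array."}
        ],
        "response_format": {"type": "json_object"}  # ask for a JSON object reply
    }
)
data = response.json()
# With JSON mode the message content should itself be parseable JSON.
print(json.loads(data["choices"][0]["message"]["content"]))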
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'alibaba/qwen3-coder-480b-a35b-instruct',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "alibaba/qwen3-coder-480b-a35b-instruct",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'Qwen/Qwen2.5-Coder-32B-Instruct',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "text",
"object": "text",
"created": 1,
"choices": [
{
"index": 1,
"message": {
"role": "text",
"content": "text",
"refusal": null,
"annotations": [
{
"type": "text",
"url_citation": {
"end_index": 1,
"start_index": 1,
"title": "text",
"url": "text"
}
}
],
"audio": {
"id": "text",
"data": "text",
"transcript": "text",
"expires_at": 1
},
"tool_calls": [
{
"id": "text",
"type": "text",
"function": {
"arguments": "text",
"name": "text"
}
}
]
},
"finish_reason": "stop",
"logprobs": {
"content": [
{
"bytes": [
1
],
"logprob": 1,
"token": "text",
"top_logprobs": [
{
"bytes": [
1
],
"logprob": 1,
"token": "text"
}
]
}
],
"refusal": []
}
}
],
"model": "text",
"usage": {
"prompt_tokens": 1,
"completion_tokens": 1,
"total_tokens": 1,
"completion_tokens_details": {
"accepted_prediction_tokens": 1,
"audio_tokens": 1,
"reasoning_tokens": 1,
"rejected_prediction_tokens": 1
},
"prompt_tokens_details": {
"audio_tokens": 1,
"cached_tokens": 1
}
}
}
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
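The logprobs and top_logprobs parameters described above return per-token probabilities alongside the generated text. A small sketch, assuming a model that supports them (the model name here is only an example):
import requests

response = requests.post(
    "https://api.aimlapi.com/v1/chat/completions",
    headers={
        "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
        "Content-Type": "application/json"
    },
    json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Hello"}],
        "logprobs": True,    # required for top_logprobs to take effect
        "top_logprobs": 3    # also return the 3 most likely alternatives per position
    }
)
data = response.json()
# Each entry holds the generated token, its log probability, and alternatives.
for token_info in data["choices"][0]["logprobs"]["content"]:
    print(token_info["token"], token_info["logprob"])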
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
An object specifying the format that the model must output.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
An object specifying the format that the model must output.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
An object specifying the format that the model must output.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
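As a sketch of the stop and seed parameters described above (the model name and values are illustrative only): generation halts before the first stop sequence would appear, and seed makes repeated runs best-effort reproducible.
import requests
import json

response = requests.post(
    "https://api.aimlapi.com/v1/chat/completions",
    headers={
        "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
        "Content-Type": "application/json"
    },
    json={
        "model": "alibaba/qwen3-max-instruct",
        "messages": [{"role": "user", "content": "Count from 1 to 10, one number per line."}],
        "stop": ["5"],   # generation stops here; the stop sequence is not returned
        "seed": 42,      # best-effort deterministic sampling (Beta)
        "temperature": 0.7
    }
)
print(json.dumps(response.json(), indent=2, ensure_ascii=False))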
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'alibaba/qwen3-max-instruct',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "alibaba/qwen3-max-instruct",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'deepseek/deepseek-reasoner-v3.1-terminus',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "deepseek/deepseek-reasoner-v3.1-terminus",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'minimax/m1',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "minimax/m1",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'deepseek/deepseek-reasoner-v3.1',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "deepseek/deepseek-reasoner-v3.1",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'meta-llama/Llama-3.2-3B-Instruct-Turbo',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "meta-llama/Llama-3.2-3B-Instruct-Turbo",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'google/gemini-2.0-flash',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "google/gemini-2.0-flash",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
Constrains effort on reasoning for reasoning models. Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
An object specifying the format that the model must output.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
falseWhat sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
Alternate top sampling parameter.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
falseIf True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
falseAn alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
Constrains effort on reasoning for reasoning models. Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
An object specifying the format that the model must output.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
falseControls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
An object specifying the format that the model must output.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
falseWhat sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
An object specifying the format that the model must output.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
Alternate top sampling parameter.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
falseWhat sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
falseHow many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
An object specifying the format that the model must output.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
falseWhat sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
Alternate top sampling parameter.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
An object specifying the format that the model must output.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
falseAn alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
Constrains effort on reasoning for reasoning models. Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
An object specifying the format that the model must output.
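For reference, here is a minimal sketch of how several of these fields can be combined in a single request. The endpoint, headers, and model ID are taken from the examples on this page; the specific parameter values are illustrative only.

// Minimal sketch: combining common sampling and length parameters
// in one chat-completion request. Parameter values are illustrative.
async function sampledCompletion() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'google/gemini-2.0-flash',   // any chat model from the examples on this page
      messages: [{ role: 'user', content: 'Write a two-line haiku about the sea.' }],
      temperature: 0.7,        // alter this or top_p, not both
      max_tokens: 256,         // caps generated output tokens
      frequency_penalty: 0.2,  // discourage verbatim repetition
      presence_penalty: 0.1,   // nudge toward new topics
      stop: ['\n\n'],          // up to 4 stop sequences
      seed: 42,                // best-effort deterministic sampling (Beta)
      n: 1,                    // keep 1 to minimize cost
    }),
  });
  const data = await response.json();
  console.log(data.choices[0].message.content);
}

sampledCompletion();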
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'google/gemini-2.0-flash-exp',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "google/gemini-2.0-flash-exp",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}

async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'mistralai/mistral-nemo',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "mistralai/mistral-nemo",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}

async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'anthropic/claude-haiku-4.5',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "anthropic/claude-haiku-4.5",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}

async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'anthropic/claude-3-opus',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "anthropic/claude-3-opus",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}

async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'google/gemma-3-4b-it',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "google/gemma-3-4b-it",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}

async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'meta-llama/Meta-Llama-3-8B-Instruct-Lite',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "meta-llama/Meta-Llama-3-8B-Instruct-Lite",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}

async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'anthropic/claude-sonnet-4.5',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "anthropic/claude-sonnet-4.5",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}

async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'alibaba-cloud/qwen3-next-80b-a3b-thinking',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "alibaba-cloud/qwen3-next-80b-a3b-thinking",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}

async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'cohere/command-a',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "cohere/command-a",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}

async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'deepseek/deepseek-r1',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "deepseek/deepseek-r1",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}

The parameter lists for these models repeat the common request fields consolidated above (temperature, top_p, max_tokens, stream, tool_choice, and so on). One model-specific option also appears here:

Mask (replace with ***) content in the output that involves private information, including but not limited to email, domain, link, ID number, or home address. Defaults to False.
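The tool_choice and parallel_tool_calls fields described in the parameter reference control function calling. Below is a minimal sketch of a tools request; it assumes the standard OpenAI-style function schema, and the get_weather function with its parameters is hypothetical, included only for illustration.

// Sketch of a function-calling request. The tool definition follows the
// standard OpenAI-style schema; get_weather is a hypothetical example function.
async function toolCallExample() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'anthropic/claude-sonnet-4.5',   // a tool-capable model from the examples above
      messages: [{ role: 'user', content: 'What is the weather in Berlin today?' }],
      tools: [
        {
          type: 'function',
          function: {
            name: 'get_weather',              // hypothetical function
            description: 'Get the current weather for a city.',
            parameters: {
              type: 'object',
              properties: { city: { type: 'string' } },
              required: ['city'],
            },
          },
        },
      ],
      tool_choice: 'auto',          // let the model decide whether to call the tool
      parallel_tool_calls: false,   // request one call at a time
    }),
  });
  const data = await response.json();
  // If the model decided to call the tool, the call appears in message.tool_calls.
  console.log(JSON.stringify(data.choices[0].message.tool_calls, null, 2));
}

toolCallExample();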
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'qwen-max',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "qwen-max",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}

async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'google/gemini-2.5-flash-lite-preview',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "google/gemini-2.5-flash-lite-preview",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}

async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'deepseek/deepseek-prover-v2',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "deepseek/deepseek-prover-v2",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}

async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'anthracite-org/magnum-v4-72b',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "anthracite-org/magnum-v4-72b",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}

async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'MiniMax-Text-01',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "MiniMax-Text-01",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}

async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gpt-4o-audio-preview',
messages:[
{
role:'user',
content: 'Hello'
}
],
modalities: ['text', 'audio'],
audio: { voice: 'alloy', format: 'pcm16' },
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "gpt-4o-audio-preview",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
falseWhat sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
Alternate top sampling parameter.
An object specifying the format that the model must output.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
falseControls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
An object specifying the format that the model must output.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
Default: false
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
An object specifying the format that the model must output.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
Default: false
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
Alternate top sampling parameter.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
Default: false
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
Alternate top sampling parameter.
An object specifying the format that the model must output.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
Default: false
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
Alternate top sampling parameter.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
Default: false
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
Alternate top sampling parameter.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
An object specifying the format that the model must output.
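The tool-calling fields described above (tools, tool_choice, parallel_tool_calls) use the OpenAI-compatible schema. Here is a minimal sketch with a hypothetical get_weather function that you would implement and execute on your side; tool support varies by model.
async function main() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'deepseek/deepseek-chat-v3.1',  // any tool-capable model from this page
      messages: [{ role: 'user', content: 'What is the weather in Berlin?' }],
      tools: [
        {
          type: 'function',
          function: {
            name: 'get_weather',  // hypothetical function you implement yourself
            description: 'Get the current weather for a city',
            parameters: {
              type: 'object',
              properties: { city: { type: 'string' } },
              required: ['city'],
            },
          },
        },
      ],
      tool_choice: 'auto',  // or force it: { type: 'function', function: { name: 'get_weather' } }
      parallel_tool_calls: false,
    }),
  });
  const data = await response.json();
  // When the model decides to call the tool, the call appears here instead of plain text.
  console.log(data.choices[0].message.tool_calls);
}
main();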
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'deepseek/deepseek-non-thinking-v3.2-exp',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "deepseek/deepseek-non-thinking-v3.2-exp",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'google/gemma-3n-e4b-it',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "google/gemma-3n-e4b-it",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'mistralai/mistral-tiny',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "mistralai/mistral-tiny",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'deepseek/deepseek-chat-v3.1',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "deepseek/deepseek-chat-v3.1",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'alibaba/qwen3-235b-a22b-thinking-2507',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "alibaba/qwen3-235b-a22b-thinking-2507",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'qwen-plus',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "qwen-plus",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'nvidia/nemotron-nano-12b-v2-vl',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "nvidia/nemotron-nano-12b-v2-vl",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
Default: false
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
Constrains effort on reasoning for reasoning models. Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
An object specifying the format that the model must output.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
Default: false
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
Default: false
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
An object specifying the format that the model must output.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
Default: false
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
Alternate top sampling parameter.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
An object specifying the format that the model must output.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
Default: false
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
An object specifying the format that the model must output.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
Default: false
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
Constrains effort on reasoning for reasoning models. Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
An object specifying the format that the model must output.
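The reasoning-effort setting described above applies only to reasoning-capable models. A minimal sketch is shown below; the model ID is taken from the examples on this page, and you should check the model's page to confirm it accepts this parameter.
async function main() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'alibaba/qwen3-235b-a22b-thinking-2507',  // illustrative reasoning model from this page
      messages: [{ role: 'user', content: 'Why is the square root of 2 irrational?' }],
      reasoning_effort: 'low',      // low | medium | high
      max_completion_tokens: 2048,  // assumed name for the bound that includes reasoning tokens
    }),
  });
  console.log(JSON.stringify(await response.json(), null, 2));
}
main();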
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
Default: false
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
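As a sketch of the inspection-oriented parameters above (n, logprobs, top_logprobs), using the model from the example that follows; logprobs support varies by model.
async function main() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'meta-llama/llama-3.3-70b-versatile',  // same model as the example below
      messages: [{ role: 'user', content: 'Name one prime number.' }],
      n: 1,             // keep n at 1 to minimize cost
      logprobs: true,   // required for top_logprobs
      top_logprobs: 3,  // 0-20 most likely tokens per position
      max_tokens: 16,
    }),
  });
  const data = await response.json();
  // Each output token is returned with its log probability and the top alternatives.
  console.log(JSON.stringify(data.choices[0].logprobs, null, 2));
}
main();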
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'meta-llama/llama-3.3-70b-versatile',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "meta-llama/llama-3.3-70b-versatile",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'anthropic/claude-3.7-sonnet',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "anthropic/claude-3.7-sonnet",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'meta-llama/Llama-3.3-70B-Instruct-Turbo',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'anthropic/claude-3-5-haiku',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "anthropic/claude-3-5-haiku",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'alibaba/qwen3-vl-32b-instruct',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "alibaba/qwen3-vl-32b-instruct",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'nvidia/llama-3.1-nemotron-70b-instruct',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "nvidia/llama-3.1-nemotron-70b-instruct",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
Default: false
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
Alternate top sampling parameter.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
An object specifying the format that the model must output.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
Default: false
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
Default: false
If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
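For the tools, tool_choice, and parallel_tool_calls parameters described above, a request might look like the sketch below. The get_weather function name and its schema are invented for illustration; in a real integration you define the functions yourself and execute them in your own code when the model requests them.
async function toolCallExample() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'nousresearch/hermes-4-405b',   // illustrative model ID
      messages: [{ role: 'user', content: 'What is the weather in Berlin?' }],
      tools: [
        {
          type: 'function',
          function: {
            name: 'get_weather',   // hypothetical function implemented by your own code
            description: 'Get the current weather for a city',
            parameters: {
              type: 'object',
              properties: { city: { type: 'string' } },
              required: ['city'],
            },
          },
        },
      ],
      tool_choice: 'auto',          // or 'none', 'required', or a specific function object
      parallel_tool_calls: true,
    }),
  });
  const data = await response.json();
  // If the model decided to call a tool, the call(s) appear here instead of plain content.
  console.log(JSON.stringify(data.choices[0].message.tool_calls, null, 2));
}
toolCallExample();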
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
An object specifying the format that the model must output.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
Constrains effort on reasoning for reasoning models. Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
An object specifying the format that the model must output.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
An object specifying the format that the model must output.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
POST /v1/chat/completions HTTP/1.1
Host: api.aimlapi.com
Content-Type: application/json
Accept: */*
Content-Length: 641
{
"model": "nousresearch/hermes-4-405b",
"messages": [
{
"role": "user",
"content": "text",
"name": "text"
}
],
"max_completion_tokens": 1,
"max_tokens": 1,
"stream": false,
"stream_options": {
"include_usage": true
},
"temperature": 1,
"top_p": 1,
"seed": 1,
"min_p": 1,
"top_k": 1,
"repetition_penalty": 1,
"top_a": 1,
"frequency_penalty": 1,
"prediction": {
"type": "content",
"content": "text"
},
"presence_penalty": 1,
"tools": [
{
"type": "function",
"function": {
"description": "text",
"name": "text",
"parameters": {
"ANY_ADDITIONAL_PROPERTY": null
},
"strict": true
}
}
],
"tool_choice": "none",
"parallel_tool_calls": true,
"stop": "text",
"logprobs": true,
"top_logprobs": 1,
"response_format": {
"type": "text"
}
}
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
An object specifying the format that the model must output.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
An object specifying the format that the model must output.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
An object specifying the format that the model must output.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
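When stream is set to true, the endpoint returns server-sent events instead of a single JSON object. The sketch below shows one way to consume such a stream with fetch in Node.js 18+. It assumes the chunks follow the OpenAI-style "data: {...}" / "data: [DONE]" format with incremental choices[0].delta.content fields; verify this against the actual responses you receive before relying on it.
async function streamingExample() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'nousresearch/hermes-4-405b',   // illustrative model ID
      messages: [{ role: 'user', content: 'Tell me a short story.' }],
      stream: true,
    }),
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // Each SSE data line carries one JSON chunk; keep any incomplete line for the next read.
    const lines = buffer.split('\n');
    buffer = lines.pop();
    for (const line of lines) {
      const trimmed = line.trim();
      if (!trimmed.startsWith('data:')) continue;
      const payload = trimmed.slice(5).trim();
      if (payload === '[DONE]') return;
      const chunk = JSON.parse(payload);
      const delta = chunk.choices?.[0]?.delta?.content;
      if (delta) process.stdout.write(delta); // print tokens as they arrive
    }
  }
}
streamingExample();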
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'alibaba-cloud/qwen3-next-80b-a3b-instruct',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "alibaba-cloud/qwen3-next-80b-a3b-instruct",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'meta-llama/llama-4-maverick',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "meta-llama/llama-4-maverick",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'mistralai/Mixtral-8x7B-Instruct-v0.1',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'google/gemini-2.5-pro',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "google/gemini-2.5-pro",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "nousresearch/hermes-4-405b",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'meta-llama/Llama-3-70b-chat-hf',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "meta-llama/Llama-3-70b-chat-hf",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'qwen-turbo',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "qwen-turbo",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'anthropic/claude-sonnet-4',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "anthropic/claude-sonnet-4",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'google/gemini-2.5-flash',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "google/gemini-2.5-flash",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gpt-4o-mini-audio-preview',
messages:[
{
role:'user',
content: 'Hello'
}
],
modalities: ['text', 'audio'],
audio: { voice: 'alloy', format: 'pcm16' },
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "gpt-4o-mini-audio-preview",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
An object specifying the format that the model must output.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
Constrains effort on reasoning for reasoning models. Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
An object specifying the format that the model must output.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
An object specifying the format that the model must output.
If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
An object specifying the format that the model must output.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
An object specifying the format that the model must output.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
An object specifying the format that the model must output.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
Constrains effort on reasoning for reasoning models. Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
An object specifying the format that the model must output.
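For models that accept reasoning_effort and response_format, the two parameters can be combined as in the sketch below. The model ID is a placeholder, and whether a given model supports a particular effort level or JSON output should be checked on its model page before use.
async function reasoningJsonExample() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: '<YOUR_MODEL_ID>',                  // use a model that supports these parameters
      messages: [{ role: 'user', content: 'List three prime numbers as JSON.' }],
      reasoning_effort: 'low',                   // low | medium | high; lower is faster and spends fewer reasoning tokens
      response_format: { type: 'json_object' },  // or { type: 'text' }; JSON output support varies by model
      max_completion_tokens: 512,                // upper bound including reasoning tokens, where supported
    }),
  });
  const data = await response.json();
  console.log(data.choices[0].message.content);
}
reasoningJsonExample();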
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gpt-4-0125-preview',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "gpt-4-0125-preview",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
Alternate top sampling parameter.
An object specifying the format that the model must output.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
Default: false
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
Constrains effort on reasoning for reasoning models. Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
An object specifying the format that the model must output.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
Default: false
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
An object specifying the format that the model must output.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
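As a rough illustration of the logprobs and top_logprobs parameters described above, the sketch below requests per-token log probabilities; the exact shape of the returned logprobs object may vary between models, so inspect the response before relying on specific fields.
async function inspectLogprobs() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'deepseek/deepseek-chat',
      messages: [{ role: 'user', content: 'Name one primary color.' }],
      max_tokens: 16,
      logprobs: true,   // return log probabilities for each output token
      top_logprobs: 3,  // also return the 3 most likely alternatives per position
    }),
  });
  const data = await response.json();
  console.log(JSON.stringify(data.choices[0].logprobs, null, 2));
}
inspectLogprobs();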
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'deepseek/deepseek-chat',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "deepseek/deepseek-chat",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'anthropic/claude-opus-4',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "anthropic/claude-opus-4",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'alibaba/qwen3-max-preview',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "alibaba/qwen3-max-preview",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'claude-3-haiku',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "claude-3-haiku",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'Qwen/Qwen2.5-7B-Instruct-Turbo',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "Qwen/Qwen2.5-7B-Instruct-Turbo",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'minimax/m2',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "minimax/m2",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'google/gemini-3-pro-preview',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "google/gemini-3-pro-preview",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'moonshot/kimi-k2-turbo-preview',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "moonshot/kimi-k2-turbo-preview",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'anthropic/claude-opus-4.1',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "anthropic/claude-opus-4.1",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
Default: false
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
An object specifying the format that the model must output.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
Default: false
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
Alternate top sampling parameter.
An object specifying the format that the model must output.
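The advanced sampling controls described above (top-K filtering, the min-p cutoff, and the repetition penalty) can be combined in a single request. The sketch below is only an assumption about the request shape: the snake_case parameter names top_k, min_p, and repetition_penalty follow common OpenAI-compatible conventions and should be checked against the parameter reference of the specific model you call.
async function sampleWithAdvancedControls() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'deepseek/deepseek-non-reasoner-v3.1-terminus',
      messages: [{ role: 'user', content: 'Write a two-line poem about the sea.' }],
      temperature: 0.9,        // more random sampling
      top_k: 40,               // drop the long tail of unlikely tokens (assumed name)
      min_p: 0.05,             // relative probability cutoff (assumed name)
      repetition_penalty: 1.1, // discourage repeated sequences (assumed name)
      max_tokens: 64,
    }),
  });
  const data = await response.json();
  console.log(data.choices[0].message.content);
}
sampleWithAdvancedControls();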
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'deepseek/deepseek-non-reasoner-v3.1-terminus',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "deepseek/deepseek-non-reasoner-v3.1-terminus",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
Default: false
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
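The stream option described above returns server-sent events rather than a single JSON body. Below is a minimal sketch of reading the raw stream with fetch; it assumes the common OpenAI-compatible chunk format (lines beginning with "data:" and ending with "data: [DONE]"), so confirm the exact event format against the streaming reference before parsing chunks.
async function streamCompletion() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'gpt-4',
      messages: [{ role: 'user', content: 'Hello' }],
      stream: true, // deliver the response incrementally via server-sent events
    }),
  });
  // Read the response body chunk by chunk as it arrives.
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // Each chunk contains one or more "data: {...}" SSE lines.
    console.log(decoder.decode(value));
  }
}
streamCompletion();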
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gpt-4',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "gpt-4",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gpt-4o-mini',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "gpt-4o-mini",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
Default: false
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
An object specifying the format that the model must output.
If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
Default: false
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
Constrains effort on reasoning for reasoning models. Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
An object specifying the format that the model must output.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
Default: false
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
Alternate top sampling parameter.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
An object specifying the format that the model must output.
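The response_format parameter above is described only as an object specifying the required output format. A widely used OpenAI-compatible value is { "type": "json_object" } (JSON mode), sketched below; whether a given model supports JSON mode is an assumption here and should be verified on the model's page. With JSON mode it is also good practice to ask for JSON explicitly in the prompt.
async function requestJson() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'Qwen/Qwen2.5-72B-Instruct-Turbo',
      messages: [
        // The prompt itself requests JSON, which JSON mode expects.
        { role: 'user', content: 'Return a JSON object with fields "city" and "country" for Paris.' }
      ],
      response_format: { type: 'json_object' }, // assumed JSON-mode value
    }),
  });
  const data = await response.json();
  // The message content should be a parseable JSON string.
  console.log(JSON.parse(data.choices[0].message.content));
}
requestJson();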
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'Qwen/Qwen2.5-72B-Instruct-Turbo',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "Qwen/Qwen2.5-72B-Instruct-Turbo",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'anthropic/claude-opus-4-5',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "anthropic/claude-opus-4-5",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'mistralai/codestral-2501',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "mistralai/codestral-2501",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
Default: false
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
An object specifying the format that the model must output.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
Default: false
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
An object specifying the format that the model must output.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
Specifies whether to use the thinking mode.
Default: false
The maximum reasoning length, effective only when enable_thinking is set to true.
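For models that expose the thinking-mode switch described above, a request might look like the sketch below. The enable_thinking flag is the one named in the description; the companion parameter that caps reasoning length is not named here, so consult the model page for it, and treat the model choice below as an illustrative assumption.
async function askWithThinking() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'alibaba/qwen3-32b',
      messages: [{ role: 'user', content: 'How many prime numbers are below 20?' }],
      enable_thinking: true, // switch the hybrid model into reasoning mode
      // A separate budget parameter (see the model page for its exact name)
      // caps the reasoning length when enable_thinking is true.
    }),
  });
  const data = await response.json();
  console.log(data.choices[0].message.content);
}
askWithThinking();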
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gpt-4o',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "gpt-4o",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
Default: false
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
An object specifying the format that the model must output.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
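The two penalty parameters described above can be used together to reduce repetition and encourage new topics. The sketch below assumes the standard OpenAI-compatible names frequency_penalty and presence_penalty; values between 0 and 1 are usually enough, and negative values have the opposite effect.
async function reduceRepetition() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'gpt-4o-mini',
      messages: [{ role: 'user', content: 'List five unusual hobbies.' }],
      frequency_penalty: 0.5, // penalize tokens that already appear often
      presence_penalty: 0.3,  // nudge the model toward new topics
      max_tokens: 200,
    }),
  });
  const data = await response.json();
  console.log(data.choices[0].message.content);
}
reduceRepetition();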
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'alibaba/qwen3-vl-32b-thinking',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "alibaba/qwen3-vl-32b-thinking",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'alibaba/qwen3-32b',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();
{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "alibaba/qwen3-32b",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
Default: false
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
An object specifying the format that the model must output.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
Default: false
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
An object specifying the format that the model must output.
If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
Alternate top sampling parameter.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
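For reference, here is a minimal sketch showing how several of the parameters described above can be combined in a single chat completion request. The parameter names follow the descriptions above; the values are illustrative only, not recommendations.

async function main() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'nvidia/nemotron-nano-9b-v2',
      messages: [{ role: 'user', content: 'Summarize the benefits of unit testing.' }],
      temperature: 0.7,        // higher values make output more random
      top_p: 0.9,              // nucleus sampling; usually tune this or temperature, not both
      max_tokens: 256,         // cap on generated tokens, useful for cost control
      stop: ['\n\n'],          // up to 4 stop sequences
      frequency_penalty: 0.3,  // discourage verbatim repetition
    }),
  });
  const data = await response.json();
  console.log(data.choices[0].message.content);
}
main();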
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'nvidia/nemotron-nano-9b-v2',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "nvidia/nemotron-nano-9b-v2",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}

async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'meta-llama/llama-4-scout',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "meta-llama/llama-4-scout",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}

An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
An object specifying the format that the model must output.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
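As a sketch of the tools and tool_choice parameters described above, the request below defines a hypothetical get_weather function (not a built-in tool) and lets the model decide whether to call it. The function name, description, and schema are illustrative assumptions; only the overall {"type": "function", ...} shape comes from the documentation above.

async function main() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'gpt-4-turbo',
      messages: [{ role: 'user', content: 'What is the weather in Paris?' }],
      // Hypothetical function definition, used only to illustrate the request shape.
      tools: [
        {
          type: 'function',
          function: {
            name: 'get_weather',
            description: 'Get the current weather for a city',
            parameters: {
              type: 'object',
              properties: { city: { type: 'string' } },
              required: ['city'],
            },
          },
        },
      ],
      tool_choice: 'auto', // let the model choose between answering and calling the tool
    }),
  });
  const data = await response.json();
  // If the model decided to call the tool, the call appears here instead of plain text.
  console.log(data.choices[0].message.tool_calls ?? data.choices[0].message.content);
}
main();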
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gpt-4-turbo',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "gpt-4-turbo",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}

An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
An object specifying the format that the model must output.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
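A brief sketch of requesting token log probabilities with the logprobs and top_logprobs parameters documented above. The values are illustrative, and the exact shape of the returned logprobs object may vary by model.

async function main() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'mistralai/Mistral-7B-Instruct-v0.2',
      messages: [{ role: 'user', content: 'Hello' }],
      logprobs: true,   // return log probabilities for each output token
      top_logprobs: 5,  // also return the 5 most likely alternatives per position
      max_tokens: 32,
    }),
  });
  const data = await response.json();
  // The logprobs object sits alongside the message in each choice.
  console.log(JSON.stringify(data.choices[0].logprobs, null, 2));
}
main();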
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'mistralai/Mistral-7B-Instruct-v0.2',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "mistralai/Mistral-7B-Instruct-v0.2",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}

async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'moonshot/kimi-k2-preview',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "moonshot/kimi-k2-preview",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}

An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
If set to True, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
An object specifying the format that the model must output.
Overview of the capabilities of AIML API text models (LLMs).
Act as a psychological supporter.
Play games with you through natural language.
Assist you with coding.
Perform security assessments (pentests) of servers to find vulnerabilities.
Write documentation for your services.
Serve as a grammar corrector for multiple languages with deep context understanding.
And much more.
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gpt-3.5-turbo',
messages:[
{
role:'user',
content: 'Hello'
}
],
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
"object": "chat.completion",
"created": 1762343744,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
"refusal": null,
"annotations": null,
"audio": null,
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"model": "gpt-3.5-turbo",
"usage": {
"prompt_tokens": 137,
"completion_tokens": 914,
"total_tokens": 1051,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}

[Model comparison table: each row of the original page lists a text/chat model with its developer (OpenAI, Anthropic, Alibaba Cloud, DeepSeek, Meta, Mistral AI, NVIDIA, Cohere, MiniMax, Moonshot, NousResearch, Anthracite, Perplexity, xAI, Zhipu, plus several rows whose developer cell did not survive) and its context window, ranging from roughly 4,000 to 2,000,000 tokens. The model names themselves were not preserved in this export.]
Text, image, or file inputs to the model, used to generate a response.
A text input to the model, equivalent to a text input with the user role.
An upper bound for the number of tokens that can be generated for a response, including visible output tokens and reasoning tokens.
The unique ID of the previous response to the model. Use this to create multi-turn conversations.
Whether to store the generated model response for later retrieval via API. Default: false.
If set to true, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
The truncation strategy to use for the model response. Default: disabled.
How the model should select which tool (or tools) to use when generating a response. Controls which (if any) tool is called by the model. Possible values:
none means the model will not call any tool and instead generates a message.
auto means the model can pick between generating a message or calling one or more tools.
required means the model must call one or more tools.
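A minimal multi-turn sketch using the previous_response_id parameter described above: the id field returned by the first call is passed back on the follow-up call so the model keeps the conversation context. The model ID and field names are taken from the example and response shown below; the prompts are illustrative.

async function main() {
  const headers = {
    'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
    'Content-Type': 'application/json',
  };

  // First turn.
  const first = await fetch('https://api.aimlapi.com/v1/responses', {
    method: 'POST',
    headers,
    body: JSON.stringify({ model: 'gpt-4o', input: 'My name is Ada.' }),
  }).then(r => r.json());

  // Second turn: link to the previous response so the model keeps the context.
  const second = await fetch('https://api.aimlapi.com/v1/responses', {
    method: 'POST',
    headers,
    body: JSON.stringify({
      model: 'gpt-4o',
      input: 'What is my name?',
      previous_response_id: first.id,
    }),
  }).then(r => r.json());

  console.log(second.output_text);
}
main();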
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/responses', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
"model": "gpt-4o",
"input": "Hello"
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"background": false,
"created_at": 1762343744,
"error": null,
"id": "resp_68963fb142d08197b4d3ae3ad852542c054845c6ea84caa2",
"incomplete_details": null,
"instructions": null,
"max_output_tokens": null,
"metadata": {},
"model": "gpt-4o",
"object": "response",
"output": null,
"output_text": "Hi! How’s your day going?",
"parallel_tool_calls": false,
"previous_response_id": null,
"prompt": null,
"reasoning": null,
"service_tier": null,
"status": "completed",
"temperature": null,
"text": {
"format": {
"type": "text"
}
},
"tool_choice": null,
"tools": null,
"top_p": null,
"truncation": null,
"usage": {
"input_tokens": 137,
"input_tokens_details": null,
"output_tokens": 914,
"output_tokens_details": null,
"total_tokens": 1051
}
}

Text, image, or file inputs to the model, used to generate a response.
A text input to the model, equivalent to a text input with the user role.
An upper bound for the number of tokens that can be generated for a response, including visible output tokens and reasoning tokens.
The unique ID of the previous response to the model. Use this to create multi-turn conversations.
Whether to store the generated model response for later retrieval via API. Default: false.
If set to true, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
The truncation strategy to use for the model response. Default: disabled.
How the model should select which tool (or tools) to use when generating a response. Controls which (if any) tool is called by the model. Possible values:
none means the model will not call any tool and instead generates a message.
auto means the model can pick between generating a message or calling one or more tools.
required means the model must call one or more tools.
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/responses', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
"model": "gpt-4o-mini",
"input": "Hello"
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"background": false,
"created_at": 1762343744,
"error": null,
"id": "resp_68963fb142d08197b4d3ae3ad852542c054845c6ea84caa2",
"incomplete_details": null,
"instructions": null,
"max_output_tokens": null,
"metadata": {},
"model": "gpt-4o-mini",
"object": "response",
"output": null,
"output_text": "Hi! How’s your day going?",
"parallel_tool_calls": false,
"previous_response_id": null,
"prompt": null,
"reasoning": null,
"service_tier": null,
"status": "completed",
"temperature": null,
"text": {
"format": {
"type": "text"
}
},
"tool_choice": null,
"tools": null,
"top_p": null,
"truncation": null,
"usage": {
"input_tokens": 137,
"input_tokens_details": null,
"output_tokens": 914,
"output_tokens_details": null,
"total_tokens": 1051
}
}

Text, image, or file inputs to the model, used to generate a response.
A text input to the model, equivalent to a text input with the user role.
An upper bound for the number of tokens that can be generated for a response, including visible output tokens and reasoning tokens.
The unique ID of the previous response to the model. Use this to create multi-turn conversations.
Whether to store the generated model response for later retrieval via API. Default: false.
If set to true, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
The truncation strategy to use for the model response. Default: disabled.
How the model should select which tool (or tools) to use when generating a response. Controls which (if any) tool is called by the model. Possible values:
none means the model will not call any tool and instead generates a message.
auto means the model can pick between generating a message or calling one or more tools.
required means the model must call one or more tools.
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/responses', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
"model": "gpt-4-turbo",
"input": "Hello"
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"background": false,
"created_at": 1762343744,
"error": null,
"id": "resp_68963fb142d08197b4d3ae3ad852542c054845c6ea84caa2",
"incomplete_details": null,
"instructions": null,
"max_output_tokens": null,
"metadata": {},
"model": "gpt-4-turbo",
"object": "response",
"output": null,
"output_text": "Hi! How’s your day going?",
"parallel_tool_calls": false,
"previous_response_id": null,
"prompt": null,
"reasoning": null,
"service_tier": null,
"status": "completed",
"temperature": null,
"text": {
"format": {
"type": "text"
}
},
"tool_choice": null,
"tools": null,
"top_p": null,
"truncation": null,
"usage": {
"input_tokens": 137,
"input_tokens_details": null,
"output_tokens": 914,
"output_tokens_details": null,
"total_tokens": 1051
}
}

Text, image, or file inputs to the model, used to generate a response.
A text input to the model, equivalent to a text input with the user role.
An upper bound for the number of tokens that can be generated for a response, including visible output tokens and reasoning tokens.
The unique ID of the previous response to the model. Use this to create multi-turn conversations.
Whether to store the generated model response for later retrieval via API. Default: false.
If set to true, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
The truncation strategy to use for the model response. Default: disabled.
How the model should select which tool (or tools) to use when generating a response. Controls which (if any) tool is called by the model. Possible values:
none means the model will not call any tool and instead generates a message.
auto means the model can pick between generating a message or calling one or more tools.
required means the model must call one or more tools.
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/responses', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
"model": "gpt-4",
"input": "Hello"
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"background": false,
"created_at": 1762343744,
"error": null,
"id": "resp_68963fb142d08197b4d3ae3ad852542c054845c6ea84caa2",
"incomplete_details": null,
"instructions": null,
"max_output_tokens": null,
"metadata": {},
"model": "gpt-4",
"object": "response",
"output": null,
"output_text": "Hi! How’s your day going?",
"parallel_tool_calls": false,
"previous_response_id": null,
"prompt": null,
"reasoning": null,
"service_tier": null,
"status": "completed",
"temperature": null,
"text": {
"format": {
"type": "text"
}
},
"tool_choice": null,
"tools": null,
"top_p": null,
"truncation": null,
"usage": {
"input_tokens": 137,
"input_tokens_details": null,
"output_tokens": 914,
"output_tokens_details": null,
"total_tokens": 1051
}
}

Text, image, or file inputs to the model, used to generate a response.
A text input to the model, equivalent to a text input with the user role.
An upper bound for the number of tokens that can be generated for a response, including visible output tokens and reasoning tokens.
The unique ID of the previous response to the model. Use this to create multi-turn conversations.
Whether to store the generated model response for later retrieval via API. Default: false.
If set to true, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
The truncation strategy to use for the model response. Default: disabled.
How the model should select which tool (or tools) to use when generating a response. Controls which (if any) tool is called by the model. Possible values:
none means the model will not call any tool and instead generates a message.
auto means the model can pick between generating a message or calling one or more tools.
required means the model must call one or more tools.
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/responses', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
"model": "gpt-4-0125-preview",
"input": "Hello"
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"background": false,
"created_at": 1762343744,
"error": null,
"id": "resp_68963fb142d08197b4d3ae3ad852542c054845c6ea84caa2",
"incomplete_details": null,
"instructions": null,
"max_output_tokens": null,
"metadata": {},
"model": "gpt-4-0125-preview",
"object": "response",
"output": null,
"output_text": "Hi! How’s your day going?",
"parallel_tool_calls": false,
"previous_response_id": null,
"prompt": null,
"reasoning": null,
"service_tier": null,
"status": "completed",
"temperature": null,
"text": {
"format": {
"type": "text"
}
},
"tool_choice": null,
"tools": null,
"top_p": null,
"truncation": null,
"usage": {
"input_tokens": 137,
"input_tokens_details": null,
"output_tokens": 914,
"output_tokens_details": null,
"total_tokens": 1051
}
}

Text, image, or file inputs to the model, used to generate a response.
A text input to the model, equivalent to a text input with the user role.
An upper bound for the number of tokens that can be generated for a response, including visible output tokens and reasoning tokens.
The unique ID of the previous response to the model. Use this to create multi-turn conversations.
Whether to store the generated model response for later retrieval via API. Default: false.
If set to true, the model response data will be streamed to the client as it is generated using server-sent events. Default: false.
The truncation strategy to use for the model response. Default: disabled.
How the model should select which tool (or tools) to use when generating a response. Controls which (if any) tool is called by the model. Possible values:
none means the model will not call any tool and instead generates a message.
auto means the model can pick between generating a message or calling one or more tools.
required means the model must call one or more tools.
async function main() {
const response = await fetch('https://api.aimlapi.com/v1/responses', {
method: 'POST',
headers: {
'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
"model": "gpt-3.5-turbo",
"input": "Hello"
}),
});
const data = await response.json();
console.log(JSON.stringify(data, null, 2));
}
main();

{
"background": false,
"created_at": 1762343744,
"error": null,
"id": "resp_68963fb142d08197b4d3ae3ad852542c054845c6ea84caa2",
"incomplete_details": null,
"instructions": null,
"max_output_tokens": null,
"metadata": {},
"model": "gpt-3.5-turbo",
"object": "response",
"output": null,
"output_text": "Hi! How’s your day going?",
"parallel_tool_calls": false,
"previous_response_id": null,
"prompt": null,
"reasoning": null,
"service_tier": null,
"status": "completed",
"temperature": null,
"text": {
"format": {
"type": "text"
}
},
"tool_choice": null,
"tools": null,
"top_p": null,
"truncation": null,
"usage": {
"input_tokens": 137,
"input_tokens_details": null,
"output_tokens": 914,
"output_tokens_details": null,
"total_tokens": 1051
}
}

A full list of available models.
The section Get Model List via API contains the API reference for the service endpoint that lets you request the full model list.
The section Model IDs lists the identifiers of all available and deprecated models, grouped by category. These IDs are used to specify the exact models in your code, like this:
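For example, a minimal sketch of a request (gpt-4o is just one of the IDs listed on this page; substitute any model ID from the list):

async function main() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'gpt-4o', // the model ID from the list goes here
      messages: [{ role: 'user', content: 'Hello' }],
    }),
  });
  console.log(await response.json());
}
main();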
If you already know the model ID, use the page search function (Ctrl+F on Windows/Linux, Command+F on Mac) to locate it; each ID links directly to the model's API Reference page.
New Model Request
Can't find the model you need? Join us to propose new models for integration into our API offerings. Your contributions help us grow and serve you better.
These models are no longer available for API or Playground calls. Their description and API reference pages have also been removed from this documentation portal.
[Model database tables: this part of the export contains flattened rows from the full model list, covering text/chat models (developer and context window), plus image, video, speech/audio, and embedding models (developer only). The model names were not preserved. Developers represented include OpenAI, Anthropic, Alibaba Cloud, ByteDance, DeepSeek, Meta, Mistral AI, Flux, Kling AI, Krea, LTXV, Luma AI, MiniMax, Moonshot, NousResearch, NVIDIA, Cohere, Perplexity, PixVerse, Recraft AI, Reve, Runway, Sber AI, Stability AI, Tencent, Topaz Labs, Veed, Deepgram, ElevenLabs, Inworld, Microsoft, Assembly AI, BAAI, Together AI, xAI, and Zhipu.]
kling-video/v1.5/standard/text-to-video
Kling AI
128,000
o1-mini o1-mini-2024-09-12
OpenAI
128,000
Qwen/Qwen2-72B-Instruct
Alibaba Cloud
32,000
claude-3-5-sonnet-20240620
Anthropic
200,000
-
claude-3-5-sonnet-20241022
Anthropic
200,000
cohere/command-r-plus
Cohere
128,000
google/gemma-2-27b-it
8,000
NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO
Nous Research
32,000
-
nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
Nvidia
128,000
meta-llama/Llama-3-8b-chat-hf
Meta
8,000
meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo
Meta
131,000
meta-llama/Llama-Vision-Free
Meta
128,000
-
meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo
Meta
131,000
abab6.5s-chat
MiniMax
245,000
-
openrouter/horizon-beta
OpenRouter
256,000
-
openrouter/horizon-alpha
OpenRouter
256,000
-
wan/v2.1/1.3b/text-to-video
Alibaba Cloud
-
o1-preview, o1-preview-2024-09-12
OpenAI
128,000
claude-3-sonnet-20240229, anthropic/claude-3-sonnet, claude-3-sonnet-latest
Anthropic
200,000
google/gemini-2.5-pro-preview, google/gemini-2.5-pro-preview-05-06
1,000,000
google/gemini-2.5-flash-preview
1,000,000
neversleep/llama-3.1-lumimaid-70b
NeverSleep
8,000
x-ai/grok-beta
xAI
131,000
gpt-4.5-preview
OpenAI
128,000
gemini-1.5-flash
1,000,000
gemini-1.5-pro
1,000,000
google/gemma-3-1b-it
128,000
togethercomputer/m2-bert-80M-8k-retrieval
TogetherAI
8,000
togethercomputer/m2-bert-80M-2k-retrieval
TogetherAI
2,000
Gryphe/MythoMax-L2-13b-Lite
Gryphe
4,000
-
mistralai/Mixtral-8x22B-Instruct-v0.1
Mistral AI
64,000
google/gemini-2.5-pro-exp-03-25
1,000,000
-
google/gemini-2.0-flash-thinking-exp-01
1,000,000
ai21/jamba-1-5-mini
AI21 Labs
256,000
textembedding-gecko@001
3,000
-
google/gemini-pro or gemini-pro
32,000
meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo-128K
Meta
128,000
-
stabilityai/stable-diffusion-xl-base-1.0
Stability AI
upstage/solar-10.7b-instruct-v1.0
Upstage
4,000
meta-llama/Llama-2-13b-chat-hf
Meta
4,100
meta-llama/meta-llama-3-70b-instruct-turbo
Meta
128,000
-
google/gemma-2-9b-it
8,000
google/gemma-2b-it
8,000
Gryphe/MythoMax-L2-13b
Gryphe
4,000
microsoft/WizardLM-2-8x22B
Microsoft
64,000
Austism/chronos-hermes-13b
Austism
2,000
databricks/dbrx-instruct
Databricks
32,000
deepseek-ai/deepseek-llm-67b-chat
DeepSeek
4,000
deepseek-ai/deepseek-coder-33b-instruct
DeepSeek
16,000
Meta-Llama/Llama-2-7b-chat-hf
Meta
4,000
Meta-Llama/Meta-Llama-3-70B-Instruct-Lite
Meta
8,000
Meta-Llama/Llama-Guard-7b
Meta
4,000
meta-llama/Llama-2-7b-hf
Meta
4,000
meta-llama/Llama-3-8b-hf
Meta
8,000
codellama/CodeLlama-70b-hf
Meta
16,000
codellama/CodeLlama-7b-Instruct-hf
Meta
16,000
codellama/CodeLlama-13b-Instruct-hf
Meta
16,000
codellama/CodeLlama-70b-Instruct-hf
Meta
4,000
codellama/CodeLlama-70b-Python-hf
Meta
4,000
mistralai/Mixtral-8x22B-Instruct-v0.1
Mistral AI
64,000
gpt-3.5-turbo-16k-0613
OpenAI
-
gpt-4-0613
OpenAI
128,000
Qwen/Qwen-14B-Chat
Alibaba Cloud
8,000
Qwen/Qwen1.5-0.5B
Alibaba Cloud
32,000
Qwen/Qwen1.5-1.8B
Alibaba Cloud
32,000
Qwen/Qwen1.5-4B
Alibaba Cloud
32,000
Qwen/Qwen1.5-1.8B-Chat
Alibaba Cloud
32,000
Qwen/Qwen1.5-4B-Chat
Alibaba Cloud
32,000
Qwen/Qwen1.5-7B-Chat
Alibaba Cloud
32,000
Qwen/Qwen1.5-14B-Chat
Alibaba Cloud
32,000
qwen/qvq-72b-preview
Alibaba Cloud
32,000
togethercomputer/guanaco-13b
Tim Dettmers
2,000
togethercomputer/guanaco-33b
Tim Dettmers
2,000
togethercomputer/guanaco-65b
Tim Dettmers
2,000
togethercomputer/mpt-7b-chat
Mosaic ML
2,000
togethercomputer/mpt-30b-chat
Mosaic ML
8,000
togethercomputer/RedPajama-INCITE-7B-Instruct
RedPajama
2,000
prompthero/openjourney
PromptHero
77
wavymulder/Analog-Diffusion
wavymulder
77
-
01.AI
4,000
Undi95/Toppy-M-7B
Undi95
4,000
SG161222/Realistic_Vision_V3.0_VAE
Together
77
tiiuae/falcon-40b
TII
2,000
allenai/OLMo-7B
Allen Institute for AI
2,000
bigcode/starcoder
BigCode
8,000
HuggingFaceH4/starchat-alpha
Hugging Face
8,000
NousResearch/Nous-Hermes-Llama2-70b
NousResearch
4,000
NousResearch/Nous-Hermes-2-Mixtral-8x7B-SFT
NousResearch
32,000
NousResearch/Nous-Hermes-2-Mistral-7B-DPO
NousResearch
32,000
NousResearch/Hermes-2-Theta-Llama-3-70B
NousResearch
8,000
defog/sqlcoder
Defog AI
8,000
replit/replit-code-v1-3b
Replit
2,000
lmsys/vicuna-13b-v1.5
LMSYS
4,000
microsoft/phi-2
Microsoft
2,000
stabilityai/stablelm-base-alpha-3b
StabilityAI
4,000
runwayml/stable-diffusion-v1-5
StabilityAI
77
stabilityai/stable-diffusion-2-1
StabilityAI
77
teknium/OpenHermes-2p5-Mistral-7B
Teknium
8,000
openchat/openchat-3.5-1210
OpenChat
8,000
DiscoResearch/DiscoLM-mixtral-8x7b-v2
Disco Research
32,000
google/flan-t5-xl
512
garage-bAInd/Platypus2-70B-instruct
Garage-bAInd
4,000
EleutherAI/gpt-neox-20b
EleutherAI
2,000
gradientai/Llama-3-70B-Instruct-Gradient-1048k
Gradient
1,048,000
WhereIsAI/UAE-Large-V1
WhereIsAI
512
zero-one-ai/Yi-34B-Chat
01.AI
4,000
meta-llama/Meta-Llama-3.1-70B-Reference
Meta
32,000
–
meta-llama/Meta-Llama-3.1-8B-Reference
Meta
32,000
–
EleutherAI/llemma_7b
EleutherAI
32,000
–
huggyllama/llama-30b
Huggyllama
32,000
–
huggyllama/llama-13b
Huggyllama
32,000
–
togethercomputer/llama-2-70b
TogetherAI
32,000
–
togethercomputer/llama-2-13b
TogetherAI
32,000
–
huggyllama/llama-65b
Huggyllama
32,000
–
WizardLM/WizardLM-70B-V1.0
WizardLM
32,000
–
huggyllama/llama-7b
Huggyllama
32,000
–
togethercomputer/llama-2-7b
TogetherAI
32,000
–
NousResearch/Nous-Hermes-13b
NousResearch
2,000
–
mistralai/Mistral-7B-v0.1
Mistral AI
32,000
mistralai/Mixtral-8x7B-v0.1
Mistral AI
32,000
-
[Additional deprecated rows whose model IDs were not preserved in this export: audio, image, video, and 3D models from Suno AI, OpenAI, Alibaba Cloud, Assembly AI, Deepgram, ElevenLabs, MiniMax, Stability AI, Meta, Mistral AI, and Tripo AI.]
mistralai/Mistral-7B-Instruct-v0.1
Mistral AI
8,000
Qwen/Qwen2.5-Coder-32B-Instruct
Alibaba Cloud
131,000
Qwen/QwQ-32B
Alibaba Cloud
131,000

GET /models HTTP/1.1
Host: api.aimlapi.com
Accept: */*
{
"object": "text",
"data": [
{
"id": "text",
"type": "text",
"info": {
"name": "text",
"developer": "text",
"description": "text",
"contextLength": 1,
"url": "text"
},
"features": [
"text"
]
}
]
}
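As a sketch, the same request can be made from JavaScript, using the field names from the response schema above (data, id, info.developer, info.contextLength) to print each model.

async function listModels() {
  const response = await fetch('https://api.aimlapi.com/models', {
    method: 'GET',
    headers: { 'Accept': '*/*' },
    // Add an Authorization header here if your account requires it.
  });
  const body = await response.json();

  // Each entry follows the schema above: id, type, and an info object.
  for (const model of body.data) {
    console.log(`${model.id} (${model.info.developer}), context: ${model.info.contextLength}`);
  }
}
listModels();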