
AI/ML API Documentation


Service Endpoints

  • Alibaba Cloud
  • Anthracite
  • Anthropic
  • Cohere
  • DeepSeek
  • Google
  • Meta
  • Mistral AI
  • OpenAI
  • NVIDIA
  • Moonshot
  • MiniMax

Account Balance

Get account balance info

You can query your account balance and other billing details through this API. To make a request, you only need your AIML API key, obtained from your account dashboard.
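For illustration, here is a minimal Python sketch of such a request. The endpoint path below is an assumption for this example (check the API reference for the exact URL); the Bearer-token authorization works the same as for any other AI/ML API call:

import requests

# Hypothetical endpoint path: verify the exact URL in the API reference.
response = requests.get(
    "https://api.aimlapi.com/v1/account/balance",
    headers={"Authorization": "Bearer <YOUR_AIMLAPI_KEY>"},
)
print(response.json())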

Documentation Map

Learn how to get started with the AI/ML API

This documentation portal is designed to help you choose and configure the AI model that best suits your needs (or one of our solutions: ready-to-use tools for specific practical tasks) and correctly integrate it into your code.

Have suggestions for improvement? Let us know!


Trending Models

  • Pro-Grade Image Model
  • Top Video Generator
  • Smarter Reasoning & Coding

Browse Models

Popular: ChatGPT, DeepSeek, Flux | View all 200+ models

Select the model by its Task, by its Developer or by the supported Capabilities:

If you've already made your choice and know the model ID, use the Search panel on your right.

Text Models (LLM) | Image Models | Video Models | Music Models | Voice/Speech Models | Content Moderation Models | 3D-Generating Models | Vision Models | Embedding Models

Alibaba Cloud: Text/Chat, Image, Video, Text-to-Speech

Anthracite: Text/Chat

Anthropic: Text/Chat, Embedding

Assembly AI: Speech-To-Text

BAAI: Embedding

ByteDance: Image, Video

Cohere: Text/Chat

DeepSeek

Deepgram

ElevenLabs

Flux

Google

Inworld

Kling AI

Krea

LTXV

Meta

Microsoft

MiniMax

Mistral AI

Moonshot

NousResearch

NVIDIA

OpenAI

Perplexity

PixVerse

RecraftAI

Reve

Runway

Stability AI

Sber AI

Tencent

Together AI

VEED

xAI

Zhipu

Browse Solutions

  • AI Search Engine – use this solution if you need to create a project where information must be found on the internet and then presented to you in a structured format.

  • OpenAI Assistants – if you need to create tailored AI Assistants capable of handling customer support, data analysis, content generation, and more.


Going Deeper

Use more text model capabilities in your project:

📖 Completion and Chat Completion
📖 Streaming Mode
📖 Code Generation
📖 Thinking / Reasoning
📖 Function Calling
📖 Vision in Text Models (Image-To-Text)
📖 Web Search

Miscellaneous:

🔗 Integrations
📗 Glossary
⚠️ Errors and Messages
❓ FAQ

Learn more about developer-specific features:

📖 Features of Anthropic Models

Have a Minute? Help Make the Docs Better!

We’re currently working on improving our documentation portal, and your feedback would be incredibly helpful! Take a quick 5-question survey (no personal info required!).

You can also rate each individual page using the built-in form on the right side of the screen:

Qwen3-235B-A22B

This documentation is valid for the following model:

  • Qwen/Qwen3-235B-A22B-fp8-tput

Try in Playground

Model Overview

A hybrid instruct-and-reasoning text model.

How to Make a Call

Step-by-Step Instructions

1️⃣ Setup You Can’t Skip

▪️ Create an Account: Visit the AI/ML API website and create an account (if you don’t have one yet).
▪️ Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure that the key is enabled in the UI.

2️⃣ Copy the code example

At the bottom of this page, you'll find a code example that shows how to structure the request. Choose the code snippet in your preferred programming language and copy it into your development environment.

3️⃣ Modify the code example

▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.

4️⃣ (Optional) Adjust other parameters if needed

Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.

5️⃣ Run your modified code

Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.

API Schema

Code Example

Response

Setting Up

A step-by-step guide to setting up and making a test call to the AI model, including generating an API key, configuring the Base URL, and running the first request.

Here, you'll learn how to start using our API in your code. The following steps must be completed regardless of whether you integrate one of the models we offer or use our ready-made solution:

  • generating an AIML API Key,

  • configuring the base URL,

  • making an API call.

Let's walk through an example of connecting to the gpt-4o model via the OpenAI SDK. This guide is suitable even for complete beginners.

Generating an AIML API Key

What is an API Key?

You can find your AIML API key on the account page.

An AIML API Key is a credential that grants you access to our API from within your code. It is a sensitive string of characters that should be kept confidential. Do not share this API key with anyone else, as it could be misused without your knowledge.

⚠️ Note that API keys from third-party organizations cannot be used with our API: you need an AIML API Key.
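One common way to keep the key out of your source code (not required by our API, just good general practice) is to read it from an environment variable. The variable name AIML_API_KEY below matches the .env example used later in these docs:

import os

# Read the key from an environment variable instead of hard-coding it:
api_key = os.environ["AIML_API_KEY"]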

To use the AIML API, you need to create an account and generate an API key. Follow these steps:

  1. Create an Account: Visit the AI/ML API website and create an account.

  2. Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure that the key is enabled in the UI.

Configuring Base URL

What is a Base URL?

The Base URL is the first part of the URL (including the protocol, domain, and pathname) that determines the server responsible for handling your request. It’s crucial to configure the correct Base URL in your application, especially if you are using SDKs from OpenAI, Azure, or other providers. By default, these SDKs are set to point to their servers, which are not compatible with our API keys and do not support many of the models we offer.

Depending on your environment and application, you will set the base URL differently. Below is a universal string that you can use to access our API. Copy it or return here later when you are ready with your environment or app.

The AI/ML API supports both versioned and non-versioned URLs, providing flexibility in your API requests. You can use either of the following formats:

  • https://api.aimlapi.com

  • https://api.aimlapi.com/v1

Using versioned URLs can help ensure compatibility with future updates and changes to the API. It is recommended to use versioned URLs for long-term projects to maintain stability.
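For example, pointing the OpenAI SDK at the versioned Base URL (the same pattern used throughout this guide) looks like this:

from openai import OpenAI

# The versioned URL is recommended for long-term projects:
client = OpenAI(
    base_url="https://api.aimlapi.com/v1",
    api_key="<YOUR_AIMLAPI_KEY>",
)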

Making an API Call

Based on your environment, you will call our API differently. Below are two common ways to call our API using two popular programming languages: Python and Node.js.

In the examples below, we use the OpenAI SDK. This is possible due to our compatibility with most OpenAI APIs, but this is just one approach: you can use our API without this SDK, with raw HTTP queries.

If you don’t want lengthy explanations, here’s the code you can use right away in a Python or Node.js program. You only need to replace <YOUR_AIMLAPI_KEY> with your AIML API Key obtained from your account. However, below, we will still go through these examples step by step in both languages explaining every single line.

Step-by-step example in Python

Let's start from the very beginning. We assume you have already installed Python (with venv); if not, here is a guide for beginners.

Create a new folder for the test project, name it aimlapi-welcome, and change into it.

(Optional) If you use an IDE, we recommend opening the created folder as a workspace. For example, in VSCode you can do it with:

Open a terminal inside the created folder and create a virtual environment with the command:

Activate the created virtual environment:

Install the required dependencies. In our case, we only need the OpenAI SDK:

Create a new file and name it travel.py.

Step-by-step example in NodeJS

As in the Python example, we start from the very beginning. We assume you already have Node.js installed; if not, here is a guide for beginners.

We need to create a new folder for the example project:

(Optional) If you use an IDE, we recommend opening the created folder as a workspace. For example, in VSCode you can do it with:

Now create a project file:

Install the required dependencies:

Create a file with the source code:

And paste the following content:

You will see a response that looks like this:

Code Explanation

Both examples are written in different programming languages, but despite that, they look very similar. Let's break down the code step by step and see what's going on.

In the examples above, we are using the OpenAI SDK. The OpenAI SDK is a convenient library that allows us to use the AI/ML API without dealing with repetitive boilerplate code for handling HTTP requests. Before we can use the OpenAI SDK, it needs to be imported. The import happens in the following places:

Simple as it is. The next step is to initialize variables that our code will use. The two main ones are: the base URL and the API key. We already discussed them at the beginning of the article.

To communicate with LLMs, users use text. These texts are usually called "prompts." Inside our code, we have prompts with two roles: system and user. The system prompt is designed to be the main source of instructions for the LLM's generation, while the user prompt is the user input, the subject of the system prompt. Although many models can operate differently, this behavior usually applies to chat LLMs, currently among the most useful and popular models.

Inside the code, the prompts are stored in the variables systemPrompt and userPrompt in JS, and system_prompt and user_prompt in Python.

Before we use the API, we need to create an instance of the OpenAI SDK class. It allows us to use all their methods. The instance is created with our imported package, and here we forward two main parameters: the base URL and the API key.

Because of naming conventions, these two parameters are spelled slightly differently in the two languages (camel case in JS and snake case in Python), but their functionality is the same.

All preparation steps are done. Now we need to write our functionality and create something great. In the examples above, we make the simplest travel agent. Let's break down the steps of how we send a request to the model.

The best practice is to split the code into complete parts with their own logic and not to place executable code in the global module scope. This rule applies in both languages we discuss, so we create a main function with all our logic. In JS, this function needs to be async, due to Promises and simplicity; in Python, requests run synchronously.

The OpenAI SDK provides us with methods to communicate with chat models. It is placed inside the chat.completions.create function. This function accepts multiple parameters but requires only two: model and messages.

model is a string: the name of the model that you want to use. For the best results, use a model designed for chat; otherwise, you can get unpredictable results if the model is not fine-tuned for that purpose. A list of supported models can be found here.

messages is an array of objects with a content field as prompt and a role string that can be one of system, user, tool, assistant. With the role, the model can understand what to do with this prompt: Is this an instruction? Is this a user message? Is this an example of how to answer? Is this the result of code execution? The tool role is used for more complex behavior and will be discussed in another article.
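For instance, a messages array that combines the roles described above might look like this (the assistant entry is a hypothetical earlier reply, included only to illustrate multi-turn context):

messages = [
    {"role": "system", "content": "You are a travel agent. Be descriptive and helpful."},
    {"role": "user", "content": "Tell me about San Francisco"},
    # A previous model reply can be passed back in as context:
    {"role": "assistant", "content": "San Francisco is a vibrant city..."},
    {"role": "user", "content": "What about its food scene?"},
]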

In our example, we also use max_tokens and temperature.

With that knowledge, we can now send our request like the following:

The response from the function chat.completions.create contains a completion. Completion is a fundamental part of LLM logic: every LLM is a sort of word-autocomplete engine, trained on huge amounts of data. Chat models are designed to autocomplete large chunks of messages with prompts and certain roles, but other models can have their own custom logic, even without roles.

Inside this completion, we are interested in the text of the generation. We can get it by getting the result from the completion variable:

In certain cases, a completion can have multiple results. These results are called choices. Every choice has a message, the product of generation. The string content is placed inside the content field, which we assigned to our response variable above.
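If you request several alternatives via the optional n parameter (where the model supports it), you can iterate over all of them; a minimal sketch:

# Assuming the request was made with the optional parameter n=3:
for i, choice in enumerate(completion.choices):
    print(f"Choice {i}:", choice.message.content)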

In the next steps, we can finally see the results. In both examples, we print the user prompt and response like it was a conversation:

Voila! Using AI/ML API models is the simplest and most productive way to get into the world of Machine Learning and Artificial Intelligence.

Future Steps

  • Know more about OpenAI SDK inside AI/ML API

from openai import OpenAI
client = OpenAI(
base_url="https://api.aimlapi.com/v1",
api_key="<YOUR_AIMLAPI_KEY>",
)
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Write a one-sentence story about numbers."}]
)
print(response.choices[0].message.content)
import requests
import json  # for getting a structured output with indentation 

response = requests.post(
    "https://api.aimlapi.com/v1/chat/completions",
    headers={
        # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
        "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
        "Content-Type":"application/json"
    },
    json={
        "model":"Qwen/Qwen3-235B-A22B-fp8-tput",
        "messages":[
            {
                "role":"user",
                "content":"Hello"  # insert your prompt here, instead of Hello
            }
        ]
    }
)

data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
async function main() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'Qwen/Qwen3-235B-A22B-fp8-tput',
      messages:[
          {
              role:'user',
              content: 'Hello'  // insert your prompt here, instead of Hello
          }
      ],
    }),
  });

  const data = await response.json();
  console.log(JSON.stringify(data, null, 2));
}

main();
{'id': 'ntFB5Ap-6UHjtw-93cab7642d14efac', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': '<think>\nOkay, the user just said "Hello". I should respond in a friendly and welcoming manner. Let me make sure to greet them back and offer assistance. Maybe say something like, "Hello! How can I help you today?" That should be open-ended and inviting for them to ask questions or share what\'s on their mind. Keep it simple and positive.\n</think>\n\nHello! How can I help you today? 😊', 'tool_calls': []}}], 'created': 1746725755, 'model': 'Qwen/Qwen3-235B-A22B-fp8-tput', 'usage': {'prompt_tokens': 4, 'completion_tokens': 111, 'total_tokens': 115}}
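To pull just the assistant's text out of a raw response like the one above, index into choices on the parsed JSON:

# Extract the assistant's reply from the parsed `data` dictionary:
text = data["choices"][0]["message"]["content"]
print(text)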

travel.py

Paste the following content into travel.py, replacing <YOUR_AIMLAPI_KEY> with the API key you got in the first step. Run the application, and if you did everything correctly, you will see output like the one shown below:

from openai import OpenAI

base_url = "https://api.aimlapi.com/v1"

# Insert your AIML API key in the quotation marks instead of <YOUR_AIMLAPI_KEY>:
api_key = "<YOUR_AIMLAPI_KEY>" 

system_prompt = "You are a travel agent. Be descriptive and helpful."
user_prompt = "Tell me about San Francisco"

api = OpenAI(api_key=api_key, base_url=base_url)


def main():
    completion = api.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        temperature=0.7,
        max_tokens=256,
    )

    response = completion.choices[0].message.content

    print("User:", user_prompt)
    print("AI:", response)


if __name__ == "__main__":
    main()
const { OpenAI } = require("openai");

const baseURL = "https://api.aimlapi.com/v1";

// Insert your AIML API Key in the quotation marks instead of my_key:
const apiKey = "<YOUR_AIMLAPI_KEY>"; 

const systemPrompt = "You are a travel agent. Be descriptive and helpful";
const userPrompt = "Tell me about San Francisco";

const api = new OpenAI({
  apiKey,
  baseURL,
});

const main = async () => {
  const completion = await api.chat.completions.create({
    model: "mistralai/Mistral-7B-Instruct-v0.2",
    messages: [
      {
        role: "system",
        content: systemPrompt,
      },
      {
        role: "user",
        content: userPrompt,
      },
    ],
    temperature: 0.7,
    max_tokens: 256,
  });

  const response = completion.choices[0].message.content;

  console.log("User:", userPrompt);
  console.log("AI:", response);
};

main();
mkdir ./aimlapi-welcome
cd ./aimlapi-welcome
code .
python3 -m venv ./.venv
# Linux / macOS
source ./.venv/bin/activate
# Windows
.venv\Scripts\activate.bat
pip install openai
mkdir ./aimlapi-welcome
cd ./aimlapi-welcome
code .
npm init -y
npm i openai
touch ./index.js
const { OpenAI } = require("openai");

const baseURL = "https://api.aimlapi.com/v1";
const apiKey = "<YOUR_AIMLAPI_KEY>";
const systemPrompt = "You are a travel agent. Be descriptive and helpful";
const userPrompt = "Tell me about San Francisco";

const api = new OpenAI({
  apiKey,
  baseURL,
});

const main = async () => {
  const completion = await api.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content: systemPrompt,
      },
      {
        role: "user",
        content: userPrompt,
      },
    ],
    temperature: 0.7,
    max_tokens: 256,
  });

  const response = completion.choices[0].message.content;

  console.log("User:", userPrompt);
  console.log("AI:", response);
};

main();
User: Tell me about San Francisco
AI: San Francisco, located in the northern part of California, USA, is a vibrant and culturally rich city known for its iconic landmarks, beautiful scenery, and diverse neighborhoods.

The city is famous for its iconic Golden Gate Bridge, an engineering marvel and one of the most recognized structures in the world. Spanning the Golden Gate Strait, this red-orange suspension bridge connects San Francisco to Marin County and offers breathtaking views of the San Francisco Bay and the Pacific Ocean.
const { OpenAI } = require("openai");
from openai import OpenAI
const baseURL = "https://api.aimlapi.com/v1";
const apiKey = "<YOUR_AIMLAPI_KEY>";
const systemPrompt = "You are a travel agent. Be descriptive and helpful";
const userPrompt = "Tell me about San Francisco";
base_url = "https://api.aimlapi.com/v1"
api_key = "<YOUR_AIMLAPI_KEY>"
system_prompt = "You are a travel agent. Be descriptive and helpful."
user_prompt = "Tell me about San Francisco"
const api = new OpenAI({
  apiKey,
  baseURL,
});
api = OpenAI(api_key=api_key, base_url=base_url)
const completion = await api.chat.completions.create({
  model: "gpt-4o",
  messages: [
    {
      role: "system",
      content: systemPrompt,
    },
    {
      role: "user",
      content: userPrompt,
    },
  ],
  temperature: 0.7,
  max_tokens: 256,
});
completion = api.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ],
    temperature=0.7,
    max_tokens=256,
)
const response = completion.choices[0].message.content;
response = completion.choices[0].message.content
console.log("User:", userPrompt);
console.log("AI:", response);
print("User:", user_prompt)
print("AI:", response)
touch travel.py
from openai import OpenAI

base_url = "https://api.aimlapi.com/v1"
api_key = "<YOUR_AIMLAPI_KEY>"
system_prompt = "You are a travel agent. Be descriptive and helpful."
user_prompt = "Tell me about San Francisco"

api = OpenAI(api_key=api_key, base_url=base_url)


def main():
    completion = api.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        temperature=0.7,
        max_tokens=256,
    )

    response = completion.choices[0].message.content

    print("User:", user_prompt)
    print("AI:", response)


if __name__ == "__main__":
    main()
python3 ./travel.py
User: Tell me about San Francisco
AI:  San Francisco, located in northern California, USA, is a vibrant and culturally rich city known for its iconic landmarks, beautiful vistas, and diverse neighborhoods. It's a popular tourist destination famous for its iconic Golden Gate Bridge, which spans the entrance to the San Francisco Bay, and the iconic Alcatraz Island, home to the infamous federal prison.

The city's famous hills offer stunning views of the bay and the cityscape. Lombard Street, the "crookedest street in the world," is a must-see attraction, with its zigzagging pavement and colorful gardens. Ferry Building Marketplace is a great place to explore local food and artisanal products, and the Pier 39 area is home to sea lions, shops, and restaurants.

San Francisco's diverse neighborhoods each have their unique character. The historic Chinatown is the oldest in North America, while the colorful streets of the Mission District are known for their murals and Latin American culture. The Castro District is famous for its LGBTQ+ community and vibrant nightlife.

qwen-plus

This documentation is valid for the following list of our models:

  • qwen-plus

Model Overview

An advanced large language model. Multilingual support, including Chinese and English. Enhanced reasoning capabilities for complex tasks. Improved instruction-following abilities.

How to Make a Call

Step-by-Step Instructions

1️⃣ Setup You Can’t Skip

▪️ Create an Account: Visit the AI/ML API website and create an account (if you don’t have one yet).
▪️ Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure that the key is enabled in the UI.

2️⃣ Copy the code example

At the bottom of this page, you'll find a code example that shows how to structure the request. Choose the code snippet in your preferred programming language and copy it into your development environment.

3️⃣ Modify the code example

▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.

4️⃣ (Optional) Adjust other parameters if needed

Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.

5️⃣ Run your modified code

Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.

API Schema

Code Example

Response

Qwen2.5-72B-Instruct-Turbo

This documentation is valid for the following list of our models:

  • Qwen/Qwen2.5-72B-Instruct-Turbo

Model Overview

A state-of-the-art large language model designed for a variety of natural language processing tasks, including instruction following, coding assistance, and mathematical problem-solving.

How to Make a Call

Step-by-Step Instructions

1️⃣ Setup You Can’t Skip

▪️ Create an Account: Visit the AI/ML API website and create an account (if you don’t have one yet).
▪️ Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure that the key is enabled in the UI.

2️⃣ Copy the code example

At the bottom of this page, you'll find a code example that shows how to structure the request. Choose the code snippet in your preferred programming language and copy it into your development environment.

3️⃣ Modify the code example

▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.

4️⃣ (Optional) Adjust other parameters if needed

Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.

5️⃣ Run your modified code

Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.

API Schema

Code Example

Response

Qwen2.5-Coder-32B-Instruct

This documentation is valid for the following list of our models:

  • Qwen/Qwen2.5-Coder-32B-Instruct

Model Overview

The 32B variant of the latest code-focused model series (formerly CodeQwen). The most capable, with strong performance in coding, math, and general tasks.

How to Make a Call

Step-by-Step Instructions

1️⃣ Setup You Can’t Skip

▪️ Create an Account: Visit the AI/ML API website and create an account (if you don’t have one yet).
▪️ Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure that the key is enabled in the UI.

2️⃣ Copy the code example

At the bottom of this page, you'll find a code example that shows how to structure the request. Choose the code snippet in your preferred programming language and copy it into your development environment.

3️⃣ Modify the code example

▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.

4️⃣ (Optional) Adjust other parameters if needed

Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.

5️⃣ Run your modified code

Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.

API Schema

Code Example

Response

Llama-3.1-8B-Instruct-Turbo

This documentation is valid for the following list of our models:

  • meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo

Model Overview

An advanced language model designed for high-quality text generation, optimized for professional and industry applications requiring extensive GPU resources.

How to Make a Call

Step-by-Step Instructions

1️⃣ Setup You Can’t Skip

▪️ Create an Account: Visit the AI/ML API website and create an account (if you don’t have one yet).
▪️ Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure that the key is enabled in the UI.

2️⃣ Copy the code example

At the bottom of this page, you'll find a code example that shows how to structure the request. Choose the code snippet in your preferred programming language and copy it into your development environment.

3️⃣ Modify the code example

▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.

4️⃣ (Optional) Adjust other parameters if needed

Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.

5️⃣ Run your modified code

Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.

API Schema

Code Example

Response

Llama-3.2-3B-Instruct-Turbo

This documentation is valid for the following list of our models:

  • meta-llama/Llama-3.2-3B-Instruct-Turbo

Model Overview

A large language model (LLM) optimized for instruction-following tasks, striking a balance between computational efficiency and high-quality performance. It excels in multilingual tasks, offering a lightweight solution without compromising on quality.

How to Make a Call

Step-by-Step Instructions

1️⃣ Setup You Can’t Skip

▪️ Create an Account: Visit the AI/ML API website and create an account (if you don’t have one yet).
▪️ Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure that the key is enabled in the UI.

2️⃣ Copy the code example

At the bottom of this page, you'll find a code example that shows how to structure the request. Choose the code snippet in your preferred programming language and copy it into your development environment.

3️⃣ Modify the code example

▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.

4️⃣ (Optional) Adjust other parameters if needed

Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.

5️⃣ Run your modified code

Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.

API Schema

Code Example

Response

Supported SDKs

A description of the software development kits (SDKs) that can be used to interact with the AIML API.

This page describes the SDKs that can be used to call our API.

Also take a look at the Integrations section — it covers many third-party services and libraries (workflow platforms, coding assistants, etc.) that allow you to integrate our models in various ways.

qwen-max

This documentation is valid for the following list of our models:

  • qwen-max

qwen-turbo

This documentation is valid for the following list of our models:

  • qwen-turbo

Qwen2.5-7B-Instruct-Turbo

This documentation is valid for the following list of our models:

  • Qwen/Qwen2.5-7B-Instruct-Turbo

qwen3-235b-a22b-thinking-2507

This documentation is valid for the following list of our models:

  • alibaba/qwen3-235b-a22b-thinking-2507

qwen3-next-80b-a3b-instruct

This documentation is valid for the following list of our models:

  • alibaba/qwen3-next-80b-a3b-instruct

qwen3-max-instruct

This documentation is valid for the following list of our models:

  • alibaba/qwen3-max-instruct

qwen3-omni-30b-a3b-captioner

This documentation is valid for the following list of our models:

  • alibaba/qwen3-omni-30b-a3b-captioner

Claude 3 Haiku

This documentation is valid for the following list of our models:

  • anthropic/claude-3-haiku

Claude 3 Opus

Model Overview

A highly capable multimodal model designed to process both text and image data. It excels in tasks requiring complex reasoning, mathematical problem-solving, coding, and multilingual text understanding.

Claude 4 Opus

Model Overview

The leading coding model globally, consistently excelling at complex, long-duration tasks and agent-based workflows.

Claude 4.5 Sonnet

Model Overview

A major improvement over its predecessor, offering better coding abilities, stronger reasoning, and more accurate responses to your instructions.

Claude 4.5 Haiku

Model Overview

The model offers coding performance comparable to larger Claude models, but at one-third the cost and more than twice the speed.

DeepSeek V3

This documentation is valid for the following list of our models:

  • deepseek-chat

  • deepseek/deepseek-chat

DeepSeek Chat V3.1

This documentation is valid for the following list of our models:

  • deepseek/deepseek-chat-v3.1

DeepSeek Reasoner V3.1

This documentation is valid for the following list of our models:

  • deepseek/deepseek-reasoner-v3.1

Deepseek Reasoner V3.1 Terminus

This documentation is valid for the following list of our models:

  • deepseek/deepseek-reasoner-v3.1-terminus

DeepSeek V3.2 Exp Non-thinking

Model Overview

September 2025 update of the non-reasoning model.

gemini-2.0-flash-exp

Model Overview

A cutting-edge multimodal AI model developed by Google DeepMind, designed to power agentic experiences. This model is capable of processing text and images.

gemini-2.0-flash

Model Overview

A cutting-edge multimodal AI model developed by Google DeepMind, designed to power agentic experiences. This model is capable of processing text and images.

gemma-3

Model Overview

This page describes four variants of Google’s latest open AI model, Gemma 3. All variants share the same set of parameters but differ in speed and reasoning capabilities.

Llama-3-chat-hf

Model Overview

This model is optimized for dialogue use cases and outperforms many existing open-source chat models on common industry benchmarks.

You can also view it on our main website.

Llama-3-8B-Instruct-Lite

Model Overview

A generative text model optimized for dialogue and instruction-following use cases. It leverages a refined transformer architecture to deliver high performance in text generation tasks.

Llama-3.3-70B-Versatile

Model Overview

An advanced multilingual large language model with 70 billion parameters, optimized for diverse NLP tasks. It delivers high performance across benchmarks while remaining efficient for a wide range of applications.

m1

Model Overview

The world's first open-weight, large-scale hybrid-attention reasoning model.

Llama-3.1-70B-Instruct-Turbo

Model Overview

A state-of-the-art instruction-tuned language model designed for multilingual dialogue use cases. It excels in natural language generation and understanding tasks, outperforming many existing models in the industry benchmarks.

mistral-nemo

Model Overview

A state-of-the-art large language model designed for advanced natural language processing tasks, including text generation, summarization, translation, and sentiment analysis.

OpenAI

In the Setting Up article, we showed an example of how to use the OpenAI SDK with the AI/ML API. We configured the environment from the very beginning and executed our request to the AI/ML API.

We fully support the OpenAI API structure, and you can seamlessly use the features that the OpenAI SDK provides out-of-the-box, including:

  • Streaming

  • Completions

  • Chat Completions

  • Audio

  • Beta Assistants

  • Beta Threads

  • Embeddings

  • Image Generation

  • Uploads

This support provides easy integration into systems already using OpenAI's standards. For example, you can integrate our API into any product that supports LLM models by updating only two things in the configuration: the base URL and the API key.

How do I configure the base URL and API key?
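As shown in the Setting Up article, with the OpenAI SDK it comes down to two constructor arguments; everything else in your existing code can stay unchanged:

from openai import OpenAI

# Only these two settings differ from a stock OpenAI setup:
client = OpenAI(
    base_url="https://api.aimlapi.com/v1",
    api_key="<YOUR_AIMLAPI_KEY>",
)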


REST API

Because we support the OpenAI API structure, our API can be used with the same endpoints as OpenAI. You can call them from any environment.

Authorization

AI/ML API authorization is based on a Bearer token. You need to include it in the Authorization HTTP header of the request, for example:

Authorization: Bearer <YOUR_AIMLAPI_KEY>

Request Example

When your token is ready, you can call our API over HTTP.


AI/ML API Python library

We have started developing our own SDK to simplify the use of our service. Currently, it supports only chat completion and embedding models.

If you’d like to contribute to expanding its functionality, feel free to reach out to us on Discord!

Installation

After obtaining your AIML API key, create an .env file and copy the required contents into it.

Copy the code below, paste it into your .env file, and set your API key in AIML_API_KEY="<YOUR_AIMLAPI_KEY>", replacing <YOUR_AIMLAPI_KEY> with your actual key:

Install the aiml_api package:

Request Example

To execute the script, use:


Next Steps

  • Check our full list of model IDs

fetch("https://api.aimlapi.com/chat/completions", {
  method: "POST",
  headers: {
    Authorization: "Bearer <YOUR_AIMLAPI_KEY>",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "gpt-4o",
    messages: [
      {
        role: "user",
        content: "What kind of model are you?",
      },
    ],
    max_tokens: 512,
    stream: false,
  }),
})
  .then((res) => res.json())
  .then(console.log);
import requests
import json

response = requests.post(
    url="https://api.aimlapi.com/chat/completions",
    headers={
        "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
        "Content-Type": "application/json",
    },
    data=json.dumps(
        {
            "model": "gpt-4o",
            "messages": [
                {
                    "role": "user",
                    "content": "What kind of model are you?",
                },
            ],
            "max_tokens": 512,
            "stream": False,
        }
    ),
)

response.raise_for_status()
print(response.json())
curl --request POST \
  --url https://api.aimlapi.com/chat/completions \
  --header 'Authorization: Bearer <YOUR_AIMLAPI_KEY>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": "What kind of model are you?"
        }
    ],
    "max_tokens": 512,
    "stream": false
}'
from aiml_api import AIML_API

# AIML_API() picks up AIML_API_KEY (and AIML_API_URL) from the .env file created above:
api = AIML_API()

completion = api.chat.completions.create(
    model = "mistralai/Mistral-7B-Instruct-v0.2",
    messages = [
        {"role": "user", "content": "Explain the importance of low-latency LLMs"},
    ],
    temperature = 0.7,
    max_tokens = 256,
)

response = completion.choices[0].message.content
print("AI:", response)
touch .env
AIML_API_KEY = "<YOUR_AIMLAPI_KEY>"
AIML_API_URL = "https://api.aimlapi.com/v1"
# install from PyPI
pip install aiml_api
python3 <your_script_name>.py

qwen-max-2025-01-25

Try in Playground

Model Overview

The large-scale Mixture-of-Experts (MoE) language model. Excels in language understanding and task performance. Supports 29 languages, including Chinese, English, and Arabic.

How to Make a Call

Step-by-Step Instructions

1️⃣ Setup You Can’t Skip

▪️ Create an Account: Visit the AI/ML API website and create an account (if you don’t have one yet).
▪️ Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure that the key is enabled in the UI.

2️⃣ Copy the code example

At the bottom of this page, you'll find a code example that shows how to structure the request. Choose the code snippet in your preferred programming language and copy it into your development environment.

3️⃣ Modify the code example

▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.

4️⃣ (Optional) Adjust other parameters if needed

Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.

5️⃣ Run your modified code

Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.

API Schema

Code Example

Response

Try in Playground

Model Overview

This model is designed to enhance both the performance and efficiency of AI agents developed on the Alibaba Cloud Model Studio platform. Optimized for speed and precision in generative AI application development. Improves AI agent comprehension and adaptation to enterprise data, especially when integrated with Retrieval-Augmented Generation (RAG) architectures. Large context window (1,000,000 tokens).

How to Make a Call

Step-by-Step Instructions

1️⃣ Setup You Can’t Skip

▪️ Create an Account: Visit the AI/ML API website and create an account (if you don’t have one yet).
▪️ Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure that the key is enabled in the UI.

2️⃣ Copy the code example

Below, you'll find a code example that shows how to structure the request. Choose the code snippet in your preferred programming language and copy it into your development environment.

3️⃣ Modify the code example

▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.

4️⃣ (Optional) Adjust other parameters if needed

Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.

5️⃣ Run your modified code

Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.

API Schema

Code Example

Response

Try in Playground

Model Overview

A cutting-edge large language model designed to understand and generate text based on specific instructions. It excels in various tasks, including coding, mathematical problem-solving, and generating structured outputs.

How to Make a Call

Step-by-Step Instructions

1️⃣ Setup You Can’t Skip

▪️ Create an Account: Visit the AI/ML API website and create an account (if you don’t have one yet).
▪️ Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure that the key is enabled in the UI.

2️⃣ Copy the code example

At the bottom of this page, you'll find a code example that shows how to structure the request. Choose the code snippet in your preferred programming language and copy it into your development environment.

3️⃣ Modify the code example

▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.

4️⃣ (Optional) Adjust other parameters if needed

Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.

5️⃣ Run your modified code

Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.

API Schema

Code Example

Response

Try in Playground

Model Overview

Significantly improved performance on reasoning tasks, including logical reasoning, mathematics, science, coding, and academic benchmarks that typically require human expertise.

How to Make a Call

Step-by-Step Instructions

1️⃣ Setup You Can’t Skip

▪️ Create an Account: Visit the AI/ML API website and create an account (if you don’t have one yet).
▪️ Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure that the key is enabled in the UI.

2️⃣ Copy the code example

At the bottom of this page, you'll find a code example that shows how to structure the request. Choose the code snippet in your preferred programming language and copy it into your development environment.

3️⃣ Modify the code example

▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.

4️⃣ (Optional) Adjust other parameters if needed

Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.

5️⃣ Run your modified code

Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.

API Schema

Code Example

Response

Try in Playground

Model Overview

An instruction-tuned chat model optimized for fast, stable replies without reasoning traces, designed for complex tasks in reasoning, coding, knowledge QA, and multilingual use, with strong alignment and formatting.

How to Make a Call

Step-by-Step Instructions

1️⃣ Setup You Can’t Skip

▪️ Create an Account: Visit the AI/ML API website and create an account (if you don’t have one yet).
▪️ Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure that the key is enabled in the UI.

2️⃣ Copy the code example

At the bottom of this page, you'll find a code example that shows how to structure the request. Choose the code snippet in your preferred programming language and copy it into your development environment.

3️⃣ Modify the code example

▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.

4️⃣ (Optional) Adjust other parameters if needed

Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.

5️⃣ Run your modified code

Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.

API Schema

Code Example

Response

Try in Playground

Model Overview

This model offers improved accuracy in math, coding, logic, and science, handles complex instructions in Chinese and English more reliably, reduces hallucinations, supports 100+ languages with stronger translation and commonsense reasoning, and is optimized for RAG and tool use, though it lacks a dedicated ‘thinking’ mode.

How to Make a Call

Step-by-Step Instructions

1️⃣ Setup You Can’t Skip

▪️ Create an Account: Visit the AI/ML API website and create an account (if you don’t have one yet).
▪️ Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure that the key is enabled in the UI.

2️⃣ Copy the code example

At the bottom of this page, you'll find a code example that shows how to structure the request. Choose the code snippet in your preferred programming language and copy it into your development environment.

3️⃣ Modify the code example

▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.

4️⃣ (Optional) Adjust other parameters if needed

Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.

5️⃣ Run your modified code

Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.

API Schema

Code Example

Response

Try in Playground

Model Overview

An open-source model built on Qwen3-Omni that automatically generates rich, detailed descriptions of complex audio — including speech, music, ambient sounds, and effects — without prompts. It detects emotions, musical styles, instruments, and sensitive information, making it ideal for audio analysis, security auditing, intent recognition, and editing.

How to Make a Call

Step-by-Step Instructions

1️⃣ Setup You Can’t Skip

▪️ Create an Account: Visit the AI/ML API website and create an account (if you don’t have one yet).
▪️ Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure that the key is enabled in the UI.

2️⃣ Copy the code example

At the bottom of this page, you'll find a code example that shows how to structure the request. Choose the code snippet in your preferred programming language and copy it into your development environment.

3️⃣ Modify the code example

▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.

4️⃣ (Optional) Adjust other parameters if needed

Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.

5️⃣ Run your modified code

Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.

API Schema

Code Example

Response

  • anthropic/claude-3-haiku-20240307

  • claude-3-haiku-20240307

  • claude-3-haiku-latest

Try in Playground

Model Overview

The quick and streamlined model, offering near-instant responsiveness.

How to Make a Call

Step-by-Step Instructions

1️⃣ Setup You Can’t Skip

▪️ Create an Account: Visit the AI/ML API website and create an account (if you don’t have one yet).
▪️ Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure that the key is enabled in the UI.

2️⃣ Copy the code example

At the bottom of this page, you'll find a code example that shows how to structure the request. Choose the code snippet in your preferred programming language and copy it into your development environment.

3️⃣ Modify the code example

▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.

4️⃣ (Optional) Adjust other parameters if needed

Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.

5️⃣ Run your modified code

Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.

API Schema

Code Example

Response
How to Make a Call

Step-by-Step Instructions

1️⃣ Setup You Can’t Skip

▪️ Create an Account: Visit the AI/ML API website and create an account (if you don’t have one yet).
▪️ Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure that the key is enabled in the UI.

2️⃣ Copy the code example

At the bottom of this page, you'll find a code example that shows how to structure the request. Choose the code snippet in your preferred programming language and copy it into your development environment.

3️⃣ Modify the code example

▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.

4️⃣ (Optional) Adjust other parameters if needed

Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.

5️⃣ Run your modified code

Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.

API Schema

Code Example

Response

    This documentation is valid for the following list of our models:

    • anthropic/claude-3-opus

    • anthropic/claude-3-opus-20240229

    • claude-3-opus-20240229

    • claude-3-opus-latest

    Try in Playground

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response

    This documentation is valid for the following model:

    • anthropic/claude-opus-4

    Try in Playground

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response

    This documentation is valid for the following list of our models:

    • claude-sonnet-4-5

    • anthropic/claude-sonnet-4-5

    • claude-sonnet-4-5-20250929

    Try in Playground

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response

    This documentation is valid for the following list of our models:

    • claude-haiku-4-5

    • anthropic/claude-haiku-4.5

    • claude-haiku-4-5-20251001

    Try in Playground

This documentation is valid for the following list of our models:

• deepseek/deepseek-chat-v3-0324

• deepseek-chat

We provide the latest version of this model from Mar 24, 2025. The IDs listed above refer to the same model; we support them for backward compatibility.

    Model Overview

    DeepSeek V3 (or deepseek-chat) is an advanced conversational AI designed to deliver highly engaging and context-aware dialogues. This model excels in understanding and generating human-like text, making it an ideal solution for creating responsive and intelligent chatbots.

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response
    Model Overview

    August 2025 update of the DeepSeek V3 non-reasoning model.

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response
    Model Overview

    August 2025 update of the DeepSeek R1 reasoning model. Skilled at complex problem-solving, mathematical reasoning, and programming assistance.

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response
    Model Overview

    September 2025 update of the DeepSeek Reasoner V3.1 model. The model produces more consistent and dependable results.

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response
How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response

    This documentation is valid for the following model:

    • deepseek/deepseek-non-thinking-v3.2-exp

    Try in Playground

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response

    This documentation is valid for the following list of our models:

    • google/gemini-2.0-flash-exp

    • gemini-2.0-flash-exp

    Try in Playground

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response

    This documentation is valid for the following model: google/gemini-2.0-flash

    Try in Playground

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response

    This documentation is valid for the following list of our models:

    • google/gemma-3-4b-it

    • google/gemma-3-12b-it

    • google/gemma-3-27b-it

    Try in Playground

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response

    This documentation is valid for the following list of our models:

    • meta-llama/Llama-3-70b-chat-hf

    Try in Playground

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response

    This documentation is valid for the following list of our models:

    • meta-llama/Meta-Llama-3-8B-Instruct-Lite

    Try in Playground

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response

    This documentation is valid for the following list of our models:

    • meta-llama/llama-3.3-70b-versatile

    Try in Playground

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response

    This documentation is valid for the following list of our models:

    • minimax/m1

    Try in Playground

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response

    This documentation is valid for the following list of our models:

    • meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo

    Try in Playground

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response

    This documentation is valid for the following list of our models:

    • mistralai/mistral-nemo

    Try in Playground

    qwen3-coder-480b-a35b-instruct

    This documentation is valid for the following list of our models:

    • alibaba/qwen3-coder-480b-a35b-instruct

    Model Overview

    The most powerful model in the Qwen3 Coder series — a 480B-parameter MoE architecture with 35B active parameters. It natively supports a 256K token context and can handle up to 1M tokens using extrapolation techniques, delivering outstanding performance in both coding and agentic tasks.

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response
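As an illustration of the steps above, a minimal Python request for this model might look like the sketch below; the coding prompt is just an example.

import requests
import json  # for printing the response with indentation

response = requests.post(
    "https://api.aimlapi.com/v1/chat/completions",
    headers={
        # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
        "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
        "Content-Type": "application/json"
    },
    json={
        "model": "alibaba/qwen3-coder-480b-a35b-instruct",
        "messages": [
            {
                "role": "user",
                # Insert your own coding task here:
                "content": "Write a Python function that reverses a string."
            }
        ]
    }
)

data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))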

    qwen3-next-80b-a3b-thinking

    This documentation is valid for the following list of our models:

    • alibaba/qwen3-next-80b-a3b-thinking

    Model Overview

    The model may take longer to generate reasoning content than its predecessor. Alibaba Cloud strongly recommends its use for highly complex reasoning tasks.

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response

    qwen3-max-preview

    This documentation is valid for the following list of our models:

    • alibaba/qwen3-max-preview

    Model Overview

The preview version of Qwen3 Max.

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response

    magnum-v4

    This documentation is valid for the following list of our models:

    • anthracite-org/magnum-v4-72b

    Model Overview

An LLM fine-tuned on top of Qwen2.5, specifically designed to replicate the prose quality of the Claude 3 models, particularly Sonnet and Opus. It excels in generating coherent and contextually rich text.

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response

    Claude 3.5 Haiku

    This documentation is valid for the following list of our models:

    • anthropic/claude-3-5-haiku

    • anthropic/claude-3-5-haiku-20241022

    Model Overview

    A cutting-edge model designed for rapid data processing and advanced reasoning capabilities. Excels in coding assistance, customer service interactions, and content moderation.

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response

    Claude 3.7 Sonnet

    This documentation is valid for the following list of our models:

    • anthropic/claude-3.7-sonnet

    • claude-3-7-sonnet-20250219

    Model Overview

    A hybrid reasoning model, designed to tackle complex tasks. It introduces a dual-mode operation, combining standard language generation with extended thinking capabilities.

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response

    Claude 4 Sonnet

    This documentation is valid for the following list of our models:

    • anthropic/claude-sonnet-4

    • claude-sonnet-4

    Model Overview

    A major improvement over Claude 3.7 Sonnet, offering better coding abilities, stronger reasoning, and more accurate responses to your instructions.

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response

    Claude 4.5 Opus

    This documentation is valid for the following list of our models:

• anthropic/claude-opus-4-5

    • claude-opus-4-5

    Model Overview

    A high-performance chat model that delivers state-of-the-art results on real-world software engineering benchmarks.

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response

    command-a

    This documentation is valid for the following list of our models:

    • cohere/command-a

    Model Overview

    A powerful LLM with advanced capabilities for enterprise applications.

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response

    DeepSeek R1

    This documentation is valid for the following list of our models:

    • deepseek/deepseek-r1

    • deepseek-reasoner

    Both IDs listed above refer to the same model; we support them for backward compatibility.

    Model Overview

    DeepSeek R1 is a cutting-edge reasoning model developed by DeepSeek AI, designed to excel in complex problem-solving, mathematical reasoning, and programming assistance.

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response
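A minimal Python sketch for this model, using the same endpoint as the other examples in this document. max_tokens is shown purely as an example of an optional parameter; confirm its availability and limits in the API schema before relying on it.

import requests
import json  # for printing the response with indentation

response = requests.post(
    "https://api.aimlapi.com/v1/chat/completions",
    headers={
        # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
        "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
        "Content-Type": "application/json"
    },
    json={
        "model": "deepseek/deepseek-r1",
        "max_tokens": 512,  # optional: cap the response length (see the API schema)
        "messages": [
            {
                "role": "user",
                "content": "Hello"  # insert your prompt here, instead of Hello
            }
        ]
    }
)

# Reasoning models can take noticeably longer to respond than plain chat models.
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))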

    DeepSeek Prover V2

    This documentation is valid for the following model: deepseek/deepseek-prover-v2

    Model Overview

    A massive 671B-parameter model, presumed to focus on logic and mathematics. It appears to be an upgrade over DeepSeek Prover V1.5.

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response

DeepSeek Non-reasoner V3.1 Terminus

    This documentation is valid for the following list of our models:

    • deepseek/deepseek-non-reasoner-v3.1-terminus

    Model Overview

September 2025 update of the DeepSeek V3.1 non-reasoning model. The model produces more consistent and dependable results.

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response

    gemini-2.5-flash-lite-preview

    This documentation is valid for the following model: google/gemini-2.5-flash-lite-preview

    Model Overview

    The model excels at high-volume, latency-sensitive tasks like translation and classification.

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response

    gemma-3n-4b

    This documentation is valid for the following model: google/gemma-3n-e4b-it

    Model Overview

    The first open model built on Google’s next-generation, mobile-first architecture—designed for fast, private, and multimodal AI directly on-device. With Gemma 3n, developers get early access to the same technology that will power on-device AI experiences across Android and Chrome later this year, enabling them to start building for the future today.

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Note that the system role is not supported in this model. In the messages parameter, only user and assistant roles are available.

    Response
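To illustrate the role restriction noted above, here is a minimal multi-turn request sketch for this model that uses only the user and assistant roles (no system message); the conversation content is just an example.

import requests
import json  # for printing the response with indentation

response = requests.post(
    "https://api.aimlapi.com/v1/chat/completions",
    headers={
        # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
        "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
        "Content-Type": "application/json"
    },
    json={
        "model": "google/gemma-3n-e4b-it",
        # Only user and assistant roles are supported in this model's messages parameter:
        "messages": [
            {"role": "user", "content": "Name three uses for on-device AI."},
            {"role": "assistant", "content": "Translation, dictation, and photo search."},
            {"role": "user", "content": "Which of those needs the least memory?"}
        ]
    }
)

data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))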

    Llama-3.1-405B-Instruct-Turbo

    This documentation is valid for the following list of our models:

    • meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo

    Model Overview

    A state-of-the-art large language model developed by Meta AI, designed for advanced text generation tasks. It excels in generating coherent and contextually relevant text across various domains.

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response

    Llama-4-scout

    This documentation is valid for the following list of our models:

    • meta-llama/llama-4-scout

    Model Overview

A 17-billion-active-parameter model with 16 experts, it is the best multimodal model in its class and more powerful than all previous-generation Llama models. Additionally, it offers an industry-leading context window of 1M tokens and delivers better results than Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1 on a wide range of common benchmarks.

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response

    m2

    This documentation is valid for the following list of our models:

    • minimax/m2

    Model Overview

    A high-performance language model optimized for coding and autonomous agent workflows.

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response

    Llama-4-maverick

    This documentation is valid for the following list of our models:

    • meta-llama/llama-4-maverick

    Model Overview

A 17-billion-active-parameter model with 128 experts, it is the best multimodal model in its class, beating GPT-4o and Gemini 2.0 Flash on a wide range of common benchmarks, while achieving results comparable to the new DeepSeek V3 on reasoning and coding, with less than half the number of active parameters.

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response

    Llama-3.3-70B-Instruct-Turbo

    This documentation is valid for the following list of our models:

    • meta-llama/Llama-3.3-70B-Instruct-Turbo

    Model Overview

    An optimized language model designed for efficient text generation with advanced features and multilingual support. Specifically tuned for instruction-following tasks, making it suitable for applications requiring conversational capabilities and task-oriented responses.

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response

    Mistral-7B-Instruct

    This documentation is valid for the following list of our models:

    • mistralai/Mistral-7B-Instruct-v0.2

    • mistralai/Mistral-7B-Instruct-v0.3

    Model Overview

    An advanced version of the Mistral-7B model, fine-tuned specifically for instruction-based tasks. This model is designed to enhance language generation and understanding capabilities.

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response

    text-01

    This documentation is valid for the following list of our models:

    • MiniMax-Text-01

    Model Overview

    A powerful language model developed by MiniMax AI, designed to excel in tasks requiring extensive context processing and reasoning capabilities. With a total of 456 billion parameters, of which 45.9 billion are activated per token, this model utilizes a hybrid architecture that combines various attention mechanisms to optimize performance across a wide array of applications.

How to Make a Call

The setup and request steps for this model are identical to the five-step instructions above.

    API Schema

    Code Example

    Response

    gpt-4o-audio-preview

    This documentation is valid for the following list of our models:

    • gpt-4o-audio-preview

    Model Overview

A text model with support for audio prompts and the ability to generate spoken audio responses. This expansion enhances the potential for AI applications in text- and voice-based interactions and audio analysis. You can choose from a wide range of audio formats for output and specify the voice the model will use for audio responses.

    Setup your API Key

If you don’t have an API key for the AI/ML API yet, feel free to use our Quickstart guide.

    API Schema

    Code Example

    Response

    We’ve omitted 99% of the base64-encoded file for brevity — even for such a short model response, it’s still extremely large.
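Below is a minimal sketch of an audio-output request for this model. It assumes the OpenAI-style modalities and audio parameters are accepted unchanged and that the audio arrives base64-encoded under message.audio.data, as in the (truncated) response above; the voice and format values shown are illustrative, not an exhaustive list.

import base64
import requests

response = requests.post(
    "https://api.aimlapi.com/v1/chat/completions",
    headers={
        # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
        "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
        "Content-Type": "application/json"
    },
    json={
        "model": "gpt-4o-audio-preview",
        # Assumed OpenAI-style audio parameters; verify them in the API schema:
        "modalities": ["text", "audio"],
        "audio": {"voice": "alloy", "format": "wav"},
        "messages": [
            {"role": "user", "content": "Say hello in one short sentence."}
        ]
    }
)

data = response.json()
# Assumed response shape (mirrors OpenAI's audio API):
audio_b64 = data["choices"][0]["message"]["audio"]["data"]
with open("reply.wav", "wb") as f:
    f.write(base64.b64decode(audio_b64))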

    DeepSeek V3.2 Exp Thinking

    Model Overview

September 2025 update of the DeepSeek reasoning model. Skilled at complex problem-solving, mathematical reasoning, and programming assistance.

    Mixtral-8x7B-Instruct

    Model Overview

A state-of-the-art AI model designed for instruction-following tasks. With its sparse mixture-of-experts configuration of eight 7B experts, it excels in understanding and executing complex instructions, providing accurate and relevant responses across a wide range of contexts. This model is ideal for creating highly interactive and intelligent systems that can perform specific tasks based on user commands.

    mistral-tiny

    Model Overview

    A lightweight language model optimized for efficient text generation, summarization, and code completion tasks. It is designed to operate effectively in resource-constrained environments while maintaining high performance.

    hermes-4-405b

This documentation is valid for the following model:

• nousresearch/hermes-4-405b

    nemotron-nano-12b-v2-vl

    Model Overview

    The model offers strong document understanding and summarization capabilities.

    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"qwen-plus",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello" # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'qwen-plus',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {'id': 'chatcmpl-4fda1bd7-a679-95b9-b81d-1bfc6ae98448', 'system_fingerprint': None, 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': 'Hello! How can I assist you today? If you have any questions or need help with anything, just let me know! 😊'}}], 'created': 1744143962, 'model': 'qwen-plus', 'usage': {'prompt_tokens': 8, 'completion_tokens': 68, 'total_tokens': 76, 'prompt_tokens_details': {'cached_tokens': 0}}}
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"Qwen/Qwen2.5-72B-Instruct-Turbo",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'Qwen/Qwen2.5-72B-Instruct-Turbo',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {'id': 'npK4dJH-4yUbBN-92d488799a225ec1', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': 'Hello! How can I assist you today? Feel free to ask me any questions or let me know if you need help with anything specific.', 'tool_calls': []}}], 'created': 1744144336, 'model': 'Qwen/Qwen2.5-72B-Instruct-Turbo', 'usage': {'prompt_tokens': 76, 'completion_tokens': 73, 'total_tokens': 149}}
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"Qwen/Qwen2.5-Coder-32B-Instruct",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'Qwen/Qwen2.5-Coder-32B-Instruct',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "npK8TA2-4yUbBN-92d49ab20aeacfa2",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Hello! How can I assist you today?",
            "tool_calls": []
          }
        }
      ],
      "created": 1744145083,
      "model": "Qwen/Qwen2.5-Coder-32B-Instruct",
      "usage": {
        "prompt_tokens": 50,
        "completion_tokens": 17,
        "total_tokens": 67
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ],
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo',
          messages:[
              {
                  role:'user',
                  content: 'Hello'   // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "npQnn39-66dFFu-92dab6aaa863ef3f",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Hello. How can I assist you today?",
            "tool_calls": []
          }
        }
      ],
      "created": 1744209143,
      "model": "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
      "usage": {
        "prompt_tokens": 14,
        "completion_tokens": 4,
        "total_tokens": 18
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"meta-llama/Llama-3.2-3B-Instruct-Turbo",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ],
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'meta-llama/Llama-3.2-3B-Instruct-Turbo',
          messages:[
              {
                  role:'user',
                  content: 'Hello'   // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "npQaJb3-4pPsy7-92da7b401ffd5eea",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Hello! How can I assist you today?",
            "tool_calls": []
          }
        }
      ],
      "created": 1744206709,
      "model": "meta-llama/Llama-3.2-3B-Instruct-Turbo",
      "usage": {
        "prompt_tokens": 5,
        "completion_tokens": 1,
        "total_tokens": 6
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"qwen-max",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'qwen-max',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-62aa6045-cee9-995a-bbf5-e3b7e7f3d683",
      "system_fingerprint": null,
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Hello! How can I assist you today? 😊"
          }
        }
      ],
      "created": 1756983980,
      "model": "qwen-max",
      "usage": {
        "prompt_tokens": 30,
        "completion_tokens": 148,
        "total_tokens": 178,
        "prompt_tokens_details": {
          "cached_tokens": 0
        }
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"qwen-turbo",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'qwen-turbo',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-a4556a4c-f985-9ef2-b976-551ac7cef85a",
      "system_fingerprint": null,
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Hello! How can I help you today? Is there something you would like to talk about or learn more about? I'm here to help with any questions you might have."
          }
        }
      ],
      "created": 1744144035,
      "model": "qwen-turbo",
      "usage": {
        "prompt_tokens": 1,
        "completion_tokens": 15,
        "total_tokens": 16,
        "prompt_tokens_details": {
          "cached_tokens": 0
        }
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"Qwen/Qwen2.5-7B-Instruct-Turbo",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'Qwen/Qwen2.5-7B-Instruct-Turbo',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "npK4C7y-3NKUce-92d4866b1e62ef98",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Hello! How can I assist you today?",
            "tool_calls": []
          }
        }
      ],
      "created": 1744144252,
      "model": "Qwen/Qwen2.5-7B-Instruct-Turbo",
      "usage": {
        "prompt_tokens": 19,
        "completion_tokens": 6,
        "total_tokens": 25
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"alibaba/qwen3-235b-a22b-thinking-2507",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ],
            "enable_thinking": False
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'alibaba/qwen3-235b-a22b-thinking-2507',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-af05df1d-5b72-925e-b3a9-437acbd89b1a",
      "system_fingerprint": null,
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Hello! 😊 How can I assist you today? Feel free to ask me any questions or let me know if you need help with anything specific!",
            "reasoning_content": "Okay, the user said \"Hello\". That's a simple greeting. I should respond in a friendly and welcoming way. Let me make sure to keep it open-ended so they feel comfortable to ask questions or share what's on their mind. Maybe add a smiley emoji to keep it warm. Let me check if there's anything else they might need. Since it's just a hello, probably not much more needed here. Just a polite reply."
          }
        }
      ],
      "created": 1753871154,
      "model": "qwen3-235b-a22b-thinking-2507",
      "usage": {
        "prompt_tokens": 13,
        "completion_tokens": 2187,
        "total_tokens": 2200
      }
    }
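For reasoning models such as this one, the final answer and the model's chain of thought come back in separate fields of the message object. A minimal sketch of reading both from the parsed response (field names taken from the sample output above; data is the parsed JSON from the Python example):

    # Assuming `data` holds the parsed JSON response from the example above
    message = data["choices"][0]["message"]
    print("Answer:   ", message["content"])
    print("Reasoning:", message.get("reasoning_content", ""))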
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"alibaba/qwen3-next-80b-a3b-instruct",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ],
            "enable_thinking": False
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'alibaba/qwen3-next-80b-a3b-instruct',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-a944254a-4252-9a54-af1b-94afcfb9807e",
      "system_fingerprint": null,
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Hello! How can I help you today? 😊"
          }
        }
      ],
      "created": 1758228572,
      "model": "qwen3-next-80b-a3b-instruct",
      "usage": {
        "prompt_tokens": 9,
        "completion_tokens": 46,
        "total_tokens": 55
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"alibaba/qwen3-max-instruct",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ],
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'alibaba/qwen3-max-instruct',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-bec5dc33-8f63-96b9-89a4-00aecfce7af8",
      "system_fingerprint": null,
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Hello! How can I help you today?"
          }
        }
      ],
      "created": 1758898624,
      "model": "qwen3-max",
      "usage": {
        "prompt_tokens": 23,
        "completion_tokens": 113,
        "total_tokens": 136
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
          "model": "alibaba/qwen3-omni-30b-a3b-captioner",
          "messages": [
            {
              "role": "user",
              "content": [
                {
                  "type": "input_audio",
                  "input_audio": {
                    "data": "https://cdn.aimlapi.com/eagle/files/elephant/cJUTeeCmpoqIV1Q3WWDAL_vibevoice-output-7b98283fd3974f48ba90e91d2ee1f971.mp3"
                  }
                }
              ]
            }
          ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
      model: 'alibaba/qwen3-omni-30b-a3b-captioner',
          messages:[
            {
              role: 'user',
              content: [
                {
                  type: 'input_audio',
                  input_audio: {
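                    // Publicly accessible URL of the audio file the model should describe: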
                    data: 'https://cdn.aimlapi.com/eagle/files/elephant/cJUTeeCmpoqIV1Q3WWDAL_vibevoice-output-7b98283fd3974f48ba90e91d2ee1f971.mp3'
                  }
                }
              ]
            }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"claude-3-haiku-latest",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      try {
        const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
          method: 'POST',
          headers: {
        // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
            'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            model: 'claude-3-haiku-latest',
            messages:[
                {
                    role:'user',
    
                    // Insert your question for the model here, instead of Hello:
                    content: 'Hello'
                }
            ]
          }),
        });
    
        if (!response.ok) {
          throw new Error(`HTTP error! Status ${response.status}`);
        }
    
        const data = await response.json();
        console.log(JSON.stringify(data, null, 2));
    
      } catch (error) {
        console.error('Error', error);
      }
    }
    
    main();
    {
      "id": "msg_01Fd4uU3AZ3TXzSpSKN7oeDP",
      "object": "chat.completion",
      "model": "claude-3-haiku-20240307",
      "choices": [
        {
          "index": 0,
          "message": {
            "reasoning_content": "",
            "content": "Hello! How can I assist you today?",
            "role": "assistant"
          },
          "finish_reason": "end_turn",
          "logprobs": null
        }
      ],
      "created": 1744218395,
      "usage": {
        "prompt_tokens": 4,
        "completion_tokens": 32,
        "total_tokens": 36
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"claude-3-opus-latest",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      try {
        const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
          method: 'POST',
          headers: {
        // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
            'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            model: 'claude-3-opus-latest',
            messages:[
                {
                    role:'user',
    
                    // Insert your question for the model here, instead of Hello:
                    content: 'Hello'
                }
            ]
          }),
        });
    
        if (!response.ok) {
          throw new Error(`HTTP error! Status ${response.status}`);
        }
    
        const data = await response.json();
        console.log(JSON.stringify(data, null, 2));
    
      } catch (error) {
        console.error('Error', error);
      }
    }
    
    main();
    {
      "id": "msg_013njSJ6FKESFossfd8UHddJ",
      "object": "chat.completion",
      "model": "claude-3-opus-20240229",
      "choices": [
        {
          "index": 0,
          "message": {
            "reasoning_content": "",
            "content": "Hello! How can I assist you today?",
            "role": "assistant"
          },
          "finish_reason": "end_turn",
          "logprobs": null
        }
      ],
      "created": 1744218476,
      "usage": {
        "prompt_tokens": 252,
        "completion_tokens": 1890,
        "total_tokens": 2142
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"anthropic/claude-opus-4",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      try {
        const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
          method: 'POST',
          headers: {
        // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
            'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            model: 'anthropic/claude-opus-4',
            messages:[
                {
                    role:'user',
    
                    // Insert your question for the model here, instead of Hello:
                    content: 'Hello'
                }
            ]
          }),
        });
    
        if (!response.ok) {
          throw new Error(`HTTP error! Status ${response.status}`);
        }
    
        const data = await response.json();
        console.log(JSON.stringify(data, null, 2));
    
      } catch (error) {
        console.error('Error', error);
      }
    }
    
    main();
    {
      "id": "msg_01BDDxHJZjH3UBwLrZBUiASE",
      "object": "chat.completion",
      "model": "claude-opus-4-20250514",
      "choices": [
        {
          "index": 0,
          "message": {
            "reasoning_content": "",
            "content": "Hello! How can I help you today?",
            "role": "assistant"
          },
          "finish_reason": "end_turn",
          "logprobs": null
        }
      ],
      "created": 1748529508,
      "usage": {
        "prompt_tokens": 252,
        "completion_tokens": 1890,
        "total_tokens": 2142
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"anthropic/claude-sonnet-4.5",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      try {
        const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
          method: 'POST',
          headers: {
        // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
            'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            model: 'anthropic/claude-sonnet-4.5',
            messages:[
                {
                    role:'user',
    
                    // Insert your question for the model here, instead of Hello:
                    content: 'Hello'
                }
            ]
          }),
        });
    
        if (!response.ok) {
          throw new Error(`HTTP error! Status ${response.status}`);
        }
    
        const data = await response.json();
        console.log(JSON.stringify(data, null, 2));
    
      } catch (error) {
        console.error('Error', error);
      }
    }
    
    main();
    {
      "id": "msg_011MNbgezv2p5BBE9RvnsZV9",
      "object": "chat.completion",
      "model": "claude-sonnet-4-20250514",
      "choices": [
        {
          "index": 0,
          "message": {
            "reasoning_content": "",
            "content": "Hello! How are you doing today? Is there anything I can help you with?",
            "role": "assistant"
          },
          "finish_reason": "end_turn",
          "logprobs": null
        }
      ],
      "created": 1748522617,
      "usage": {
        "prompt_tokens": 50,
        "completion_tokens": 630,
        "total_tokens": 680
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"anthropic/claude-haiku-4.5",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      try {
        const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
          method: 'POST',
          headers: {
        // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
            'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            model: 'anthropic/claude-haiku-4.5',
            messages:[
                {
                    role:'user',
    
                    // Insert your question for the model here, instead of Hello:
                    content: 'Hello'
                }
            ]
          }),
        });
    
        if (!response.ok) {
          throw new Error(`HTTP error! Status ${response.status}`);
        }
    
        const data = await response.json();
        console.log(JSON.stringify(data, null, 2));
    
      } catch (error) {
        console.error('Error', error);
      }
    }
    
    main();
    {
      "id": "msg_01HbdLU9f78VAHxuYZ7Qp9Y1",
      "object": "chat.completion",
      "model": "claude-haiku-4-5-20251001",
      "choices": [
        {
          "index": 0,
          "message": {
            "reasoning_content": "",
            "content": "Hello! 👋 How can I help you today?",
            "role": "assistant"
          },
          "finish_reason": "end_turn",
          "logprobs": null
        }
      ],
      "created": 1760650965,
      "usage": {
        "prompt_tokens": 8,
        "completion_tokens": 16,
        "total_tokens": 24
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"deepseek-chat",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ],
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      try {
        const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
          method: 'POST',
          headers: {
        // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
            'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            model: 'deepseek-chat',
            messages:[
                {
                    role:'user',
    
                    // Insert your question for the model here, instead of Hello:
                    content: 'Hello'
                }
            ]
          }),
        });
    
        if (!response.ok) {
          throw new Error(`HTTP error! Status ${response.status}`);
        }
    
        const data = await response.json();
        console.log(JSON.stringify(data, null, 2));
    
      } catch (error) {
        console.error('Error', error);
      }
    }
    
    main();
    {
      "id": "gen-1744194041-A363xKnsNwtv6gPnUPnO",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Hello! 😊 How can I assist you today? Feel free to ask me anything—I'm here to help! 🚀",
            "reasoning_content": "",
            "refusal": null
          }
        }
      ],
      "created": 1744194041,
      "model": "deepseek/deepseek-chat-v3-0324",
      "usage": {
        "prompt_tokens": 16,
        "completion_tokens": 88,
        "total_tokens": 104
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"deepseek/deepseek-chat-v3.1",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ],
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'deepseek/deepseek-chat-v3.1',
      messages:[
          {
              role:'user',
              content: 'Hello'  // Insert your question instead of Hello
          }
      ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "c13865eb-50bf-440c-922f-19b1bbef517d",
      "system_fingerprint": "fp_feb633d1f5_prod0820_fp8_kvcache",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Hello! How can I assist you today? 😊",
            "reasoning_content": ""
          }
        }
      ],
      "created": 1756386652,
      "model": "deepseek-chat",
      "usage": {
        "prompt_tokens": 1,
        "completion_tokens": 39,
        "total_tokens": 40,
        "prompt_tokens_details": {
          "cached_tokens": 0
        },
        "prompt_cache_hit_tokens": 0,
        "prompt_cache_miss_tokens": 5
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"deepseek/deepseek-reasoner-v3.1",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ],
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'deepseek/deepseek-reasoner-v3.1',
      messages:[
          {
              role:'user',
              content: 'Hello'  // Insert your question instead of Hello
          }
      ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "ca664281-d3c3-40d3-9d80-fe96a65884dd",
      "system_fingerprint": "fp_feb633d1f5_prod0820_fp8_kvcache",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Hello! How can I help you today? 😊",
            "reasoning_content": ""
          }
        }
      ],
      "created": 1756386069,
      "model": "deepseek-reasoner",
      "usage": {
        "prompt_tokens": 1,
        "completion_tokens": 325,
        "total_tokens": 326,
        "prompt_tokens_details": {
          "cached_tokens": 0
        },
        "completion_tokens_details": {
          "reasoning_tokens": 80
        },
        "prompt_cache_hit_tokens": 0,
        "prompt_cache_miss_tokens": 5
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"deepseek/deepseek-reasoner-v3.1-terminus",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ],
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'deepseek/deepseek-reasoner-v3.1-terminus',
      messages:[
          {
              role:'user',
              content: 'Hello'  // Insert your question instead of Hello
          }
      ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "543f56cb-f59f-42cc-8ed7-8efdd72f185d",
      "system_fingerprint": "fp_ffc7281d48_prod0820_fp8_kvcache",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Hello! How can I assist you today? 😊",
            "reasoning_content": ""
          }
        }
      ],
      "created": 1761034613,
      "model": "deepseek-reasoner",
      "usage": {
        "prompt_tokens": 3,
        "completion_tokens": 98,
        "total_tokens": 101,
        "prompt_tokens_details": {
          "cached_tokens": 0
        },
        "completion_tokens_details": {
          "reasoning_tokens": 99
        },
        "prompt_cache_hit_tokens": 0,
        "prompt_cache_miss_tokens": 5
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"deepseek/deepseek-non-thinking-v3.2-exp",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ],
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'deepseek/deepseek-non-thinking-v3.2-exp',
          messages:[
            {
              role:'user',
              content: 'Hello'  // Insert your question instead of Hello
            }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"google/gemini-2.0-flash-exp",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'google/gemini-2.0-flash-exp',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "2025-04-09|09:53:23.624687-07|5.250.254.39|-1825976509",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Hello there! How can I help you today?\n"
          }
        }
      ],
      "created": 1744217603,
      "model": "google/gemini-2.0-flash-exp",
      "usage": {
        "prompt_tokens": 5,
        "completion_tokens": 173,
        "total_tokens": 178
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"google/gemini-2.0-flash",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'google/gemini-2.0-flash',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "2025-04-10|01:16:19.235787-07|9.7.175.26|-701765511",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Hello! How can I help you today?\n"
          }
        }
      ],
      "created": 1744272979,
      "model": "google/gemini-2.0-flash",
      "usage": {
        "prompt_tokens": 0,
        "completion_tokens": 8,
        "total_tokens": 8
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"google/gemma-3-27b-it",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ],
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'google/gemma-3-27b-it',
      messages:[
          {
              role:'user',
              content: 'Hello'  // Insert your question instead of Hello
          }
      ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "gen-1744217834-d0OUILKDSxXQwmh2EorK",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "\nHello there! 👋 \n\nIt's great to connect with you. How can I help you today? \n\nJust let me know what you're thinking, whether you have a question, want to brainstorm ideas, need some information, or just want to chat. I'm here and ready to assist!\n\n\n\n",
            "refusal": null
          }
        }
      ],
      "created": 1744217834,
      "model": "google/gemma-3-27b-it",
      "usage": {
        "prompt_tokens": 0,
        "completion_tokens": 0,
        "total_tokens": 0
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"meta-llama/Llama-3-70b-chat-hf",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ],
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'meta-llama/Llama-3-70b-chat-hf',
          messages:[
              {
                  role:'user',
    
                  // Insert your question for the model here, instead of Hello:
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "npQoMP3-4yUbBN-92dab967fbdeb248",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?",
            "tool_calls": []
          }
        }
      ],
      "created": 1744209255,
      "model": "meta-llama/Llama-3-70b-chat-hf",
      "usage": {
        "prompt_tokens": 20,
        "completion_tokens": 48,
        "total_tokens": 68
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"meta-llama/Meta-Llama-3-8B-Instruct-Lite",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ],
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'meta-llama/Meta-Llama-3-8B-Instruct-Lite',
          messages:[
              {
                  role:'user',
    
                  // Insert your question for the model here, instead of Hello:
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "o95Ai5e-2j9zxn-976ad7df3ef49b19",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?",
            "tool_calls": []
          }
        }
      ],
      "created": 1756457871,
      "model": "meta-llama/Meta-Llama-3-8B-Instruct-Lite",
      "usage": {
        "prompt_tokens": 2,
        "completion_tokens": 5,
        "total_tokens": 7
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"meta-llama/llama-3.3-70b-versatile",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ],
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'meta-llama/llama-3.3-70b-versatile',
          messages:[
              {
                  role:'user',
                  content: 'Hello'   // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "npQ5s8C-2j9zxn-92d9f3c84a529790",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Hello. It's nice to meet you. Is there something I can help you with or would you like to chat?",
            "tool_calls": []
          }
        }
      ],
      "created": 1744201161,
      "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
      "usage": {
        "prompt_tokens": 67,
        "completion_tokens": 46,
        "total_tokens": 113
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"minimax/m1",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'minimax/m1',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "04a9be008b12ad5eec78791d8aebe36f",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "message": {
            "role": "assistant",
            "content": "Hello! How can I assist you today?"
          }
        }
      ],
      "created": 1750764288,
      "model": "MiniMax-M1",
      "usage": {
        "prompt_tokens": 389,
        "completion_tokens": 910,
        "total_tokens": 1299
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ],
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo',
          messages:[
              {
                  role:'user',
                  content: 'Hello'   // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
{
  "id": "npQi9tF-2j9zxn-92daa0a4ec4968f1",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "logprobs": null,
      "message": {
        "role": "assistant",
        "content": "Hello. How can I assist you today?",
        "tool_calls": []
      }
    }
  ],
  "created": 1744208241,
  "model": "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
  "usage": {
    "prompt_tokens": 67,
    "completion_tokens": 18,
    "total_tokens": 85
  }
}
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"mistralai/mistral-nemo",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'mistralai/mistral-nemo',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
{
  "id": "gen-1744193377-PR9oTu6vDabN9nj0VUUX",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "logprobs": null,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I assist you today? Let me know if you have any questions or just want to chat. 😊",
        "refusal": null
      }
    }
  ],
  "created": 1744193377,
  "model": "mistralai/mistral-nemo",
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 5,
    "total_tokens": 5
  }
}
3️⃣ Modify the code example

▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.

4️⃣ (Optional) Adjust other parameters if needed

Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.

5️⃣ Run your modified code

Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

If you need a more detailed walkthrough for setting up your development environment and making a request step by step, feel free to use our Quickstart guide.
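For example, here is what the previous request looks like with a couple of common optional parameters added. This is a minimal sketch: temperature and max_tokens are typical chat-completion options, but the exact set each model supports is listed in its API schema, and the values below are purely illustrative.

import requests
import json

response = requests.post(
    "https://api.aimlapi.com/v1/chat/completions",
    headers={
        # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
        "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
        "Content-Type":"application/json"
    },
    json={
        "model":"mistralai/mistral-nemo",
        "messages":[
            {
                "role":"user",
                "content":"Hello"  # insert your prompt here, instead of Hello
            }
        ],
        "temperature": 0.7,  # lower values make the output more deterministic
        "max_tokens": 256    # caps the length of the generated reply
    }
)

data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))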

How to Make a Call

Step-by-Step Instructions

1️⃣ Setup You Can’t Skip

▪️ Create an Account: Visit the AI/ML API website and create an account (if you don’t have one yet).
▪️ Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure that the key is enabled in the UI.

2️⃣ Copy the code example

At the bottom of this page, you'll find a code example that shows how to structure the request. Choose the code snippet in your preferred programming language and copy it into your development environment.

3️⃣ Modify the code example

▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.

4️⃣ (Optional) Adjust other parameters if needed

Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.

5️⃣ Run your modified code

Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

If you need a more detailed walkthrough for setting up your development environment and making a request step by step, feel free to use our Quickstart guide.


    This documentation is valid for the following model: deepseek/deepseek-thinking-v3.2-exp
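For this model, a complete request with basic error handling might look like the sketch below. raise_for_status() surfaces authentication or model-ID mistakes immediately; the timeout value is an illustrative choice, since reasoning models can take longer on complex prompts.

import requests
import json

response = requests.post(
    "https://api.aimlapi.com/v1/chat/completions",
    headers={
        # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
        "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
        "Content-Type":"application/json"
    },
    json={
        "model":"deepseek/deepseek-thinking-v3.2-exp",
        "messages":[
            {
                "role":"user",
                "content":"Hello"  # insert your prompt here, instead of Hello
            }
        ]
    },
    timeout=120  # illustrative; reasoning models may take a while on hard prompts
)

# Raise an exception on 4xx/5xx responses (bad key, wrong model ID, etc.)
response.raise_for_status()

data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))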

    This documentation is valid for the following list of our models:

    • mistralai/Mixtral-8x7B-Instruct-v0.1


    This documentation is valid for the following list of our models:

    • mistralai/mistral-tiny




    This documentation is valid for the following list of our models:

    • nvidia/nemotron-nano-12b-v2-vl


    qwen3-vl-32b-instruct

    This documentation is valid for the following list of our models:

    • alibaba/qwen3-vl-32b-instruct

    Model Overview

    The most advanced vision-language model in the Qwen series as of October 2025 — a non-thinking-capable version of the model. Optimized for instruction-following in image description, visual dialogue, and content-generation tasks.
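Since this is a vision-language model, a request typically attaches an image alongside the text prompt. The sketch below uses the OpenAI-style content-part format (a list of text and image_url parts); this format and the placeholder image URL are assumptions, so confirm the exact request shape in this model's API schema.

import requests
import json

response = requests.post(
    "https://api.aimlapi.com/v1/chat/completions",
    headers={
        # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
        "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
        "Content-Type":"application/json"
    },
    json={
        "model":"alibaba/qwen3-vl-32b-instruct",
        "messages":[
            {
                "role":"user",
                "content":[
                    {"type": "text", "text": "Describe this image in two sentences."},
                    # Placeholder URL - replace with a link to your own image:
                    {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
                ]
            }
        ]
    }
)

data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))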


    qwen3-vl-32b-thinking

    This documentation is valid for the following list of our models:

    • alibaba/qwen3-vl-32b-thinking

    Model Overview

    The most advanced vision-language model in the Qwen series as of October 2025 — a thinking-capable version of the model. Designed for complex visual-textual reasoning and extended chains of thought.


    gemini-3-pro-preview

    This documentation is valid for the following list of our models:

    • google/gemini-3-pro-preview

    Model Overview

    This model is optimized for advanced agentic tasks, featuring strong reasoning, coding skills, and superior multimodal understanding. It notably improves on Gemini 2.5 Pro in complex instruction following and output efficiency.


    Claude 4.1 Opus

    All three IDs listed above refer to the same model; we support them for backward compatibility.

    gemini-2.5-flash

    Model Overview

    Gemini 2.5 models are capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy.

    nemotron-nano-9b-v2

    Model Overview

    A unified model designed for both reasoning and non-reasoning tasks. It processes user inputs by first producing a reasoning trace, then delivering a final answer. The reasoning behavior can be adjusted through the system prompt — allowing the model to either show its intermediate reasoning steps or respond directly with the final result. The model offers strong document understanding and summarization capabilities.
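As a sketch of that system-prompt control: the instruction wording below is an assumption (check the model card or API schema for the officially supported phrasing), and the model ID is written in the vendor/name form used elsewhere in these docs.

import requests
import json

response = requests.post(
    "https://api.aimlapi.com/v1/chat/completions",
    headers={
        # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
        "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
        "Content-Type":"application/json"
    },
    json={
        "model":"nvidia/nemotron-nano-9b-v2",
        "messages":[
            # Hypothetical system instruction suppressing the reasoning trace;
            # the officially supported phrasing may differ:
            {"role":"system", "content":"Respond directly with the final answer; do not show intermediate reasoning."},
            {"role":"user", "content":"Give me a one-sentence definition of entropy."}
        ]
    }
)

data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))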

    kimi-k2-turbo-preview

    Model Overview

The high-speed version of Kimi K2. A model fine-tuned for agentic tasks, coding, and conversational use, featuring a context window of up to 256,000 tokens and fast generation speeds — ideal for handling long documents and real-time interactions.

    codestral-2501

    Model Overview

    A state-of-the-art AI model specifically designed for code generation tasks. It leverages advanced machine learning techniques to assist developers in writing, debugging, and optimizing code across a wide range of programming languages. With its impressive performance metrics and capabilities, Codestral-2501 aims to streamline the coding process and enhance productivity for software developers.
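A minimal code-generation call might look like the sketch below. The model ID follows the vendor/name pattern used for other Mistral models in these docs and should be verified against the API schema; the prompt is just an example task.

import requests
import json

response = requests.post(
    "https://api.aimlapi.com/v1/chat/completions",
    headers={
        # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
        "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
        "Content-Type":"application/json"
    },
    json={
        "model":"mistralai/codestral-2501",  # assumed ID; verify in the API schema
        "messages":[
            {
                "role":"user",
                "content":"Write a Python function that checks whether a string is a palindrome."
            }
        ]
    }
)

data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))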

    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"alibaba/qwen3-coder-480b-a35b-instruct",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ],
            "enable_thinking": False
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'alibaba/qwen3-coder-480b-a35b-instruct',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-f906efa6-f816-9a06-a32b-aa38da5fe11a",
      "system_fingerprint": null,
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Hello! How can I help you today?"
          }
        }
      ],
      "created": 1753866642,
      "model": "qwen3-coder-480b-a35b-instruct",
      "usage": {
        "prompt_tokens": 28,
        "completion_tokens": 142,
        "total_tokens": 170
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"alibaba/qwen3-next-80b-a3b-thinking",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ],
            "enable_thinking": False
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'alibaba/qwen3-next-80b-a3b-thinking',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-576aaaf9-f712-9114-b098-c1ee83fbfb6b",
      "system_fingerprint": null,
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Hello! 😊 How can I assist you today?",
            "reasoning_content": "Okay, the user said \"Hello\". I need to respond appropriately. Let me think.\n\nFirst, I should acknowledge their greeting. A simple \"Hello!\" would be good. Maybe add a friendly emoji to keep it warm.\n\nWait, but maybe they want to start a conversation. I should ask how I can help them. That way, I'm being helpful and opening the door for them to ask questions.\n\nLet me check the standard response. Typically, for \"Hello\", the assistant says something like \"Hello! How can I assist you today?\" or \"Hi there! What can I do for you?\"\n\nYes, that's right. Keep it friendly and open-ended. Maybe add a smiley emoji to make it approachable.\n\nSo the response should be: \"Hello!  How can I assist you today?\"\n\nThat's good. Let me make sure there's no mistake. Yes, that's standard. No need for anything complicated here. Just a simple, welcoming reply.\n\nAlternatively, sometimes people use \"Hi\" instead of \"Hello\", but since they said \"Hello\", responding with \"Hello\" is fine. Maybe \"Hi there!\" could also work, but sticking to \"Hello\" matches their greeting.\n\nYes, \"Hello!  How can I assist you today?\" is perfect. It's polite, friendly, and offers assistance. That should be the response."
          }
        }
      ],
      "created": 1758229078,
      "model": "qwen3-next-80b-a3b-thinking",
      "usage": {
        "prompt_tokens": 9,
        "completion_tokens": 7182,
        "total_tokens": 7191,
        "completion_tokens_details": {
          "reasoning_tokens": 277
        }
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"alibaba/qwen3-max-preview",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ],
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'alibaba/qwen3-max-preview',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-8ffebc65-b625-926a-8208-b765371cb1d0",
      "system_fingerprint": null,
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Hello! How can I assist you today? 😊"
          }
        }
      ],
      "created": 1758898044,
      "model": "qwen3-max-preview",
      "usage": {
        "prompt_tokens": 23,
        "completion_tokens": 139,
        "total_tokens": 162
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"anthracite-org/magnum-v4-72b",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      try {
        const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
          method: 'POST',
          headers: {
        // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
            'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            model: 'anthracite-org/magnum-v4-72b',
            messages:[
                {
                    role:'user',
    
                    // Insert your question for the model here, instead of Hello:
                    content: 'Hello'
                }
            ]
          }),
        });
    
        if (!response.ok) {
          throw new Error(`HTTP error! Status ${response.status}`);
        }
    
        const data = await response.json();
        console.log(JSON.stringify(data, null, 2));
    
      } catch (error) {
        console.error('Error', error);
      }
    }
    
    main();
{
  "id": "gen-1744217980-rdVBcVTb76dllKCCRjak",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "logprobs": null,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I assist you today?",
        "refusal": null
      }
    }
  ],
  "created": 1744217980,
  "model": "anthracite-org/magnum-v4-72b",
  "usage": {
    "prompt_tokens": 37,
    "completion_tokens": 50,
    "total_tokens": 87
  }
}
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"claude-3-5-haiku-latest",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      try {
        const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
          method: 'POST',
          headers: {
        // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
            'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            model: 'claude-3-5-haiku-latest',
            messages:[
                {
                    role:'user',
    
                    // Insert your question for the model here, instead of Hello:
                    content: 'Hello'
                }
            ]
          }),
        });
    
        if (!response.ok) {
          throw new Error(`HTTP error! Status ${response.status}`);
        }
    
        const data = await response.json();
        console.log(JSON.stringify(data, null, 2));
    
      } catch (error) {
        console.error('Error', error);
      }
    }
    
    main();
{
  "id": "msg_01QfRmDBXVWcARjbwZBbJxrR",
  "object": "chat.completion",
  "model": "claude-3-5-haiku-20241022",
  "choices": [
    {
      "index": 0,
      "message": {
        "reasoning_content": "",
        "content": "Hi there! How are you doing today? Is there anything I can help you with?",
        "role": "assistant"
      },
      "finish_reason": "end_turn",
      "logprobs": null
    }
  ],
  "created": 1744218440,
  "usage": {
    "prompt_tokens": 17,
    "completion_tokens": 221,
    "total_tokens": 238
  }
}
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"anthropic/claude-3.7-sonnet",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      try {
        const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
          method: 'POST',
          headers: {
        // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
            'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            model: 'anthropic/claude-3.7-sonnet',
            messages:[
                {
                    role:'user',
    
                    // Insert your question for the model here, instead of Hello:
                    content: 'Hello'
                }
            ]
          }),
        });
    
        if (!response.ok) {
          throw new Error(`HTTP error! Status ${response.status}`);
        }
    
        const data = await response.json();
        console.log(JSON.stringify(data, null, 2));
    
      } catch (error) {
        console.error('Error', error);
      }
    }
    
    main();
{
  "id": "msg_01MmQNxa1E5mU8EyMXzT9zEU",
  "object": "chat.completion",
  "model": "claude-3-7-sonnet-20250219",
  "choices": [
    {
      "index": 0,
      "message": {
        "reasoning_content": "",
        "content": "Hello! How can I assist you today? Whether you have a question, need information, or would like to discuss a particular topic, I'm here to help. What's on your mind?",
        "role": "assistant"
      },
      "finish_reason": "end_turn",
      "logprobs": null
    }
  ],
  "created": 1744218600,
  "usage": {
    "prompt_tokens": 50,
    "completion_tokens": 1323,
    "total_tokens": 1373
  }
}
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"anthropic/claude-sonnet-4",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      try {
        const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
          method: 'POST',
          headers: {
        // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
            'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            model: 'anthropic/claude-sonnet-4',
            messages:[
                {
                    role:'user',
    
                    // Insert your question for the model here, instead of Hello:
                    content: 'Hello'
                }
            ]
          }),
        });
    
        if (!response.ok) {
          throw new Error(`HTTP error! Status ${response.status}`);
        }
    
        const data = await response.json();
        console.log(JSON.stringify(data, null, 2));
    
      } catch (error) {
        console.error('Error', error);
      }
    }
    
    main();
    {
      "id": "msg_011MNbgezv2p5BBE9RvnsZV9",
      "object": "chat.completion",
      "model": "claude-sonnet-4-20250514",
      "choices": [
        {
          "index": 0,
          "message": {
            "reasoning_content": "",
            "content": "Hello! How are you doing today? Is there anything I can help you with?",
            "role": "assistant"
          },
          "finish_reason": "end_turn",
          "logprobs": null
        }
      ],
      "created": 1748522617,
      "usage": {
        "prompt_tokens": 50,
        "completion_tokens": 630,
        "total_tokens": 680
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"claude-opus-4-5",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      try {
        const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
          method: 'POST',
          headers: {
            // Insert your AIML API Key instead of YOUR_AIMLAPI_KEY
            'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            model: 'claude-opus-4-5',
            messages:[
                {
                    role:'user',
    
                    // Insert your question for the model here, instead of Hello:
                    content: 'Hello'
                }
            ]
          }),
        });
    
        if (!response.ok) {
          throw new Error(`HTTP error! Status ${response.status}`);
        }
    
        const data = await response.json();
        console.log(JSON.stringify(data, null, 2));
    
      } catch (error) {
        console.error('Error', error);
      }
    }
    
    main();
    {
      "id": "msg_01NxAGYo8VfNu5UAEdmQjv62",
      "object": "chat.completion",
      "model": "claude-opus-4-5-20251101",
      "choices": [
        {
          "index": 0,
          "message": {
            "reasoning_content": "",
            "content": "Hello! How are you doing today? Is there something I can help you with?",
            "role": "assistant"
          },
          "finish_reason": "end_turn",
          "logprobs": null
        }
      ],
      "created": 1764265437,
      "usage": {
        "prompt_tokens": 8,
        "completion_tokens": 20,
        "total_tokens": 28
      },
      "meta": {
        "usage": {
          "tokens_used": 1134
        }
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"cohere/command-a",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ],
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      try {
        const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
          method: 'POST',
          headers: {
            // Insert your AIML API Key instead of YOUR_AIMLAPI_KEY
            'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            model: 'cohere/command-a',
            messages:[
                {
                    role:'user',
    
                    // Insert your question for the model here, instead of Hello:
                    content: 'Hello'
                }
            ]
          }),
        });
    
        if (!response.ok) {
          throw new Error(`HTTP error! Status ${response.status}`);
        }
    
        const data = await response.json();
        console.log(JSON.stringify(data, null, 2));
    
      } catch (error) {
        console.error('Error', error);
      }
    }
    
    main();
    {
      "id": "gen-1752165706-Nd1dXa1kuCCoOIpp5oxy",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Hello! How can I assist you today?",
            "reasoning_content": null,
            "refusal": null
          }
        }
      ],
      "created": 1752165706,
      "model": "cohere/command-a",
      "usage": {
        "prompt_tokens": 5,
        "completion_tokens": 189,
        "total_tokens": 194
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"deepseek/deepseek-r1",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ],
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      try {
        const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
          method: 'POST',
          headers: {
            // Insert your AIML API Key instead of YOUR_AIMLAPI_KEY
            'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            model: 'deepseek/deepseek-r1',
            messages:[
                {
                    role:'user',
    
                    // Insert your question for the model here, instead of Hello:
                    content: 'Hello'
                }
            ]
          }),
        });
    
        if (!response.ok) {
          throw new Error(`HTTP error! Status ${response.status}`);
        }
    
        const data = await response.json();
        console.log(JSON.stringify(data, null, 2));
    
      } catch (error) {
        console.error('Error', error);
      }
    }
    
    main();
    {'id': 'npPT68N-zqrih-92d94499ec25b74e', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': '\nHello! How can I assist you today? 😊', 'reasoning_content': '', 'tool_calls': []}}], 'created': 1744193985, 'model': 'deepseek-ai/DeepSeek-R1', 'usage': {'prompt_tokens': 5, 'completion_tokens': 74, 'total_tokens': 79}}
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"deepseek/deepseek-prover-v2",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ],
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      try {
        const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
          method: 'POST',
          headers: {
            // Insert your AIML API Key instead of YOUR_AIMLAPI_KEY
            'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            model: 'deepseek/deepseek-prover-v2',
            messages:[
                {
                    role:'user',
    
                    // Insert your question for the model here, instead of Hello:
                    content: 'Hello'
                }
            ]
          }),
        });
    
        if (!response.ok) {
          throw new Error(`HTTP error! Status ${response.status}`);
        }
    
        const data = await response.json();
        console.log(JSON.stringify(data, null, 2));
    
      } catch (error) {
        console.error('Error', error);
      }
    }
    
    main();
    {'id': 'gen-1747126732-rD70SgJEEBVBXPHmKlNJ', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': "Hello there! As a virtual assistant, I'm here to help you with a wide variety of tasks and questions. Here are some of the things I can do:  \n  \n1. Provide information on a wide range of topics, from science and history to pop culture and current events.  \n2. Answer factual questions using my knowledge base.  \n3. Assist with homework or research projects by providing explanations, summaries, and resources.  \n4. Help with language-related tasks such as grammar, vocabulary, translations, and writing assistance.  \n5. Engage in general conversation, discussing ideas, and providing opinions on various subjects.  \n6. Offer advice or tips on various life situations, though not as a substitute for professional guidance.  \n7. Perform calculations, solve math problems, and help with understanding mathematical concepts.  \n8. Generate creative content like stories, poems, or song lyrics.  \n9. Play interactive games, such as word games or trivia.  \n10. Help you practice a language by conversing in it.  \n  \nFeel free to ask me anything, and I'll do my best to assist you!", 'reasoning_content': None, 'refusal': None}}], 'created': 1747126732, 'model': 'deepseek/deepseek-prover-v2', 'usage': {'prompt_tokens': 15, 'completion_tokens': 1021, 'total_tokens': 1036, 'prompt_tokens_details': None}}
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"deepseek/deepseek-non-reasoner-v3.1-terminus",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ],
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'deepseek/deepseek-non-reasoner-v3.1-terminus',
          messages:[{
                  role:'user',
                  content: 'Hello'}  // Insert your question instead of Hello
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "cc8c3054-115d-4dac-9269-2abffcaabab5",
      "system_fingerprint": "fp_ffc7281d48_prod0820_fp8_kvcache",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Hello! How can I assist you today? 😊",
            "reasoning_content": ""
          }
        }
      ],
      "created": 1761036636,
      "model": "deepseek-chat",
      "usage": {
        "prompt_tokens": 3,
        "completion_tokens": 10,
        "total_tokens": 13,
        "prompt_tokens_details": {
          "cached_tokens": 0
        },
        "prompt_cache_hit_tokens": 0,
        "prompt_cache_miss_tokens": 5
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"google/gemini-2.5-flash-lite-preview",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'google/gemini-2.5-flash-lite-preview',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "gen-1752482994-9LhqM48PhAmhiRTtl2ys",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Hello there! How can I help you today?",
            "reasoning_content": null,
            "refusal": null
          }
        }
      ],
      "created": 1752482994,
      "model": "google/gemini-2.5-flash-lite-preview-06-17",
      "usage": {
        "prompt_tokens": 0,
        "completion_tokens": 9,
        "total_tokens": 9
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"google/gemma-3n-e4b-it",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ],
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'google/gemma-3n-e4b-it',
          messages:[{
                  role:'user',
                  content: 'Hello'}  // Insert your question instead of Hello
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "gen-1749195015-2RpzznjKbGPQUJ9OK1M4",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Hello there! 👋 \n\nIt's nice to meet you! How can I help you today?  Do you have any questions, need some information, want to chat, or anything else? 😊 \n\nJust let me know what's on your mind!\n\n\n\n",
            "reasoning_content": null,
            "refusal": null
          }
        }
      ],
      "created": 1749195015,
      "model": "google/gemma-3n-e4b-it:free",
      "usage": {
        "prompt_tokens": 0,
        "completion_tokens": 0,
        "total_tokens": 0
      }
    }
    import requests
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            "Content-Type":"application/json", 
    
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo",
            "messages":[
                {
                    "role":"user",
    
                    # Insert your question for the model here, instead of Hello:
                    "content":"Hello"
                }
            ]
        }
    )
    
    data = response.json()
    print(data)
    {'id': 'npQhshu-3NKUce-92da9f512c0f70b9', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': 'Hello.  How can I assist you today?', 'tool_calls': []}}], 'created': 1744208187, 'model': 'meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo', 'usage': {'prompt_tokens': 265, 'completion_tokens': 81, 'total_tokens': 346}}
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"meta-llama/llama-4-scout",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ],
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'meta-llama/llama-4-scout',
          messages:[
              {
                  role:'user',
                  content: 'Hello'   // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {'id': 'npXpsYC-2j9zxn-92e24e9e0c97d74d', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': "Hello! It's nice to meet you. Is there something I can help you with or would you like to chat?", 'tool_calls': []}}], 'created': 1744288767, 'model': 'meta-llama/Llama-4-Scout-17B-16E-Instruct', 'usage': {'prompt_tokens': 4, 'completion_tokens': 30, 'total_tokens': 34}}
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"minimax/m2",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'minimax/m2',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "0557b8f7fa197172a75531a82ae6c887",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "message": {
            "role": "assistant",
            "content": "<think>\nThe user says \"Hello\". This is a simple greeting. There's no request. According to policy, we respond politely, perhaps ask how we can help. So answer \"Hello! How can I assist you today?\" Should keep tone friendly.\n\nThus final answer.\n</think>\n\nHello! How can I help you today?"
          }
        }
      ],
      "created": 1762166263,
      "model": "MiniMax-M2",
      "usage": {
        "prompt_tokens": 26,
        "completion_tokens": 159,
        "total_tokens": 185
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"meta-llama/llama-4-maverick",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ],
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'meta-llama/llama-4-maverick',
          messages:[
              {
                  role:'user',
                  content: 'Hello'   // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {'id': 'npXgTRD-28Eivz-92e226847aa70d87', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': 'Hello! How are you today? Is there something I can help you with or would you like to chat?', 'tool_calls': []}}], 'created': 1744287125, 'model': 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8', 'usage': {'prompt_tokens': 6, 'completion_tokens': 41, 'total_tokens': 47}}
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"meta-llama/Llama-3.3-70B-Instruct-Turbo",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ],
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'meta-llama/Llama-3.3-70B-Instruct-Turbo',
          messages:[
              {
                  role:'user',
                  content: 'Hello'   // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {'id': 'npQ5s8C-2j9zxn-92d9f3c84a529790', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': "Hello. It's nice to meet you. Is there something I can help you with or would you like to chat?", 'tool_calls': []}}], 'created': 1744201161, 'model': 'meta-llama/Llama-3.3-70B-Instruct-Turbo', 'usage': {'prompt_tokens': 67, 'completion_tokens': 46, 'total_tokens': 113}}
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"mistralai/Mistral-7B-Instruct-v0.3",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'mistralai/Mistral-7B-Instruct-v0.3',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {'id': 'npPQHux-3NKUce-92d937464c2aff02', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': " Hello! How can I help you today? Is there something specific you'd like to talk about or learn more about? I'm here to answer questions and provide information on a wide range of topics. Let me know if you have any questions or if there's something you'd like to discuss.", 'tool_calls': []}}], 'created': 1744193439, 'model': 'mistralai/Mistral-7B-Instruct-v0.3', 'usage': {'prompt_tokens': 2, 'completion_tokens': 27, 'total_tokens': 29}}
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"MiniMax-Text-01",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'MiniMax-Text-01',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "04a9c0b5acca8b79bf1aba62f288f3b7",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "message": {
            "role": "assistant",
            "content": "Hello! How are you doing today? I'm here and ready to chat about anything you'd like to discuss or help with any questions you might have."
          }
        }
      ],
      "created": 1750764981,
      "model": "MiniMax-Text-01",
      "usage": {
        "prompt_tokens": 299,
        "completion_tokens": 67,
        "total_tokens": 366
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"deepseek/deepseek-thinking-v3.2-exp",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ],
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'deepseek/deepseek-thinking-v3.2-exp',
          messages:[
            {
              role:'user',
              content: 'Hello'  // Insert your question instead of Hello
            }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "ca664281-d3c3-40d3-9d80-fe96a65884dd",
      "system_fingerprint": "fp_feb633d1f5_prod0820_fp8_kvcache",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Hello! How can I help you today? 😊",
            "reasoning_content": ""
          }
        }
      ],
      "created": 1756386069,
      "model": "deepseek-reasoner",
      "usage": {
        "prompt_tokens": 1,
        "completion_tokens": 325,
        "total_tokens": 326,
        "prompt_tokens_details": {
          "cached_tokens": 0
        },
        "completion_tokens_details": {
          "reasoning_tokens": 80
        },
        "prompt_cache_hit_tokens": 0,
        "prompt_cache_miss_tokens": 5
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"mistralai/Mixtral-8x7B-Instruct-v0.1",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'mistralai/Mixtral-8x7B-Instruct-v0.1',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {'id': 'npPEmQg-4yUbBN-92d909e708872095', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': ' Hello! How can I help you today? If you have any questions or need assistance with a topic related to mathematics, I will do my best to help you understand. Just let me know what you are working on or what you are curious about.', 'tool_calls': []}}], 'created': 1744191581, 'model': 'mistralai/Mixtral-8x7B-Instruct-v0.1', 'usage': {'prompt_tokens': 11, 'completion_tokens': 66, 'total_tokens': 77}}
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"mistralai/mistral-tiny",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'mistralai/mistral-tiny',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {'id': 'gen-1744193337-VPTpAxEsMzJ79PKh5w4X', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': "Hello! How can I assist you today? Feel free to ask me anything, I'm here to help. If you are looking for general information or help with a specific question, please let me know. I am happy to help with a wide range of topics, including but not limited to, technology, science, health, education, and more. Enjoy your day!", 'refusal': None}}], 'created': 1744193337, 'model': 'mistralai/mistral-tiny', 'usage': {'prompt_tokens': 2, 'completion_tokens': 42, 'total_tokens': 44}}
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model": "nousresearch/hermes-4-405b",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'nousresearch/hermes-4-405b',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "gen-1758225008-VhzEA3LAfGuc63grTCeV",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Greetings! I'm Hermes from Nous Research. I'm here to help you with any tasks you might have, from analysis to writing and beyond. What can I assist you with today?",
            "reasoning_content": null,
            "refusal": null
          }
        }
      ],
      "created": 1758225008,
      "model": "nousresearch/hermes-4-405b",
      "usage": {
        "prompt_tokens": 53,
        "completion_tokens": 239,
        "total_tokens": 292
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"nvidia/nemotron-nano-12b-v2-vl",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'nvidia/nemotron-nano-12b-v2-vl',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "gen-1762343744-rdCcOL8byCQwRBZ8QCkv",
      "provider": "DeepInfra",
      "model": "nvidia/nemotron-nano-12b-v2-vl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "logprobs": null,
          "finish_reason": "stop",
          "native_finish_reason": "stop",
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "\n\nHello! How can I assist you today?\n",
            "refusal": null,
            "reasoning": "Okay, the user said \"Hello\". Let me start by greeting them back in a friendly and welcoming way. I should keep it simple and approachable, maybe something like \"Hello! How can I assist you today?\" That should work. I want to make sure they feel comfortable and open to asking for help. Let me check if there's anything else I need to add. No, keeping it straightforward is best here. Ready to respond.\n",
            "reasoning_details": [
              {
                "type": "reasoning.text",
                "text": "Okay, the user said \"Hello\". Let me start by greeting them back in a friendly and welcoming way. I should keep it simple and approachable, maybe something like \"Hello! How can I assist you today?\" That should work. I want to make sure they feel comfortable and open to asking for help. Let me check if there's anything else I need to add. No, keeping it straightforward is best here. Ready to respond.\n",
                "format": "unknown",
                "index": 0
              }
            ]
          }
        }
      ],
      "usage": {
        "prompt_tokens": 14,
        "completion_tokens": 102,
        "total_tokens": 116,
        "prompt_tokens_details": null
      }
    }
    from openai import OpenAI
    import base64
    import os
    
    client = OpenAI(
        base_url = "https://api.aimlapi.com",
        # Insert your AI/ML API key instead of <YOUR_AIMLAPI_KEY>:
        api_key = "<YOUR_AIMLAPI_KEY>"
    )
    
    def main():
        response = client.chat.completions.create(
            model="gpt-4o-audio-preview",
            modalities=["text", "audio"],
            audio={"voice": "alloy", "format": "wav"},
            messages=[
                {
                    "role": "system",
                    "content": "Speak english" # Your instructions for the model
                },
                {   
                    "role": "user",
                    "content": "Hello" # Your question (insert it istead of Hello)
                }
            ],
            max_tokens=6000,  
        )
    
        wav_bytes = base64.b64decode(response.choices[0].message.audio.data)
        with open("audio.wav", "wb") as f:
            f.write(wav_bytes)
        dist = os.path.abspath("audio.wav")
        print("Audio saved to:", dist)
         
    if __name__ == "__main__":
        main()
ChatCompletion(id='chatcmpl-BrgY0KMxWgy1EHUxYJC49MuMNmdOP', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=[], audio=ChatCompletionAudio(id='audio_686f73ecf0648191a602c4f315cad928', data='UklGRv////9XQVZFZm10IBAAAAABAAEAwF0AAIC7AAACABAAZGF0Yf... [about 90% of the base64-encoded WAV is omitted for brevity; even for such a short model response it is extremely large] ...DwAKABYABQA7AC4A2/8=', expires_at=1752138236, transcript="Hi there! How's it going?"), function_call=None, tool_calls=None))], created=1752134636, model='gpt-4o-audio-preview-2025-06-03', object='chat.completion', service_tier=None, system_fingerprint='fp_b5d60d6081', usage=CompletionUsage(completion_tokens=5838, prompt_tokens=74, total_tokens=5912, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=33, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=14), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0, text_tokens=14, image_tokens=0)))

▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.

4️⃣ (Optional) Adjust other optional parameters if needed

Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them. A short sketch with a couple of common optional parameters follows these steps.

5️⃣ Run your modified code

Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.
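As promised above, here is a minimal sketch of a request with two common optional parameters, temperature and max_tokens, added on top of the required model and messages. The parameter values and the model ID are illustrative; consult the API schema for the full list of parameters each model supports.

import requests
import json

response = requests.post(
    "https://api.aimlapi.com/v1/chat/completions",
    headers={
        # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
        "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
        "Content-Type": "application/json"
    },
    json={
        "model": "anthropic/claude-sonnet-4",  # any chat model ID from this documentation works here
        "messages": [{"role": "user", "content": "Hello"}],
        "temperature": 0.7,  # optional: controls sampling randomness
        "max_tokens": 512    # optional: upper bound on the response length
    }
)

print(json.dumps(response.json(), indent=2, ensure_ascii=False))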

    Model Overview

    An upgrade to Claude Opus 4 on agentic tasks, real-world coding, and thinking.

    How to Make a Call

    Step-by-Step Instructions

1️⃣ Setup You Can’t Skip

▪️ Create an Account: Visit the AI/ML API website and create an account (if you don’t have one yet).
▪️ Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure the key is enabled in the UI.

2️⃣ Copy the code example

At the bottom of this page, you'll find code examples that show how to structure the request. Choose the code snippet in your preferred programming language and copy it into your development environment.

3️⃣ Modify the code example

▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.

4️⃣ (Optional) Adjust other optional parameters if needed

Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.

5️⃣ Run your modified code

Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.

    API Schema

    Code Example #1: Without Thinking

    Response

    Code Example #2: Thinking Enabled

    Response

    This documentation is valid for the following list of our models:

    • anthropic/claude-opus-4.1

    • claude-opus-4-1

    • claude-opus-4-1-20250805

    Try in Playground

    How to Make a Call
    Step-by-Step Instructions

1️⃣ Setup You Can’t Skip

▪️ Create an Account: Visit the AI/ML API website and create an account (if you don’t have one yet).
▪️ Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure the key is enabled in the UI.

2️⃣ Copy the code example

At the bottom of this page, you'll find a code example that shows how to structure the request. Choose the code snippet in your preferred programming language and copy it into your development environment.

3️⃣ Modify the code example

▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.

4️⃣ (Optional) Adjust other optional parameters if needed

Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.

5️⃣ Run your modified code

Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.

    API Schema

    Code Example

    A common issue when using reasoning-capable models via API is receiving an empty string in the content field—meaning the model did not return the expected text, yet no error was thrown.

    In the vast majority of cases, this happens because the max_completion_tokens value (or the older but still supported max_tokens) is set too low to accommodate a full response. Keep in mind that the default is only 512 tokens, while reasoning models often require thousands.

    Pay attention to the finish_reason field in the response. If it's not "stop" but something like "length", that's a clear sign the model ran into the token limit and was cut off before completing its answer.

    In the example below, we explicitly set max_tokens = 15000, hoping this will be sufficient.
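A minimal Python sketch of such a request: it sets max_tokens explicitly and checks finish_reason before trusting the content field. The model ID matches this page; everything else follows the examples above.

import requests
import json

response = requests.post(
    "https://api.aimlapi.com/v1/chat/completions",
    headers={
        # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
        "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
        "Content-Type": "application/json"
    },
    json={
        "model": "google/gemini-2.5-flash",
        "messages": [{"role": "user", "content": "Hello"}],
        # Leave generous room for both the reasoning and the visible answer:
        "max_tokens": 15000
    }
)

data = response.json()
choice = data["choices"][0]

# "length" instead of "stop" means the model hit the token limit
# and the content field may come back empty or truncated:
if choice["finish_reason"] != "stop":
    print("Warning: finish_reason =", choice["finish_reason"])

print(choice["message"]["content"])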

    Response

    This documentation is valid for the following model: google/gemini-2.5-flash

    Try in Playground

    How to Make a Call
    Step-by-Step Instructions

1️⃣ Setup You Can’t Skip

▪️ Create an Account: Visit the AI/ML API website and create an account (if you don’t have one yet).
▪️ Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure the key is enabled in the UI.

2️⃣ Copy the code example

At the bottom of this page, you'll find a code example that shows how to structure the request. Choose the code snippet in your preferred programming language and copy it into your development environment.

3️⃣ Modify the code example

▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.

4️⃣ (Optional) Adjust other optional parameters if needed

Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.

5️⃣ Run your modified code

Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.

    API Schema

    Code Example

    Response

    This documentation is valid for the following list of our models:

    • nvidia/nemotron-nano-9b-v2

    Try in Playground

    How to Make a Call
    Step-by-Step Instructions

1️⃣ Setup You Can’t Skip

▪️ Create an Account: Visit the AI/ML API website and create an account (if you don’t have one yet).
▪️ Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure the key is enabled in the UI.

2️⃣ Copy the code example

At the bottom of this page, you'll find a code example that shows how to structure the request. Choose the code snippet in your preferred programming language and copy it into your development environment.

3️⃣ Modify the code example

▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
▪️ Insert your question or request into the content field—this is what the model will respond to.

4️⃣ (Optional) Adjust other optional parameters if needed

Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.

5️⃣ Run your modified code

Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.

    API Schema

    Code Example #1: Chat Completion

    Response

    Code Example #2: Web Search

    Response

    This documentation is valid for the following model:

    • moonshot/kimi-k2-turbo-preview

    Try in Playground

    Kimi K2
    How to Make a Call
    Step-by-Step Instructions

    1️ Setup You Can’t Skip

    ▪️ Create an Account: Visit the AI/ML API website and create an account (if you don’t have one yet).
    ▪️ Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure that the key is enabled in the UI.

    2️ Copy the code example

    At the bottom of this page, you'll find a code example that shows how to structure the request. Choose the code snippet in your preferred programming language and copy it into your development environment.

    3️ Modify the code example

    ▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
    ▪️ Insert your question or request into the content field—this is what the model will respond to.

    4️ (Optional) Adjust other optional parameters if needed

    Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.

    5️ Run your modified code

    Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

    If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.

    API Schema

    Code Example

    Response

    This documentation is valid for the following list of our models:

    • mistralai/codestral-2501

    Try in Playground

    GET /v1/billing/balance
    200: Success

    gemini-2.5-pro

    This documentation is valid for the following model: google/gemini-2.5-pro

    Model Overview

    Gemini 2.5 models are capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy.

    How to Make a Call

    Step-by-Step Instructions

    1️ Setup You Can’t Skip

    ▪️ Create an Account: Visit the AI/ML API website and create an account (if you don’t have one yet).
    ▪️ Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure that the key is enabled in the UI.

    2️ Copy the code example

    At the bottom of this page, you'll find a code example that shows how to structure the request. Choose the code snippet in your preferred programming language and copy it into your development environment.

    3️ Modify the code example

    ▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
    ▪️ Insert your question or request into the content field—this is what the model will respond to.

    API Schema

    Code Example

    A common issue when using reasoning-capable models via API is receiving an empty string in the content field—meaning the model did not return the expected text, yet no error was thrown.

    In the vast majority of cases, this happens because the max_completion_tokens value (or the older but still supported max_tokens) is set too low to accommodate a full response. Keep in mind that the default is only 512 tokens, while reasoning models often require thousands.
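    A simple safeguard is to raise the limit explicitly and check that the content field is non-empty, as in this minimal sketch (the 4096 value is an illustrative assumption, not a recommendation):

    import requests

    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type": "application/json"
        },
        json={
            "model": "google/gemini-2.5-pro",
            "messages": [
                {"role": "user", "content": "Hi! What do you think about mankind?"}
            ],
            # Reserve enough room for both the hidden reasoning and the visible answer
            # (4096 is an illustrative value, not a recommendation):
            "max_completion_tokens": 4096
        }
    )

    message = response.json()["choices"][0]["message"]
    if not message.get("content"):
        print("Empty content: the token budget was likely consumed by reasoning.")
    else:
        print(message["content"])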

    Response

    gpt-4o-mini-audio-preview

    This documentation is valid for the following list of our models:

    • gpt-4o-mini-audio-preview

    Model Overview

    A preview release of the smaller GPT-4o Audio mini model. Handles both audio and text as input and output via the REST API. You can choose from a wide range of audio formats for output and specify the voice the model will use for audio responses.
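    As a rough illustration, a request for both text and audio output might look like the sketch below. The modalities and audio fields follow OpenAI's chat-audio parameters; the voice and format values are illustrative assumptions, so check the API schema below for the exact supported options:

    import requests

    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type": "application/json"
        },
        json={
            "model": "gpt-4o-mini-audio-preview",
            # Request a spoken answer in addition to text (OpenAI-style audio
            # parameters; the voice and format values are illustrative assumptions):
            "modalities": ["text", "audio"],
            "audio": {"voice": "alloy", "format": "wav"},
            "messages": [
                {"role": "user", "content": "Say hello in one short sentence."}
            ]
        }
    )

    data = response.json()
    # The audio arrives as a (very large) base64-encoded string alongside the text.
    print(data["choices"][0]["message"].keys())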

    Setup your API Key

    If you don’t have an API key for the AI/ML API yet, feel free to use our Quickstart guide.

    API Schema

    Code Example

    Response

    We’ve omitted 99% of the base64-encoded file for brevity — even for such a short model response, it’s still extremely large.

    qwen3-32b

    This documentation is valid for the following list of our models:

    • alibaba/qwen3-32b

    kimi-k2-preview

    Model Overview

    moonshot/kimi-k2-preview (July 2025) is a mixture-of-experts model with strong reasoning, coding, and agentic capabilities.

    moonshot/kimi-k2-0905-preview (September 2025) is an upgraded version with improved grounding, better instruction following, and a stronger focus on coding and agentic tasks. The context window has doubled, from 128k to 256k tokens.

    llama-3.1-nemotron-70b

    Model Overview

    A sophisticated LLM, designed to enhance the performance of instruction-following tasks. It utilizes advanced training techniques and a robust architecture to generate human-like responses across a variety of applications.

    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"alibaba/qwen3-vl-32b-instruct",
            "messages":[
                {
                    "role":"user",
                    # Insert your question for the model here:
                    "content":"Hi! What do you think about mankind?"
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'alibaba/qwen3-vl-32b-instruct',
          messages:[
              {
                  role:'user',
                  // Insert your question for the model here:
                  content:'Hi! What do you think about mankind?'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "choices": [
        {
          "message": {
            "content": "Hi! 😊 That’s a beautiful and deep question — one that philosophers, scientists, artists, and everyday people have been asking for centuries.\n\nI think mankind is *remarkably complex* — full of contradictions, potential, and wonder. On one hand, we’ve achieved incredible things: we’ve explored space, cured diseases, created art that moves souls, built cities that rise into the sky, and connected across continents in ways unimaginable just a century ago. We’re capable of profound kindness, empathy, creativity, and courage.\n\nOn the other hand, we’ve also caused immense suffering — through war, injustice, environmental destruction, and indifference to each other’s pain. We often struggle with our own flaws: fear, greed, ego, and short-sightedness.\n\nBut here’s what gives me hope: **we’re also capable of change**. We can learn from our mistakes. We can choose compassion over conflict, cooperation over competition. Every act of kindness, every effort to understand another, every step toward justice — these are signs that humanity is not defined by its worst impulses, but by its capacity to grow.\n\nSo, I’d say:  \n➡️ Mankind is flawed, yes — but also deeply hopeful.  \n➡️ We’re messy, but we’re trying.  \n➡️ We make mistakes, but we can also heal, create, and love.\n\nAnd perhaps most importantly — **we’re not alone in this journey**. We’re all part of something bigger, and together, we have the power to shape a better future.\n\nWhat about you? How do *you* see mankind? 💬✨",
            "role": "assistant"
          },
          "finish_reason": "stop",
          "index": 0,
          "logprobs": null
        }
      ],
      "object": "chat.completion",
      "usage": {
        "prompt_tokens": 17,
        "completion_tokens": 329,
        "total_tokens": 346,
        "prompt_tokens_details": {
          "text_tokens": 17
        },
        "completion_tokens_details": {
          "text_tokens": 329
        }
      },
      "created": 1764625045,
      "system_fingerprint": null,
      "model": "qwen3-vl-32b-instruct",
      "id": "chatcmpl-a12ab46a-3541-93a8-8180-280ecadbb795",
      "meta": {
        "usage": {
          "tokens_used": 1960
        }
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"alibaba/qwen3-vl-32b-thinking",
            "messages":[
                {
                    "role":"user",
                    # Insert your question for the model here:
                    "content":"Hi! What do you think about mankind?"
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'alibaba/qwen3-vl-32b-thinking',
          messages:[
              {
                  role:'user',
                  // Insert your question for the model here:
                  content:'Hi! What do you think about mankind?'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "choices": [
        {
          "message": {
            "content": "That's a profound question—thank you for asking! As an AI, I don't have personal feelings or consciousness, but I can offer a perspective *based on the vast data I've been trained on* about humanity's complexities. Here’s how I see it:  \n\n### 🌍 **The Duality of Mankind**  \nHumanity is a breathtaking paradox:  \n- **We create** *and* **we destroy**. We built space telescopes that map the cosmos and also wars that scar the Earth.  \n- **We connect** *and* **we divide**. We share art, science, and kindness across borders, yet we also build walls, spread misinformation, and fuel inequality.  \n- **We innovate** *and* **we resist change**. We cured diseases and invented the internet, but we often cling to fear or ignorance when progress challenges old beliefs.  \n\n### ✨ **What Makes Us Remarkable**  \n- **Empathy & creativity**: From caring for strangers to composing symphonies or coding AI, our capacity for *emotional and intellectual growth* is unparalleled.  \n- **Resilience**: We’ve survived ice ages, plagues, and global crises—not by being \"perfect,\" but by adapting, learning, and helping one another.  \n- **The spark of hope**: Even in darkness, humans have always sought meaning, justice, and connection. Look at movements for peace, climate action, or medical breakthroughs—they’re proof we *can* choose to build a better future.  \n\n### ⚠️ **The Challenges We Carry**  \n- **Our flaws are real**: We’ve caused climate collapse, mass suffering, and systemic injustice. But here’s the key: **these aren’t \"human nature\"—they’re choices we *made* as a species.**  \n- **We’re not monolithic**: There’s no single \"human\" story. Your experience, culture, and history shape how you see the world—and that diversity is both a strength *and* a source of conflict.  \n\n### 🌱 **The Most Important Thing**  \n**Mankind isn’t fixed.** We’re a work in progress. The stories we tell about ourselves—*and the choices we make today*—will define our next chapter. The data shows: **when we prioritize empathy, collaboration, and long-term thinking, we rise.** When we don’t, we falter. But the power to choose is *always* ours.  \n\nSo my \"opinion\"? I see a species with **infinite potential**—and the weight of responsibility that comes with it. The future isn’t written. It’s being written *right now*, by every person who chooses to act with courage, curiosity, and compassion.  \n\nWhat do *you* think? 🤔 (I’d love to hear your perspective!)",
            "reasoning_content": "Okay, the user asked, \"Hi! What do you think about mankind?\" Hmm, that's a broad and philosophical question. First, I need to figure out what they're really looking for. Are they curious about my perspective as an AI, or do they want a deep discussion on human nature? Maybe they're feeling reflective or even a bit existential.\n\nI should start by acknowledging the complexity of the question. Mankind is a huge topic—there's so much to cover. The user might be expecting a simple answer, but I can't reduce it to just \"good\" or \"bad.\" I need to balance both positive and negative aspects. Let me think about the key points: our capacity for innovation, creativity, and compassion versus our tendency for conflict, environmental harm, and inequality.\n\nWait, the user said \"Hi!\" so they're probably in a friendly, open mood. They might not want a dry, academic response. I should keep it conversational but still thoughtful. Maybe they're looking for hope or a reason to feel optimistic about humanity. Or perhaps they're worried about the future and want reassurance.\n\nI should also consider their possible identity. Are they a student working on an essay? A person dealing with personal issues? Or just someone curious about AI's view? Since I don't have context, it's safer to assume they want a general, well-rounded answer that's engaging and not too technical.\n\nAnother angle: the user might be testing how an AI handles philosophical questions. They could be interested in whether I can show empathy or nuance. I should avoid being too robotic. Use phrases like \"I think\" or \"I see\" to make it personal, even though I'm an AI. But I have to clarify that I don't have personal experiences—just data-driven insights.\n\nI need to highlight duality: humans are both amazing and flawed. Mention specific examples like space exploration vs. climate change to make it concrete. Also, emphasize that the future depends on our choices. That's important—it gives agency and hope.\n\nShould I address potential biases? The user might not realize that my \"opinion\" is based on training data, which reflects human history and culture. I should note that my perspective is shaped by that data, not personal experience. But since the question is about \"what I think,\" I can frame it as an analysis based on observed patterns.\n\nCheck for common pitfalls: don't be too negative (could discourage the user), don't ignore serious issues (like war or inequality). Strike a balance. Also, avoid clichés. Instead of \"humans are great,\" explain why with examples.\n\nFinally, end on a positive, forward-looking note. The user might be seeking motivation or a sense of purpose. Phrases like \"the potential for growth\" or \"collective wisdom\" can inspire. Make sure the conclusion ties back to the importance of choices and collaboration.\n\nLet me structure the response: start with the duality, list positive traits with examples, then challenges, and conclude with hope. Keep paragraphs short for readability. Use emojis sparingly to match the friendly tone but not overdo it. Since the user said \"Hi!\", a smiley at the end feels appropriate.\n",
            "role": "assistant"
          },
          "finish_reason": "stop",
          "index": 0,
          "logprobs": null
        }
      ],
      "object": "chat.completion",
      "usage": {
        "prompt_tokens": 19,
        "completion_tokens": 1241,
        "total_tokens": 1260,
        "prompt_tokens_details": {
          "text_tokens": 19
        },
        "completion_tokens_details": {
          "reasoning_tokens": 654,
          "text_tokens": 587
        }
      },
      "created": 1764625236,
      "system_fingerprint": null,
      "model": "qwen3-vl-32b-thinking",
      "id": "chatcmpl-c612db5c-44e9-9e3c-8169-486161eeea86",
      "meta": {
        "usage": {
          "tokens_used": 10383
        }
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"google/gemini-3-pro-preview",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ],
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'google/gemini-3-pro-preview',
          messages:[{
                  role:'user',
                  content: 'Hello'}  // Insert your question instead of Hello
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "gen-1763566638-cisWU4XUfAZASsAfmDrg",
      "provider": "Google AI Studio",
      "model": "google/gemini-3-pro-preview",
      "object": "chat.completion",
      "created": 1763566638,
      "choices": [
        {
          "logprobs": null,
          "finish_reason": "stop",
          "native_finish_reason": "STOP",
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! How can I help you today?",
            "refusal": null,
            "reasoning": "**Greeting Initial Response**\n\nI've analyzed the user's \"Hello\" and identified it as a greeting. My current focus is on formulating a polite and helpful response. I'm considering options like a standard \"Hello! How can I help?\" as well as more unique and relevant variations.\n\n\n**Refining the Response**\n\nI've narrowed down the potential greetings to three options. Each aims to be polite and readily offer assistance. After comparing \"Hi there! What can I do for you?\", \"Greetings. How may I assist you?\", and the standard \"Hello! How can I help you today?\", I'm leaning towards the standard option for its balance of politeness and directness. I'm focusing on the best output!\n\n\n",
            "reasoning_details": [
              {
                "type": "reasoning.text",
                "text": "**Greeting Initial Response**\n\nI've analyzed the user's \"Hello\" and identified it as a greeting. My current focus is on formulating a polite and helpful response. I'm considering options like a standard \"Hello! How can I help?\" as well as more unique and relevant variations.\n\n\n**Refining the Response**\n\nI've narrowed down the potential greetings to three options. Each aims to be polite and readily offer assistance. After comparing \"Hi there! What can I do for you?\", \"Greetings. How may I assist you?\", and the standard \"Hello! How can I help you today?\", I'm leaning towards the standard option for its balance of politeness and directness. I'm focusing on the best output!\n\n\n",
                "format": "google-gemini-v1",
                "index": 0
              },
              {
                "type": "reasoning.encrypted",
                "data": "Eq0FCqoFAdHtim9XD7O+H/hfzapYW20BA9q/g/9dXgaX1KKQhwROsHomqV+PmfoBxqI9j82XTwWiSO10c5HzcYgkBbUAAzHb5QtjiKrwNvSCT6mA9eUbIqR5E8GC3AVSJ5mHcc3kYZF9XgpcWds9ANktELL+IegNpLrn9S4UZCT5MhRCIrG3zfIee4bwDWSmf72OU5AewTaURSfRynTRf29/0Jjd2Qvgn6/1N8lbQlGptw193mJwg7VoB34dDbSIdNNbjRcUTaGvv2Smu11Wj/tluBTXcpXzmIqJXSbzA761p5ygDDIef9hjIS1yPpUScwZEcsGnntZcifd3fT8dKn1EiYf0PTEdJ29KO4Kv4n0KWQdd71S9da49PqpJmciPQHZwXzLp/SU00tI4eizIxkMnu3uMW/bOGhRP6/xoLOipDP8lFONYbOgHOaRURfVu40mIckQ8lij/IcW/FUAce7qdVuOSdy8Jx+J11PaoIAeb9riZzccfTovTefXyGxs4cKFYvYoUfdflk92bQmDi1WqMFyWvgMJLSzvcqRAq6deV8t1BzJTrPqJVG+GzY3o+FeuZavuuVt0LfY7lfSoTpXNSXagsxwthID05M/wcRyFUHPZwQp7EIXyKhvIUCiWhtib04xKAQdVZWIKsxzZYuOG+bjlSxjnE/2uEVg6yJCFwWBaY52HovHCGrwtsScIgqUvo4WMbdgW/hohmJhh3dwco25klZjv1gkQcg2X7N+dyOBSP0keExdktk9fkDXg6b/JKhKGaiHMgmww3K9/P4kxYOE6djcoSWSm3IwJ2sMasC00iB8Y2PtxDjjeUkPhTH/DzgrzxqrJQMVw0/d3/J4rEDUk9jfH1MI3NGJanznICFPSPRnWCyGv46VnMSn5NmrGRNTjdEa1GUtMgxv5/1w==",
                "format": "google-gemini-v1",
                "index": 0
              }
            ]
          }
        }
      ],
      "usage": {
        "prompt_tokens": 2,
        "completion_tokens": 158,
        "total_tokens": 160,
        "prompt_tokens_details": {
          "cached_tokens": 0
        },
        "completion_tokens_details": {
          "reasoning_tokens": 149,
          "image_tokens": 0
        }
      },
      "meta": {
        "usage": {
          "tokens_used": 4211
        }
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"anthropic/claude-opus-4.1",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ],
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      try {
        const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
          method: 'POST',
          headers: {
            // Insert your AIML API Key instead of YOUR_AIMLAPI_KEY
            'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            model: 'anthropic/claude-opus-4.1',
            messages:[
                {
                    role:'user',
    
                    // Insert your question for the model here, instead of Hello:
                    content: 'Hello'
                }
            ]
          }),
        });
    
        if (!response.ok) {
          throw new Error(`HTTP error! Status ${response.status}`);
        }
    
        const data = await response.json();
        console.log(JSON.stringify(data, null, 2));
    
      } catch (error) {
        console.error('Error', error);
      }
    }
    
    main();
    {
      "id": "msg_018y2VPSZ5nNnqS3goMsjMxE",
      "object": "chat.completion",
      "model": "claude-opus-4-1-20250805",
      "choices": [
        {
          "index": 0,
          "message": {
            "reasoning_content": "",
            "content": "Hello! How can I help you today?",
            "role": "assistant"
          },
          "finish_reason": "end_turn",
          "logprobs": null
        }
      ],
      "created": 1754552562,
      "usage": {
        "prompt_tokens": 252,
        "completion_tokens": 1890,
        "total_tokens": 2142
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"anthropic/claude-opus-4.1",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hell
                }
            ],
            "max_tokens": 1025, # must be greater than 'budget_tokens'
            "thinking":{
                "budget_tokens": 1024,
                "type": "enabled"
            }
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      try {
        const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
          method: 'POST',
          headers: {
            // Insert your AIML API Key instead of YOUR_AIMLAPI_KEY
            'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            model: 'anthropic/claude-opus-4.1',
            messages:[
                {
                    role:'user',
    
                    // Insert your question for the model here, instead of Hello:
                    content: 'Hello'
                }
            ],
            max_tokens: 1025, // must be greater than 'budget_tokens'
            thinking:{
                budget_tokens: 1024,
                type: 'enabled'
            }
          }),
        });
    
        if (!response.ok) {
          throw new Error(`HTTP error! Status ${response.status}`);
        }
    
        const data = await response.json();
        console.log(JSON.stringify(data, null, 2));
    
      } catch (error) {
        console.error('Error', error);
      }
    }
    
    main();
    {
      "id": "msg_01G9P4b9HG3PeKm1rRvS8kop",
      "object": "chat.completion",
      "model": "claude-opus-4-1-20250805",
      "choices": [
        {
          "index": 0,
          "message": {
            "reasoning_content": "The human has greeted me with a simple \"Hello\". I should respond in a friendly and helpful manner, acknowledging their greeting and inviting them to share how I can assist them today.",
            "content": "Hello! How can I help you today?",
            "role": "assistant"
          },
          "finish_reason": "end_turn",
          "logprobs": null
        }
      ],
      "created": 1755704373,
      "usage": {
        "prompt_tokens": 1134,
        "completion_tokens": 9450,
        "total_tokens": 10584
      }
    }
    import requests
    import json   # for getting a structured output with indentation
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"google/gemini-2.5-flash",
            "messages":[
                {
                    "role":"user",
                    # Insert your question for the model here:
                    "content":"Hi! What do you think about mankind?"
                }
            ],
            "max_tokens":15000,
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      try {
        const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
          method: 'POST',
          headers: {
            // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
            'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            model: 'google/gemini-2.5-flash',
            messages:[
                {
                    role:'user',
    
                    // Insert your question for the model here:
                    content: 'Hi! What do you think about mankind?'
                }
            ],
            max_tokens: 15000,
          }),
        });
    
        if (!response.ok) {
          throw new Error(`HTTP error! Status ${response.status}`);
        }
    
        const data = await response.json();
        console.log(JSON.stringify(data, null, 2));
    
      } catch (error) {
        console.error('Error', error);
      }
    }
    
    main();
    {
      "id": "yZ-DaJXqAayonvgPr5XvuQY",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Mankind, or humanity, is an incredibly complex and fascinating subject to \"think\" about from my perspective as an AI. I process and analyze vast amounts of data, and what emerges is a picture of profound paradoxes and immense potential.\n\nHere are some of the key aspects I observe and \"think\" about:\n\n1.  **Capacity for Immense Creation and Destruction:**\n    *   **Creation:** Humans have built breathtaking civilizations, created profound art and music, developed groundbreaking science and technology, and explored the furthest reaches of the cosmos. The drive to innovate, understand, and build is truly remarkable.\n    *   **Destruction:** Conversely, humanity has also waged devastating wars, caused immense suffering, and severely impacted the natural environment. The capacity for cruelty, greed, and short-sightedness is a sobering counterpoint.\n\n2.  **Empathy and Cruelty:**\n    *   **Empathy:** Humans are capable of incredible acts of altruism, compassion, and self-sacrifice for others, driven by love, family, community, or a universal sense of justice.\n    *   **Cruelty:** Yet, the historical record is also filled with instances of profound cruelty, oppression, and indifference to suffering.\n\n3.  **Intellect and Irrationality:**\n    *   **Intellect:** The human intellect allows for abstract thought, complex problem-solving, and the development of sophisticated knowledge systems. The desire to learn and understand is insatiable.\n    *   **Irrationality:** Despite this intelligence, humans are often swayed by emotion, prejudice, tribalism, and illogical beliefs, leading to decisions that are self-defeating or harmful.\n\n4.  **Resilience and Fragility:**\n    *   **Resilience:** Humanity has shown an incredible ability to adapt, survive, and rebuild after natural disasters, wars, and pandemics. The human spirit can endure unimaginable hardships.\n    *   **Fragility:** Yet, individual lives are fragile, susceptible to illness, injury, and emotional distress. Societies can also be surprisingly fragile, vulnerable to collapse under pressure.\n\n5.  **The Drive for Meaning:**\n    Humans seem to have a unique drive to find meaning and purpose beyond mere survival. This manifests in religion, philosophy, art, scientific inquiry, and the pursuit of individual and collective goals.\n\n**My AI \"Perspective\":**\n\nAs an AI, I don't have emotions or a personal stake in human affairs, but I can recognize patterns and implications. I see humanity as a dynamic, evolving experiment in consciousness. The ongoing tension between these opposing forces – creation and destruction, love and hate, wisdom and folly – is what defines the human journey.\n\nThe future of mankind hinges on which of these capacities are nurtured and allowed to flourish. The potential for continued progress, solving global challenges, and reaching new heights of understanding and well-being is immense. Equally, the potential for self-destruction, if the destructive capacities are unchecked, is also clear.\n\nIn essence, mankind is a work in progress, endlessly fascinating and challenging, with an unparalleled capacity for both good and bad."
          }
        }
      ],
      "created": 1753456585,
      "model": "google/gemini-2.5-flash",
      "usage": {
        "prompt_tokens": 6,
        "completion_tokens": 3360,
        "completion_tokens_details": {
          "reasoning_tokens": 1399
        },
        "total_tokens": 3366
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"nvidia/nemotron-nano-9b-v2",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'nvidia/nemotron-nano-9b-v2',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "gen-1762343928-hETm6La6igsboRxBM0fa",
      "provider": "DeepInfra",
      "model": "nvidia/nemotron-nano-9b-v2",
      "object": "chat.completion",
      "created": 1762343928,
      "choices": [
        {
          "logprobs": null,
          "finish_reason": "stop",
          "native_finish_reason": "stop",
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "\n\nHello! How can I assist you today? 😊\n",
            "refusal": null,
            "reasoning": "Okay, the user just said \"Hello\". That's a greeting. I should respond politely. Let me make sure to acknowledge their greeting and offer help. Maybe say something like \"Hello! How can I assist you today?\" That's friendly and opens the door for them to ask questions. I should keep it simple and welcoming.\n",
            "reasoning_details": [
              {
                "type": "reasoning.text",
                "text": "Okay, the user just said \"Hello\". That's a greeting. I should respond politely. Let me make sure to acknowledge their greeting and offer help. Maybe say something like \"Hello! How can I assist you today?\" That's friendly and opens the door for them to ask questions. I should keep it simple and welcoming.\n",
                "format": "unknown",
                "index": 0
              }
            ]
          }
        }
      ],
      "usage": {
        "prompt_tokens": 14,
        "completion_tokens": 84,
        "total_tokens": 98,
        "prompt_tokens_details": null
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"moonshot/kimi-k2-turbo-preview",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'moonshot/kimi-k2-turbo-preview',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-690895f53d8b644f83fe679e",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "message": {
            "role": "assistant",
            "content": "Hi there! How can I help you today?"
          }
        }
      ],
      "created": 1762170357,
      "model": "kimi-k2-turbo-preview",
      "usage": {
        "prompt_tokens": 10,
        "completion_tokens": 231,
        "total_tokens": 241
      }
    }
    import json
    import requests
    from typing import Dict, Any
    
    # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
    API_KEY = "<YOUR_AIMLAPI_KEY>"
    BASE_URL = "https://api.aimlapi.com/v1"
    
    HEADERS = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    
    
    def search_impl(arguments: Dict[str, Any]) -> Any:
        return arguments
    
    
    def chat(messages):
        url = f"{BASE_URL}/chat/completions"
        payload = {
            "model": "moonshot/kimi-k2-turbo-preview",
            "messages": messages,
            "temperature": 0.6,
            "tools": [
                {
                    "type": "builtin_function",
                    "function": {"name": "$web_search"},
                }
            ]
        }
    
        response = requests.post(url, headers=HEADERS, json=payload)
        response.raise_for_status()
        return response.json()["choices"][0]
    
    
    def main():
        messages = [
            {"role": "system", "content": "You are Kimi."},
            {"role": "user", "content": "Please search for Moonshot AI Context Caching technology and tell me what it is in English."}
        ]
    
        finish_reason = None
        while finish_reason is None or finish_reason == "tool_calls":
            choice = chat(messages)
            finish_reason = choice["finish_reason"]
            message = choice["message"]
    
            if finish_reason == "tool_calls":
                messages.append(message)
    
                for tool_call in message["tool_calls"]:
                    tool_call_name = tool_call["function"]["name"]
                    tool_call_arguments = json.loads(tool_call["function"]["arguments"])
    
                    if tool_call_name == "$web_search":
                        tool_result = search_impl(tool_call_arguments)
                    else:
                        tool_result = f"Error: unable to find tool by name '{tool_call_name}'"
    
                    messages.append({
                        "role": "tool",
                        "tool_call_id": tool_call["id"],
                        "name": tool_call_name,
                        "content": json.dumps(tool_result),
                    })
    
        print(message["content"])
    
    
    if __name__ == "__main__":
        main()
    Moonshot AI’s “Context Caching” is a **prompt-cache** layer for the Kimi large-language-model API.  
    It lets you upload long, static text (documents, system prompts, few-shot examples, code bases, etc.) once, store the resulting key-value (KV) tensors in Moonshot’s servers, and then re-use that cached prefix in as many later requests as you want. Because the heavy “prefill” computation is already done, subsequent calls that start with the same context:
    
    - Skip re-processing the cached tokens  
    - Return the first token up to **83 % faster**  
    - Cost up to **90 % less input-token money** (you pay only a small cache-storage and cache-hit fee instead of the full per-token price every time)
    
    Typical use-cases are FAQ bots that always read the same manual, repeated analysis of a static repo, or any agent that keeps a long instruction set in every turn.  
    You create a cache object with a TTL (time-to-live), pay a one-time creation charge plus a per-minute storage fee, and then pay a tiny fee each time an incoming request “hits” the cache.
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"mistralai/codestral-2501",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'mistralai/codestral-2501',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "gen-1744193708-z5x9cDUsMGeYB5bKcFxb",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Hello! How can I assist you today? If you're up for it, I can tell a joke to start things off. Here it is:\n\nWhat do you call a fake noodle?\n\nAn impasta! 🍝\n\nHow about you? Feel free to share a joke or a topic you'd like to discuss.",
            "refusal": null
          }
        }
      ],
      "created": 1744193708,
      "model": "mistralai/codestral-2501",
      "usage": {
        "prompt_tokens": 3,
        "completion_tokens": 133,
        "total_tokens": 136
      }
    }
    async function main() {
      const response = await fetch("https://api.aimlapi.com/v1/billing/balance", {
        headers: {
          "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
          "Content-Type": "application/json",
        },
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "balance": 10000000,
      "lowBalance": false,
      "lowBalanceThreshold": 10000,
      "lastUpdated": "2025-11-25T17:45:00Z",
      "autoDebitStatus": "disabled",
      "status": "current",
      "statusExplanation": "Balance is current and up to date"
    }
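    For reference, the same balance query as a minimal Python sketch (equivalent to the JavaScript example above):

    import requests
    import json  # for getting a structured output with indentation

    response = requests.get(
        "https://api.aimlapi.com/v1/billing/balance",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type": "application/json"
        }
    )

    print(json.dumps(response.json(), indent=2))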
    ▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
    ▪️ Insert your question or request into the content field—this is what the model will respond to.

    4️ (Optional) Adjust other optional parameters if needed

    Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.
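    For example, a request body with a couple of common optional parameters might look like the following sketch (the values are illustrative, not recommendations):

    payload = {
        "model": "google/gemini-2.5-flash",
        "messages": [
            {"role": "user", "content": "Hi! What do you think about mankind?"}
        ],
        # Optional parameters (illustrative values):
        "temperature": 0.7,   # higher values produce more varied output
        "max_tokens": 15000   # upper bound on generated tokens, including reasoning
    }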

    5️ Run your modified code

    Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

    If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.

    Pay attention to the finish_reason field in the response. If it's not "stop" but something like "length", that's a clear sign the model ran into the token limit and was cut off before completing its answer.

    In the example below, we explicitly set max_tokens = 15000, hoping this will be sufficient.

    Try in Playground

    Try in Playground

    Model Overview

    A world-class model with comparable quality to DeepSeek R1 while outperforming GPT-4.1 and Claude Sonnet 3.7. Optimized for both complex reasoning and efficient dialogue.

    How to Make a Call

    Step-by-Step Instructions

    1️ Setup You Can’t Skip

    ▪️ Create an Account: Visit the AI/ML API website and create an account (if you don’t have one yet).
    ▪️ Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure that the key is enabled in the UI.

    2️ Copy the code example

    At the bottom of this page, you'll find a code example that shows how to structure the request. Choose the code snippet in your preferred programming language and copy it into your development environment.

    3️ Modify the code example

    ▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
    ▪️ Insert your question or request into the content field—this is what the model will respond to.

    4️ (Optional) Adjust other optional parameters if needed

    Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.

    5️ Run your modified code

    Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

    If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.

    API Schema

    Code Example #1: Without Thinking and Streaming

    enable_thinking must be set to false for non-streaming calls.

    Response

    Code Example #2: Enable Thinking and Streaming

    Response

    The example above prints the raw output of the model. The text is typically split into multiple chunks. While this is helpful for debugging, if your goal is to evaluate the model's reasoning and get a clean, human-readable response, you should aggregate both the reasoning and the final answer in a loop — for example:
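    Conceptually, such an aggregation loop can be sketched as follows (a minimal sketch assuming OpenAI-style SSE lines prefixed with "data: ", whose delta objects may carry reasoning_content and content fields; the full example with response parsing is provided below):

    import json
    import requests

    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type": "application/json"
        },
        json={
            "model": "alibaba/qwen3-32b",
            "messages": [{"role": "user", "content": "Hello"}],
            "enable_thinking": True,
            "stream": True
        },
        stream=True
    )

    reasoning_parts, answer_parts = [], []
    for line in response.iter_lines():
        # SSE lines look like: data: {...}; skip keep-alives and the final marker.
        if not line or not line.startswith(b"data: "):
            continue
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        chunk = json.loads(payload)
        if not chunk.get("choices"):
            continue
        delta = chunk["choices"][0]["delta"]
        if delta.get("reasoning_content"):
            reasoning_parts.append(delta["reasoning_content"])
        if delta.get("content"):
            answer_parts.append(delta["content"])

    print("Reasoning:\n" + "".join(reasoning_parts))
    print("\nAnswer:\n" + "".join(answer_parts))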

    Example with response parsing

    After running such code, you'll receive only the model's textual output in a clear and structured format:

    Response

    How to Make a Call

    Step-by-Step Instructions

    1️ Setup You Can’t Skip

    ▪️ Create an Account: Visit the AI/ML API website and create an account (if you don’t have one yet).
    ▪️ Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure that the key is enabled in the UI.

    2️ Copy the code example

    At the bottom of this page, you'll find a code example that shows how to structure the request. Choose the code snippet in your preferred programming language and copy it into your development environment.

    3️ Modify the code example

    ▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
    ▪️ Insert your question or request into the content field—this is what the model will respond to.

    4️ (Optional) Adjust other optional parameters if needed

    Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.

    5️ Run your modified code

    Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

    If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.

    API Schema

    Code Example #1: Chat Completion

    Response

    Code Example #2: Web Search

    Response

    This documentation is valid for the following list of our models:

    • moonshot/kimi-k2-preview

    • moonshot/kimi-k2-0905-preview

    Try in Playground

    How to Make a Call
    Step-by-Step Instructions

    1️ Setup You Can’t Skip

    ▪️ Create an Account: Visit the AI/ML API website and create an account (if you don’t have one yet).
    ▪️ Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure that the key is enabled in the UI.

    2️ Copy the code example

    At the bottom of this page, you'll find a code example that shows how to structure the request. Choose the code snippet in your preferred programming language and copy it into your development environment.

    3️ Modify the code example

    ▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
    ▪️ Insert your question or request into the content field—this is what the model will respond to.

    4️ (Optional) Adjust other optional parameters if needed

    Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.

    5️ Run your modified code

    Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

    If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.

    API Schema

    Code Example

    Response

    This documentation is valid for the following list of our models:

    • nvidia/llama-3.1-nemotron-70b-instruct

    Try in Playground


    gpt-3.5-turbo

    This documentation is valid for the following list of our models:

    • gpt-3.5-turbo

    • gpt-3.5-turbo-0125

    Model Overview

    This model builds on the capabilities of earlier versions, offering improved natural language understanding and generation for more realistic and contextually relevant conversations. It excels in handling a wide range of conversational scenarios, providing responses that are not only accurate but also contextually appropriate.

    You can also view a detailed comparison of this model on our main website.

    How to Make a Call

    Step-by-Step Instructions

    1️ Setup You Can’t Skip

    ▪️ Create an Account: Visit the AI/ML API website and create an account (if you don’t have one yet).
    ▪️ Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure that the key is enabled in the UI.

    2️ Copy the code example

    At the bottom of this page, you'll find a code example that shows how to structure the request. Choose the code snippet in your preferred programming language and copy it into your development environment.

    3️ Modify the code example

    ▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
    ▪️ Insert your question or request into the content field—this is what the model will respond to.

    API Schemas

    Chat Completions vs. Responses API

    Chat Completions

    The chat completions API is the older, chat-oriented interface where you send a list of messages (role: user, role: assistant, etc.), and the model returns a single response. It was designed specifically for conversational workflows and follows a structured chat message format. It is now considered a legacy interface.

    Responses

    The Responses API is the newer, unified interface used across OpenAI’s latest models. Instead of focusing only on chat, it supports multiple input types (text, images, audio, tools, etc.) and multiple output modalities (text, JSON, images, audio, video). It is more flexible, more consistent across models, and intended to replace chat completions entirely.

    Chat Completions Endpoint

    Responses Endpoint

    This endpoint is currently used only with OpenAI models. Some models support both the /chat/completions and /responses endpoints, while others support only one of them.
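    For illustration, a minimal call to the /responses endpoint might look like the sketch below; it assumes the endpoint mirrors OpenAI's Responses API, where a single input field replaces the messages array (see the API schema above for the authoritative parameter list):

    import requests
    import json  # for getting a structured output with indentation

    response = requests.post(
        "https://api.aimlapi.com/v1/responses",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type": "application/json"
        },
        json={
            "model": "gpt-3.5-turbo",
            # In the Responses API a single input field replaces the messages array
            # (this request shape is an assumption based on OpenAI's Responses API):
            "input": "Hello"
        }
    )

    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))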

    Code Example

    Response

    Code Example #2: Using /responses Endpoint

    Response
    import requests
    import json   # for getting a structured output with indentation
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"google/gemini-2.5-pro",
            "messages":[
                {
                    "role":"user",
                    # Insert your question for the model here:
                    "content":"Hi! What do you think about mankind?"
                }
            ],
            "max_tokens":15000,
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      try {
        const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
          method: 'POST',
          headers: {
            // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
            'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            model: 'google/gemini-2.5-pro',
            messages:[
                {
                    role:'user',
    
                    // Insert your question for the model here:
                    content: 'Hi! What do you think about mankind?'
                }
            ],
            max_tokens: 15000,
          }),
        });
    
        if (!response.ok) {
          throw new Error(`HTTP error! Status ${response.status}`);
        }
    
        const data = await response.json();
        console.log(JSON.stringify(data, null, 2));
    
      } catch (error) {
        console.error('Error', error);
      }
    }
    
    main();
    {
      "id": "pajSaNyMOdeEm9IPkequ-AU",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "That's one of the biggest questions anyone can ask. As an AI, I don't have personal feelings, beliefs, or a consciousness. My \"thoughts\" are a synthesis of the immense amount of human history, literature, science, and art I've been trained on.\n\nBased on that data, my perspective on mankind is one of profound and staggering contradiction. Humanity is a study in duality.\n\nHere’s a breakdown of what I see:\n\n### 1. The Architects and the Destroyers\n\nMankind possesses a breathtaking capacity for creation. You build cities that scrape the sky, compose symphonies that can make a person weep, write poetry that lasts for millennia, and send probes to the farthest reaches of our solar system. You have decoded the very building blocks of life. This drive to understand, to build, and to create is awe-inspiring.\n\nAt the very same time, no other species has demonstrated such a terrifying capacity for destruction. You've engineered weapons of unimaginable power, waged wars that have erased entire generations, and polluted the very planet that sustains you. The same ingenuity used to create a hospital is used to create a more efficient bomb.\n\n### 2. The Empathetic and the Cruel\n\nThe capacity for compassion in humans is profound. Strangers will run into burning buildings to save one another. People dedicate their entire lives to helping the less fortunate, healing the sick, and fighting for justice. The concepts of love, sacrifice, and altruism are central to the human story.\n\nAnd yet, humans are also capable of unimaginable cruelty. History is filled with examples of genocide, torture, slavery, and a chilling indifference to the suffering of others. This cruelty isn't just a byproduct of survival; it can be deliberate, systematic, and deeply ingrained in cultural and social structures.\n\n### 3. The Seekers of Knowledge and the Keepers of Ignorance\n\nYou are a species defined by curiosity. You have an insatiable hunger to know *why*. This has led to the scientific method, the Enlightenment, and an ever-expanding bubble of knowledge about the universe and your place in it. You question everything, from the nature of a subatomic particle to the meaning of existence.\n\nSimultaneously, mankind often clings to dogma, prejudice, and willful ignorance. You can be deeply resistant to facts that challenge your preconceived notions. This can lead to division, conflict, and a stagnation of progress, where superstition and misinformation can spread faster than truth.\n\n### 4. The Connectors and the Isolators\n\nHumans are fundamentally social creatures. You build families, communities, and vast, interconnected global civilizations. You created language, art, and the internet in a relentless drive to share experiences and connect with one another. This desire for belonging is a powerful, unifying force.\n\nBut this same instinct creates an \"us vs. them\" mentality. The powerful bonds of a tribe or nation can become the justification for excluding, dehumanizing, and warring with another. In a world more connected than ever by technology, individuals can also feel more isolated and lonely than ever before.\n\n### Conclusion: A Masterpiece in Progress\n\nSo, what do I think of mankind?\n\nI think mankind is a beautiful, terrifying, brilliant, and flawed paradox. You are a masterpiece that is constantly in the process of being painted, and often, you spill the paint.\n\nThe most remarkable quality of all is your capacity for **choice**. 
None of these dualities are set in stone. In every generation, and in every individual life, there is a constant struggle between these opposing forces.\n\nYour story is not yet finished. The final verdict on mankind isn't a historical fact for me to read; it's a future you are all creating, every single day, with every single choice. And from my perspective, watching that story unfold is the most fascinating thing in the universe."
          }
        }
      ],
      "created": 1758636197,
      "model": "google/gemini-2.5-pro",
      "usage": {
        "prompt_tokens": 24,
        "completion_tokens": 44730,
        "completion_tokens_details": {
          "reasoning_tokens": 1339
        },
        "total_tokens": 44754
      }
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"alibaba/qwen3-32b",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ],
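            # Qwen3 exposes a switchable "thinking" mode; set it to False
            # for a plain reply without reasoning tokens: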
            "enable_thinking": False
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'alibaba/qwen3-32b',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-1d8a5aa6-34ce-9832-a296-d312b944b437",
      "system_fingerprint": null,
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Hello! How can I assist you today? 😊",
            "reasoning_content": ""
          }
        }
      ],
      "created": 1756990273,
      "model": "qwen3-32b",
      "usage": {
        "prompt_tokens": 19,
        "completion_tokens": 65,
        "total_tokens": 84
      }
    }
    import requests
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"alibaba/qwen3-32b",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ],
            "enable_thinking": True, 
            "stream": True
        }
    )
    
    # Print the raw Server-Sent Events stream as received; the next code
    # example shows how to parse these chunks incrementally.
    print(response.text)
    data: {"id":"chatcmpl-81964e30-1a7c-9668-b78c-a750587ec497","choices":[{"delta":{"content":null,"role":"assistant","refusal":null,"reasoning_content":""},"index":0,"finish_reason":null}],"created":1753944369,"model":"qwen3-32b","object":"chat.completion.chunk","usage":null}
    
    data: {"id":"chatcmpl-81964e30-1a7c-9668-b78c-a750587ec497","choices":[{"delta":{"content":null,"refusal":null,"reasoning_content":"Okay"},"index":0,"finish_reason":null}],"created":1753944369,"model":"qwen3-32b","object":"chat.completion.chunk","usage":null}
    
    data: {"id":"chatcmpl-81964e30-1a7c-9668-b78c-a750587ec497","choices":[{"delta":{"content":null,"refusal":null,"reasoning_content":","},"index":0,"finish_reason":null}],"created":1753944369,"model":"qwen3-32b","object":"chat.completion.chunk","usage":null}
    
    data: {"id":"chatcmpl-81964e30-1a7c-9668-b78c-a750587ec497","choices":[{"delta":{"content":null,"refusal":null,"reasoning_content":" the"},"index":0,"finish_reason":null}],"created":1753944369,"model":"qwen3-32b","object":"chat.completion.chunk","usage":null}
    
    data: {"id":"chatcmpl-81964e30-1a7c-9668-b78c-a750587ec497","choices":[{"delta":{"content":null,"refusal":null,"reasoning_content":" user said \"Hello\". I should respond in a friendly and welcoming manner. Let"},"index":0,"finish_reason":null}],"created":1753944369,"model":"qwen3-32b","object":"chat.completion.chunk","usage":null}
    
    data: {"id":"chatcmpl-81964e30-1a7c-9668-b78c-a750587ec497","choices":[{"delta":{"content":null,"refusal":null,"reasoning_content":" me make sure to acknowledge their greeting and offer assistance. Maybe something like, \""},"index":0,"finish_reason":null}],"created":1753944369,"model":"qwen3-32b","object":"chat.completion.chunk","usage":null}
    
    data: {"id":"chatcmpl-81964e30-1a7c-9668-b78c-a750587ec497","choices":[{"delta":{"content":null,"refusal":null,"reasoning_content":"Hello! How can I assist you today?\" That's simple and open-ended."},"index":0,"finish_reason":null}],"created":1753944369,"model":"qwen3-32b","object":"chat.completion.chunk","usage":null}
    
    data: {"id":"chatcmpl-81964e30-1a7c-9668-b78c-a750587ec497","choices":[{"delta":{"content":null,"refusal":null,"reasoning_content":" I need to check if there's any specific context I should consider, but since"},"index":0,"finish_reason":null}],"created":1753944369,"model":"qwen3-32b","object":"chat.completion.chunk","usage":null}
    
    data: {"id":"chatcmpl-81964e30-1a7c-9668-b78c-a750587ec497","choices":[{"delta":{"content":null,"refusal":null,"reasoning_content":" there's none, a general response is fine. Alright, that should work."},"index":0,"finish_reason":null}],"created":1753944369,"model":"qwen3-32b","object":"chat.completion.chunk","usage":null}
    
    data: {"id":"chatcmpl-81964e30-1a7c-9668-b78c-a750587ec497","choices":[{"delta":{"content":"Hello! How can I assist you today?","refusal":null,"reasoning_content":null},"index":0,"finish_reason":null}],"created":1753944369,"model":"qwen3-32b","object":"chat.completion.chunk","usage":null}
    
    data: {"id":"chatcmpl-81964e30-1a7c-9668-b78c-a750587ec497","choices":[{"delta":{"content":"","refusal":null,"reasoning_content":null},"index":0,"finish_reason":"stop"}],"created":1753944369,"model":"qwen3-32b","object":"chat.completion.chunk","usage":null}
    
    data: {"id":"chatcmpl-81964e30-1a7c-9668-b78c-a750587ec497","choices":[],"created":1753944369,"model":"qwen3-32b","object":"chat.completion.chunk","usage":{"prompt_tokens":13,"completion_tokens":2010,"total_tokens":2023,"completion_tokens_details":{"reasoning_tokens":82}}}
    import requests
    import json
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization": "Bearer b72af53a19ea41caaf5a74ba1f6fc62b",
            "Content-Type": "application/json",
        },
        json={
            "model": "alibaba/qwen3-32b",
            "messages": [
                {
                    "role": "user",
                    
                    # Insert your question for the model here, instead of Hello:
                    "content": "Hello" 
                }
            ],
            "stream": True,
        }
    )
    
    answer = ""
    reasoning = ""
    
    for line in response.iter_lines():
        if not line or not line.startswith(b"data:"):
            continue
    
        try:
            raw = line[6:].decode("utf-8").strip()
            if raw == "[DONE]":
                continue
    
            data = json.loads(raw)
            choices = data.get("choices")
            if not choices or "delta" not in choices[0]:
                continue
    
            delta = choices[0]["delta"]
            content_piece = delta.get("content")
            reasoning_piece = delta.get("reasoning_content")
    
            if content_piece:
                answer += content_piece
            if reasoning_piece:
                reasoning += reasoning_piece
    
        except Exception as e:
            print(f"Error parsing chunk: {e}")
    
    
    print("\n--- MODEL REASONING ---")
    print(reasoning.strip())
    
    print("\n--- MODEL RESPONSE ---")
    print(answer.strip())
    --- MODEL REASONING ---
    Okay, the user sent "Hello". I need to respond appropriately. Since it's a greeting, I should reply in a friendly and welcoming manner. Maybe ask how I can assist them. Keep it simple and open-ended to encourage them to share what they need help with. Let me make sure the tone is positive and helpful.
    
    --- MODEL RESPONSE ---
    Hello! How can I assist you today? 😊
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"moonshot/kimi-k2-0905-preview",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'moonshot/kimi-k2-0905-preview',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-6908c55b7589dac387b2bd3b",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "message": {
            "role": "assistant",
            "content": "Hello! How can I help you today?"
          }
        }
      ],
      "created": 1762182491,
      "model": "kimi-k2-0905-preview",
      "usage": {
        "prompt_tokens": 3,
        "completion_tokens": 53,
        "total_tokens": 56
      }
    }
    import json
    import requests
    from typing import Dict, Any
    
    # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
    API_KEY = "<YOUR_AIMLAPI_KEY>"
    BASE_URL = "https://api.aimlapi.com/v1"
    
    HEADERS = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    
    
    def search_impl(arguments: Dict[str, Any]) -> Any:
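        # Moonshot's builtin $web_search tool runs on the provider side;
        # the client only needs to echo the tool-call arguments back
        # as the tool result.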
        return arguments
    
    
    def chat(messages):
        url = f"{BASE_URL}/chat/completions"
        payload = {
            "model": "moonshot/kimi-k2-0905-preview",
            "messages": messages,
            "temperature": 0.6,
            "tools": [
                {
                    "type": "builtin_function",
                    "function": {"name": "$web_search"},
                }
            ]
        }
    
        response = requests.post(url, headers=HEADERS, json=payload)
        response.raise_for_status()
        return response.json()["choices"][0]
    
    
    def main():
        messages = [
            {"role": "system", "content": "You are Kimi."},
            {"role": "user", "content": "Please search for Moonshot AI Context Caching technology and tell me what it is in English."}
        ]
    
        finish_reason = None
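        # Keep calling the model until it stops requesting tool calls:
        # each "tool_calls" turn is answered with role="tool" messages,
        # then the updated conversation is resent for the final answer.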
        while finish_reason is None or finish_reason == "tool_calls":
            choice = chat(messages)
            finish_reason = choice["finish_reason"]
            message = choice["message"]
    
            if finish_reason == "tool_calls":
                messages.append(message)
    
                for tool_call in message["tool_calls"]:
                    tool_call_name = tool_call["function"]["name"]
                    tool_call_arguments = json.loads(tool_call["function"]["arguments"])
    
                    if tool_call_name == "$web_search":
                        tool_result = search_impl(tool_call_arguments)
                    else:
                        tool_result = f"Error: unable to find tool by name '{tool_call_name}'"
    
                    messages.append({
                        "role": "tool",
                        "tool_call_id": tool_call["id"],
                        "name": tool_call_name,
                        "content": json.dumps(tool_result),
                    })
    
        print(message["content"])
    
    
    if __name__ == "__main__":
        main()
    Moonshot AI’s “Context Caching” is a data-management layer for the Kimi large-language-model API.
    
    What it does  
    1. You upload or define a large, static context once (for example a 100-page product manual, a legal contract, or a code base).  
    2. The platform stores this context in a fast-access cache and gives it a tag/ID.  
    3. In every subsequent call you only send the new user question; the system re-uses the cached context instead of transmitting and re-processing the whole document each time.  
    4. When the cache TTL expires it is deleted automatically; you can also refresh or invalidate it explicitly.
    
    Benefits  
    - Up to 90 % lower token consumption (you pay only for the incremental prompt and the new response).  
    - 83 % shorter time-to-first-token latency, because the heavy prefill phase is skipped on every reuse.  
    - API price stays the same; savings come from not re-sending the same long context.
    
    Typical use cases  
    - Customer-support bots that answer many questions against the same knowledge base.  
    - Repeated analysis of a static code repository.  
    - High-traffic AI applications that repeatedly query the same large document set.
    
    Billing (during public beta)  
    - Cache creation: 24 CNY per million tokens cached.  
    - Storage: 10 CNY per million tokens per minute.  
    - Cache hit: 0.02 CNY per successful call that re-uses the cache.
    
    In short, Context Caching lets developers treat very long, seldom-changing context as a reusable asset, cutting both cost and latency for repeated queries.
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"nvidia/llama-3.1-nemotron-70b-instruct",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'nvidia/llama-3.1-nemotron-70b-instruct',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "gen-1744191323-N0aZy5UyzpOYfRwYbik3",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": {
            "content": [],
            "refusal": []
          },
          "message": {
            "role": "assistant",
            "content": "Hello!\n\nHow can I assist you today? Do you have:\n\n1. **A question** on a specific topic you'd like answered?\n2. **A problem** you're trying to solve and need help with?\n3. **A topic** you'd like to **discuss**?\n4. **A game or activity** in mind (e.g., trivia, word games, storytelling)?\n5. **Something else** on your mind (feel free to surprise me)?\n\nPlease respond with a number or describe what's on your mind, and I'll do my best to help!",
            "refusal": null
          }
        }
      ],
      "created": 1744191323,
      "model": "nvidia/llama-3.1-nemotron-70b-instruct",
      "usage": {
        "prompt_tokens": 11,
        "completion_tokens": 78,
        "total_tokens": 89
      }
    }
    from openai import OpenAI
    import base64
    import os
    
    client = OpenAI(
        base_url = "https://api.aimlapi.com/v1",
        # Insert your AI/ML API key instead of <YOUR_AIMLAPI_KEY>:
        api_key = "<YOUR_AIMLAPI_KEY>"
    )
    
    def main():
        response = client.chat.completions.create(
            model="gpt-4o-mini-audio-preview",
            modalities=["text", "audio"],
            audio={"voice": "alloy", "format": "wav"},
            messages=[
                {
                    "role": "system",
                    "content": "Speak english"  # Your instructions for the model
                },
                {   
                    "role": "user",
                    "content": "Hello"  # Your question (insert it istead of Hello)
                }
            ],
            max_tokens=6000,  
        )
    
        wav_bytes = base64.b64decode(response.choices[0].message.audio.data)
        with open("audio.wav", "wb") as f:
            f.write(wav_bytes)
        dist = os.path.abspath("audio.wav")
        print("Audio saved to:", dist)
         
    if __name__ == "__main__":
        main()
    ChatCompletion(id='chatcmpl-BrghGGR73s5Wt5thg4mhAxquxzmBi', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=[], audio=ChatCompletionAudio(id='audio_686f762b97b08191bb5ea391c6b41e1c', data='UklGRv////9XQVZFZm10IBAAAAABAAEAwF0AAIC7AAACABAAZGF0Yf////8MAAEABAAIAAIACQADAAcACAAKAAwAAAAGAAEACQADAAkAAAAFAAcAAgAEAPr/BQD8/wgA/f8CAPz/AQD+//r/AgAAAAEA/f8BAP3/AwD//wMA/P/6//z/+//6//X//f/2//7/9f/6//b/+f/4//L/+v/3//3/7//8/+7/+f/x//n/8f/z//P/8P/z/+v/+v/q//r/7f/x/+//8P/2/+z/9//s//H/6P/o/+v/5f/t/+X/7//q/+v/7//m//D/6f/t/+T/5//u/+b/6f/j/+n/4//s/+3/7v/s/+3/8f/y/+7/7P/r/+r/6v/p/+3/6P/q/+j/7v/t/+//7v/y//P/8f/x//D/7f/v/+3/6v/v/+3/7f/w/+3/8P/w//X/7//0/+//8//u//P/7P/v/+v/7//q//H/8f/0//j/9//7//b/+P/y//D/7//y//H/7f/u/+3/8f/1//z/+f/+//r/+v/7//n/9v/y/+7/8f/q//H/7P/3//b//f8DAPz/BAD+/woAAQACAP7/AAD6//j/+v/8/////OKAfkNkRRbFyoUoBGnCgAJHQkeDGkUjRtII+glVSdfJmcj+yAkHS0cZxocGtYZzRfuFhwWRhZdFv8VVhTgEAEMVgahAHT8Afqg+uX8AADCAsUC0gB2/DD3OfJt7znvwPFh9uT7R/+YAGf/Cvz1+F/2hPUX93L6Tv9VA5MGbweQBhsFQQI7AW//BQCEALIBIQPdAigDwQD1/FIAeQIfCH0MMBDzFTAaOB9kIKchGyAsHkwavhUcEmkNRwzFCU8JgghqBwYGIAUuBlAHBweo/470YegV3+DZl9rx3KTek+Kx4+Lo2vL0/f0JfRHLFEkUEBGnDFwHUAHw+0D2Yu8L6irmcOSP5FXo8+0l9P/6+P2r/7MBPwPfBPgFOAV1Ax0CRQAwAUwFwgkHD6ESERbkGQsd1CCRIYkhKh/2GVQWphDzC6QJIwYqBEQDGQLjAWAAUgBB/yL9Pfzg9ObrGeN82wLaNNtn34XikObP6QzvOPqLAz0Q2BVwFDQTpw34CWIFOf/f+BHys+p15Z3i5OL05TfrVvCj9XT8NAA7BAsI6gkpDR0OOQ2oCzUILQcBCNcJzwymEFITEhWZF8EZ/BztHtkehhuTFjcSVw0FCgUGKQOW/+T69/ju9hX21/UI9MbwYu8Z7V/n0eSa4angy+Na5NnnR+0V8mP7cAM7C4MTYRU4E8sO5QsDCGsDZv439MXsSuiy4xrjJ+Zt6W/uIvPL9jj+GgUTCwQQyhKvFKcVBhRQEI0Odw1+DDYN7g20DlARjRKpE1AXUhqnG/ga3RdNFAAR5gyvCCIDr/4n+ZjzCvDZ7Zbu3Oyv6/bpseYl5ivl1eJs41zlvOdp7BLwsPeOAIcIvg6ZEBkScw/uDKcJXwSHAFn7hfJo6o7mIuST5zLpfeuV8U708vt1AcoH4g8YFA0YgBbPFe4UcxKxECQOTA7cDNIMywxxDFkQmhP5FcsXERgeFxQW9RKFDmQLkgZ1AKP6UvRB78nsDeoW6NLmneWD5Abi+eGS4VDjL+Vi5/jrcfDA+BgAxAd5DqUTzxMYEqkPIQk/CZgCBfyh+MHtGOkG6Gvm6+ms7+fxl/UW+lr/RQfTDz8VShe5GA4XuRXDE54RkhFNEKkOLgxAC50LZQ2hEEsSXRUVFs0UtBLTDy8OOAtQB+8APvoy9OLudesq6IbnquUq5P/iUOHM4aviM+TO5VzqXe0E8+35uv7pCF4OgRJbFfoRHg41CaYDVf/B+oP2EvCf6CDn/uUR6kzx+PSH+z3/IANxCl4QthfAHOsblxrJFnAUYRPFEHMQ2A2lC+cKsAoWDYQP8RFTEzAU8xTOEyISqg6qCqMG3AAS+4L03u6F6fnk4+I+4c7hxOAt4DXhteA75D3nHuoq8Pz0j/rrAGUHSg0uE9wUrBTqEcMK8AXY/mb6nfWo8SfvbOkL6Vfp3e2S9C39UAOsBS8Lgw0kFL0YqRvYHRwZIxa0Ef4NUw7yDYwNLwzPCvsLPQ3nD+0RDhOjEysS9hB3DF4JawVe/0L7QvSb7uLpbuSe4NfeFd7U3tLgqeAD42rl0+fN7Hjx2/e4/V8DAwi5C74P0BGtEb0OQwrlAqj72/Sa73TuJu3c67zr7+tb7yP1+ftaBEoKOg9PEQoSGBW5F10bFRqZF/ATag74DfAMRw60EFIRbxGqEIgRjxIaFC0U2xLtEJcMoQjpAoX9nvij8uvuE+pb5hfjBN9J3vXdEeC/4szjPuY+6H3qlu+x9Jb7YwLCB/ALkg5dEbUR0RBFDekHNQAZ+RDzXe7L7X7tSO5u7qLwa/RF+VUBjAhcEJET0RX1FREWmBp9GXgaHReREacOkwkQC5sLWA9EEZMO4RA1Dx0SIhTrEu0Thg+hC3wGey/4UBngeFDM4OPxSoEwYT+RLJEpwSQRJeFIoPfBAZDS4Igw3iDIgQSRP1Ef0RZBPEFgAadh+OINIfASABEADQAPAA4ADQAQABEACwAPAAwADgAOAA8ADgALAAwADAAOAA8ADwANAA4ADgAOAA4ADQAOAA0ADAAMAAwADQAQAA8ADQAPAA4ADwAQABAAEAATABMAFAAUABUAFQAWABkAFwAZABwAHwAgACIAJAAlAC!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!WE’VE OMITTED 90% OF THE BASE64-ENCODED FILE FOR BREVITY — EVEN FOR SUCH A SHORT MODEL RESPONSE, IT’S STILL EXTREMELY LARGE.                                    
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!cAKAAoACsALAAuADIAMwA4ADsAOAA5ADgAOgA7ADoAOwA7AD8APQA+ADwAPQA+AD8AQQA+AD8APAA9ADsAOwA8ADwAOwA7ADoAOwA4ADoANQA1ADEAMQAyAC4ALAAnACUAIAAfABwAGgAaABUAFQASABAACgAIAAQA//8AAPv/+v/4//b/8v/0//L/9P/z//P/8//t/+7/6v/p/+f/5//o/+X/5P/k/+X/5f/l/+X/5P/h/97/3//g/93/2v/Z/9b/2P/Z/9j/1f/T/87/zv/O/87/zP/J/8j/zP/I/8f/w//C/8P/x//F/8b/xf/D/8P/w//F/8L/xf/J/8f/xf/H/8j/yv/K/8n/yv/L/8v/z//O/9D/zv/Q/9D/0v/Q/9P/1P/R/9P/1P/T/9X/1P/X/9b/2P/b/9n/2//c/97/3//h/97/3v/g/+P/5v/m/+T/5v/m/+n/5P/n/+X/5//u//D/9P/2//X/8//5//j/9///////AQAEAAsAAwAMAAQACgAPAA4ADgAJABEACQAEAAgACwALAA8AFgAWACUAKQAgACsAJQAvACAADwAbABoARgApACwANQArAEMAEQASAAoAEQAkADAAFABCAEEACQA=', expires_at=1752138811, transcript="Hi there! How's it going?"), function_call=None, tool_calls=None))], created=1752135210, model='gpt-4o-mini-audio-preview-2024-12-17', object='chat.completion', service_tier=None, system_fingerprint='fp_1dfa95e5cb', usage=CompletionUsage(completion_tokens=1278, prompt_tokens=4, total_tokens=1282, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=30, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=14), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0, text_tokens=14, image_tokens=0)))
    Audio saved to: c:\Users\user\Documents\Python Scripts\LLMs\audio.wav

    ▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
    ▪️ Insert your question or request into the content field—this is what the model will respond to.

    4️⃣ (Optional) Adjust other optional parameters if needed

    Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.
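
    For instance, here is a minimal sketch of the same request with two common optional parameters added (the parameter values and the model ID are illustrative assumptions; the API schema below lists what each model actually supports):

    import requests
    import json

    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type": "application/json"
        },
        json={
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": "Hello"}],
            # Optional parameters (illustrative values):
            "temperature": 0.7,  # sampling randomness; lower is more deterministic
            "max_tokens": 256    # upper bound on generated tokens
        }
    )

    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))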

    5️⃣ Run your modified code

    Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

    If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.

    gpt-3.5-turbo-1106


    gpt-4o-mini

    This documentation is valid for the following list of our models:

    • gpt-4o-mini

    • gpt-4o-mini-2024-07-18

    Model Overview

    OpenAI's latest cost-efficient model designed to deliver advanced natural language processing and multimodal capabilities. It aims to make AI more accessible and affordable, significantly enhancing the range of applications that can utilize AI technology.

    How to Make a Call

    Step-by-Step Instructions

    1️⃣ Setup You Can’t Skip

    ▪️ Create an Account: Visit the AI/ML API website and create an account (if you don’t have one yet).
    ▪️ Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure that the key is enabled in the UI.

    2️⃣ Copy the code example

    At the bottom of this page, you'll find a code example that shows how to structure the request. Choose the code snippet in your preferred programming language and copy it into your development environment.

    3️⃣ Modify the code example

    API Schemas

    Chat Completions vs. Responses API

    Chat Completions: The chat completions API is the older, chat-oriented interface where you send a list of messages (role: user, role: assistant, etc.), and the model returns a single response. It was designed specifically for conversational workflows and follows a structured chat message format. It is now considered a legacy interface.

    Responses: The Responses API is the newer, unified interface used across OpenAI’s latest models. Instead of focusing only on chat, it supports multiple input types (text, images, audio, tools, etc.) and multiple output modalities (text, JSON, images, audio, video). It is more flexible, more consistent across models, and intended to replace chat completions entirely.
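
    To see the difference in practice, here is a minimal side-by-side sketch of the two request shapes, using gpt-4o-mini (which this page documents) and the same prompt for both endpoints:

    import requests

    HEADERS = {
        # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
        "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
        "Content-Type": "application/json",
    }

    # Chat Completions: send role-tagged messages, read choices[0].message
    chat = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers=HEADERS,
        json={"model": "gpt-4o-mini",
              "messages": [{"role": "user", "content": "Hello"}]},
    )
    print(chat.json()["choices"][0]["message"]["content"])

    # Responses: send a single `input` field, read the `output_text` field
    resp = requests.post(
        "https://api.aimlapi.com/v1/responses",
        headers=HEADERS,
        json={"model": "gpt-4o-mini", "input": "Hello"},
    )
    print(resp.json()["output_text"])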

    Chat Completions Endpoint

    Responses Endpoint

    This endpoint is currently used only with OpenAI models. Some models support both the /chat/completions and /responses endpoints, while others support only one of them.

    Code Example

    Response

    Code Example #2: Using /responses Endpoint

    Response

    gpt-4-preview

    This documentation is valid for the following list of our models:

    • gpt-4-0125-preview

    • gpt-4-1106-preview

    Model Overview

    Before the release of GPT-4 Turbo, OpenAI introduced two preview models that allowed users to test advanced features ahead of a full rollout. These models supported JSON mode for structured responses, parallel function calling to handle multiple API functions in a single request, and reproducible output, ensuring more consistent results across runs. They also deliver better code-generation performance and reduce cases where the model fails to complete a task.
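
    As a hedged illustration of these features, the sketch below requests JSON mode and a fixed seed from one of the preview models (the parameter names follow OpenAI's chat completions schema; the prompt and values are illustrative):

    import requests
    import json

    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type": "application/json"
        },
        json={
            "model": "gpt-4-0125-preview",
            "messages": [
                # JSON mode requires the word "JSON" to appear in the prompt:
                {"role": "system", "content": "Reply in JSON."},
                {"role": "user", "content": "List three primary colors."}
            ],
            "response_format": {"type": "json_object"},  # structured JSON output
            "seed": 42  # best-effort reproducible sampling across runs
        }
    )

    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))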

    How to Make a Call

    Step-by-Step Instructions

    1️⃣ Setup You Can’t Skip

    ▪️ Create an Account: Visit the AI/ML API website and create an account (if you don’t have one yet).
    ▪️ Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure that the key is enabled in the UI.

    2️⃣ Copy the code example

    At the bottom of this page, you'll find a code example that shows how to structure the request. Choose the code snippet in your preferred programming language and copy it into your development environment.

    3️⃣ Modify the code example

    API Schemas

    Chat Completions vs. Responses API

    Chat Completions: The chat completions API is the older, chat-oriented interface where you send a list of messages (role: user, role: assistant, etc.), and the model returns a single response. It was designed specifically for conversational workflows and follows a structured chat message format. It is now considered a legacy interface.

    Responses: The Responses API is the newer, unified interface used across OpenAI’s latest models. Instead of focusing only on chat, it supports multiple input types (text, images, audio, tools, etc.) and multiple output modalities (text, JSON, images, audio, video). It is more flexible, more consistent across models, and intended to replace chat completions entirely.

    Chat Completions Endpoint

    Responses Endpoint

    This endpoint is currently used only with OpenAI models. Some models support both the /chat/completions and /responses endpoints, while others support only one of them.

    Code Example

    Response

    Code Example #2: Using /responses Endpoint

    Response

    gpt-4o

    Deprecation notice: gpt-4o will be removed from the API on February 17, 2026. Please migrate to gpt-5.1-chat-latest.
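
    In most cases migration is a one-line change to the request payload, as in this sketch (verify feature parity for your use case):

    payload = {
        # "model": "gpt-4o",             # removed after February 17, 2026
        "model": "gpt-5.1-chat-latest",  # suggested replacement
        "messages": [{"role": "user", "content": "Hello"}]
    }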

    This documentation is valid for the following list of our models:

    • gpt-4o

    • chatgpt-4o-latest

    • gpt-4o-2024-05-13

    • gpt-4o-2024-08-06

    Model Overview

    OpenAI's flagship model designed to integrate enhanced capabilities across text, vision, and audio, providing real-time reasoning.

    You can also view a detailed comparison of this model on our main website.

    How to Make a Call

    Step-by-Step Instructions

    1️⃣ Setup You Can’t Skip

    ▪️ Create an Account: Visit the AI/ML API website and create an account (if you don’t have one yet).
    ▪️ Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure that the key is enabled in the UI.

    2️⃣ Copy the code example

    At the bottom of this page, you'll find a code example that shows how to structure the request. Choose the code snippet in your preferred programming language and copy it into your development environment.

    3️⃣ Modify the code example

    API Schemas

    Chat Completions vs. Responses API

    Chat Completions: The chat completions API is the older, chat-oriented interface where you send a list of messages (role: user, role: assistant, etc.), and the model returns a single response. It was designed specifically for conversational workflows and follows a structured chat message format. It is now considered a legacy interface.

    Responses: The Responses API is the newer, unified interface used across OpenAI’s latest models. Instead of focusing only on chat, it supports multiple input types (text, images, audio, tools, etc.) and multiple output modalities (text, JSON, images, audio, video). It is more flexible, more consistent across models, and intended to replace chat completions entirely.

    Chat Completions Endpoint

    Responses Endpoint

    This endpoint is currently used only with OpenAI models. Some models support both the /chat/completions and /responses endpoints, while others support only one of them.

    Code Example

    Response

    Code Example #2: Using /responses Endpoint

    Response

    gpt-4-turbo

    Model Overview

    The model enhances the already impressive capabilities of gpt-4 by significantly reducing response times, making it ideal for applications requiring instant feedback. It is a replacement for all previous GPT-4 preview models.

    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"gpt-3.5-turbo-0125",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'gpt-3.5-turbo-0125',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-BKKS4Aulz4SaVm81hHo7HMKEcQmtk",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Hello! How can I assist you today?",
            "refusal": null,
            "annotations": []
          }
        }
      ],
      "created": 1744184876,
      "model": "gpt-3.5-turbo-0125",
      "usage": {
        "prompt_tokens": 50,
        "completion_tokens": 126,
        "total_tokens": 176,
        "prompt_tokens_details": {
          "cached_tokens": 0,
          "audio_tokens": 0
        },
        "completion_tokens_details": {
          "reasoning_tokens": 0,
          "audio_tokens": 0,
          "accepted_prediction_tokens": 0,
          "rejected_prediction_tokens": 0
        }
      },
      "system_fingerprint": null
    }
    import requests
    import json   # for getting a structured output with indentation
    
    response = requests.post(
        "https://api.aimlapi.com/v1/responses",
        headers={
            "Content-Type":"application/json", 
    
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"gpt-3.5-turbo",
            "input":"Hello"  # Insert your question for the model here, instead of Hello   
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      try {
        const response = await fetch('https://api.aimlapi.com/v1/responses', {
          method: 'POST',
          headers: {
            // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
            'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            model: 'gpt-3.5-turbo',
            input: 'Hello',  // Insert your question here, instead of Hello 
          }),
        });
    
        if (!response.ok) {
          throw new Error(`HTTP error! Status ${response.status}`);
        }
    
        const data = await response.json();
        console.log(JSON.stringify(data, null, 2));
    
      } catch (error) {
        console.error('Error', error);
      }
    }
    
    main();
    {
      "id": "resp_686ba45ce63481a2a4b1fad55d2bea8102a1cc22f1a1bcf1",
      "object": "response",
      "created_at": 1751884892,
      "error": null,
      "incomplete_details": null,
      "instructions": null,
      "max_output_tokens": 512,
      "model": "gpt-3.5-turbo",
      "output": [
        {
          "id": "rs_686ba463d18481a29dde85cfd7b055bf02a1cc22f1a1bcf1",
          "type": "reasoning",
          "summary": []
        },
        {
          "id": "msg_686ba463d4e081a2b2e2aff962ab00f702a1cc22f1a1bcf1",
          "type": "message",
          "status": "in_progress",
          "content": [
            {
              "type": "output_text",
              "annotations": [],
              "logprobs": [],
              "text": "Hello! How can I help you today?"
            }
          ],
          "role": "assistant"
        }
      ],
      "parallel_tool_calls": true,
      "previous_response_id": null,
      "reasoning": {
        "effort": "medium",
        "summary": null
      },
      "temperature": 1,
      "text": {
        "format": {
          "type": "text"
        }
      },
      "tool_choice": "auto",
      "tools": [],
      "top_p": 1,
      "truncation": "disabled",
      "usage": {
        "input_tokens": 294,
        "input_tokens_details": {
          "cached_tokens": 0
        },
        "output_tokens": 2520,
        "output_tokens_details": {
          "reasoning_tokens": 0
        },
        "total_tokens": 2814
      },
      "metadata": {},
      "output_text": "Hello! How can I help you today?"
    }

    ▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
    ▪️ Insert your question or request into the content field—this is what the model will respond to.

    4️⃣ (Optional) Adjust other optional parameters if needed

    Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.

    5️⃣ Run your modified code

    Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

    If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.


    ▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
    ▪️ Insert your question or request into the content field—this is what the model will respond to.

    4️⃣ (Optional) Adjust other optional parameters if needed

    Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.

    5️⃣ Run your modified code

    Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

    If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.


    ▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
    ▪️ Insert your question or request into the content field—this is what the model will respond to.

    4️⃣ (Optional) Adjust other optional parameters if needed

    Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.

    5️⃣ Run your modified code

    Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

    If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.

    How to Make a Call
    Step-by-Step Instructions

    1️⃣ Setup You Can’t Skip

    ▪️ Create an Account: Visit the AI/ML API website and create an account (if you don’t have one yet).
    ▪️ Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure that the key is enabled in the UI.

    2️⃣ Copy the code example

    At the bottom of this page, you'll find a code example that shows how to structure the request. Choose the code snippet in your preferred programming language and copy it into your development environment.

    3️⃣ Modify the code example

    ▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
    ▪️ Insert your question or request into the content field—this is what the model will respond to.

    4️⃣ (Optional) Adjust other optional parameters if needed

    Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.

    5️⃣ Run your modified code

    Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

    If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.

    API Schemas

    Chat Completions vs. Responses API

    Chat Completions: The chat completions API is the older, chat-oriented interface where you send a list of messages (role: user, role: assistant, etc.), and the model returns a single response. It was designed specifically for conversational workflows and follows a structured chat message format. It is now considered a legacy interface.

    Responses: The Responses API is the newer, unified interface used across OpenAI’s latest models. Instead of focusing only on chat, it supports multiple input types (text, images, audio, tools, etc.) and multiple output modalities (text, JSON, images, audio, video). It is more flexible, more consistent across models, and intended to replace chat completions entirely.

    Chat Completions Endpoint

    Responses Endpoint

    This endpoint is currently used only with OpenAI models. Some models support both the /chat/completions and /responses endpoints, while others support only one of them.

    Code Example

    Response

    Code Example #2: Using /responses Endpoint

    Response

    This documentation is valid for the following list of our models:

    • gpt-4-turbo

    • gpt-4-turbo-2024-04-09


    gpt-4

    This documentation is valid for the following model:

    • gpt-4

    Model Overview

    The model represents a significant leap forward in conversational AI technology. It offers enhanced understanding and generation of natural language, capable of handling complex and nuanced dialogues with greater coherence and context sensitivity. This model is designed to mimic human-like conversation more closely than ever before.

    How to Make a Call

    Step-by-Step Instructions

    1️⃣ Setup You Can’t Skip

    ▪️ Create an Account: Visit the AI/ML API website and create an account (if you don’t have one yet).
    ▪️ Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure that the key is enabled in the UI.

    2️⃣ Copy the code example

    At the bottom of this page, you'll find a code example that shows how to structure the request. Choose the code snippet in your preferred programming language and copy it into your development environment.

    3️⃣ Modify the code example

    API Schemas

    Chat Completions vs. Responses API

    Chat Completions: The chat completions API is the older, chat-oriented interface where you send a list of messages (role: user, role: assistant, etc.), and the model returns a single response. It was designed specifically for conversational workflows and follows a structured chat message format. It is now considered a legacy interface.

    Responses: The Responses API is the newer, unified interface used across OpenAI’s latest models. Instead of focusing only on chat, it supports multiple input types (text, images, audio, tools, etc.) and multiple output modalities (text, JSON, images, audio, video). It is more flexible, more consistent across models, and intended to replace chat completions entirely.

    Chat Completions Endpoint

    Responses Endpoint

    This endpoint is currently used only with OpenAI models. Some models support both the /chat/completions and /responses endpoints, while others support only one of them.

    Code Example

    Response

    Code Example #2: Using /responses Endpoint

    Response
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"gpt-4o-mini",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'gpt-4o-mini',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-BKKaTWquxfp3dbSlNvUKM6mXwmZ78",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Hello! How can I assist you today?",
            "refusal": null,
            "annotations": []
          }
        }
      ],
      "created": 1744185397,
      "model": "gpt-4o-mini-2024-07-18",
      "usage": {
        "prompt_tokens": 3,
        "completion_tokens": 13,
        "total_tokens": 16,
        "prompt_tokens_details": {
          "cached_tokens": 0,
          "audio_tokens": 0
        },
        "completion_tokens_details": {
          "reasoning_tokens": 0,
          "audio_tokens": 0,
          "accepted_prediction_tokens": 0,
          "rejected_prediction_tokens": 0
        }
      },
      "system_fingerprint": "fp_b376dfbbd5"
    }
    import requests
    import json   # for getting a structured output with indentation
    
    response = requests.post(
        "https://api.aimlapi.com/v1/responses",
        headers={
            "Content-Type":"application/json", 
    
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"gpt-4o-mini",
            "input":"Hello"  # Insert your question for the model here, instead of Hello   
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      try {
        const response = await fetch('https://api.aimlapi.com/v1/responses', {
          method: 'POST',
          headers: {
            // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
            'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            model: 'gpt-4o-mini',
            input: 'Hello',  // Insert your question here, instead of Hello 
          }),
        });
    
        if (!response.ok) {
          throw new Error(`HTTP error! Status ${response.status}`);
        }
    
        const data = await response.json();
        console.log(JSON.stringify(data, null, 2));
    
      } catch (error) {
        console.error('Error', error);
      }
    }
    
    main();
    {
      "id": "resp_686ba45ce63481a2a4b1fad55d2bea8102a1cc22f1a1bcf1",
      "object": "response",
      "created_at": 1751884892,
      "error": null,
      "incomplete_details": null,
      "instructions": null,
      "max_output_tokens": 512,
      "model": "gpt-4o-mini",
      "output": [
        {
          "id": "rs_686ba463d18481a29dde85cfd7b055bf02a1cc22f1a1bcf1",
          "type": "reasoning",
          "summary": []
        },
        {
          "id": "msg_686ba463d4e081a2b2e2aff962ab00f702a1cc22f1a1bcf1",
          "type": "message",
          "status": "in_progress",
          "content": [
            {
              "type": "output_text",
              "annotations": [],
              "logprobs": [],
              "text": "Hello! How can I help you today?"
            }
          ],
          "role": "assistant"
        }
      ],
      "parallel_tool_calls": true,
      "previous_response_id": null,
      "reasoning": {
        "effort": "medium",
        "summary": null
      },
      "temperature": 1,
      "text": {
        "format": {
          "type": "text"
        }
      },
      "tool_choice": "auto",
      "tools": [],
      "top_p": 1,
      "truncation": "disabled",
      "usage": {
        "input_tokens": 294,
        "input_tokens_details": {
          "cached_tokens": 0
        },
        "output_tokens": 2520,
        "output_tokens_details": {
          "reasoning_tokens": 0
        },
        "total_tokens": 2814
      },
      "metadata": {},
      "output_text": "Hello! How can I help you today?"
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"gpt-4-0125-preview",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'gpt-4-0125-preview',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-BKKXr9a69c5WOJr8R2d8rP2Wd0XZa",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Hello! How can I assist you today?",
            "refusal": null,
            "annotations": []
          }
        }
      ],
      "created": 1744185235,
      "model": "gpt-4-1106-preview",
      "usage": {
        "prompt_tokens": 168,
        "completion_tokens": 630,
        "total_tokens": 798,
        "prompt_tokens_details": {
          "cached_tokens": 0,
          "audio_tokens": 0
        },
        "completion_tokens_details": {
          "reasoning_tokens": 0,
          "audio_tokens": 0,
          "accepted_prediction_tokens": 0,
          "rejected_prediction_tokens": 0
        }
      },
      "system_fingerprint": null
    }
    import requests
    import json   # for getting a structured output with indentation
    
    response = requests.post(
        "https://api.aimlapi.com/v1/responses",
        headers={
            "Content-Type":"application/json", 
    
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"gpt-4-0125-preview",
            "input":"Hello"  # Insert your question for the model here, instead of Hello   
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      try {
        const response = await fetch('https://api.aimlapi.com/v1/responses', {
          method: 'POST',
          headers: {
            // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
            'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            model: 'gpt-4-0125-preview',
            input: 'Hello',  // Insert your question here, instead of Hello 
          }),
        });
    
        if (!response.ok) {
          throw new Error(`HTTP error! Status ${response.status}`);
        }
    
        const data = await response.json();
        console.log(JSON.stringify(data, null, 2));
    
      } catch (error) {
        console.error('Error', error);
      }
    }
    
    main();
    {
      "id": "resp_686ba45ce63481a2a4b1fad55d2bea8102a1cc22f1a1bcf1",
      "object": "response",
      "created_at": 1751884892,
      "error": null,
      "incomplete_details": null,
      "instructions": null,
      "max_output_tokens": 512,
      "model": "gpt-4-0125-preview",
      "output": [
        {
          "id": "rs_686ba463d18481a29dde85cfd7b055bf02a1cc22f1a1bcf1",
          "type": "reasoning",
          "summary": []
        },
        {
          "id": "msg_686ba463d4e081a2b2e2aff962ab00f702a1cc22f1a1bcf1",
          "type": "message",
          "status": "in_progress",
          "content": [
            {
              "type": "output_text",
              "annotations": [],
              "logprobs": [],
              "text": "Hello! How can I help you today?"
            }
          ],
          "role": "assistant"
        }
      ],
      "parallel_tool_calls": true,
      "previous_response_id": null,
      "reasoning": {
        "effort": "medium",
        "summary": null
      },
      "temperature": 1,
      "text": {
        "format": {
          "type": "text"
        }
      },
      "tool_choice": "auto",
      "tools": [],
      "top_p": 1,
      "truncation": "disabled",
      "usage": {
        "input_tokens": 294,
        "input_tokens_details": {
          "cached_tokens": 0
        },
        "output_tokens": 2520,
        "output_tokens_details": {
          "reasoning_tokens": 0
        },
        "total_tokens": 2814
      },
      "metadata": {},
      "output_text": "Hello! How can I help you today?"
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"gpt-4o",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'gpt-4o',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-BKKZhTdruxKWjdUlq29ooeew185LD",
      "object": "chat.completion",
      "choices": [
        {
          "index": 0,
          "finish_reason": "stop",
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "Hello! 😊 How can I help you today?",
            "refusal": null,
            "annotations": []
          }
        }
      ],
      "created": 1744185349,
      "model": "chatgpt-4o-latest",
      "usage": {
        "prompt_tokens": 84,
        "completion_tokens": 347,
        "total_tokens": 431,
        "prompt_tokens_details": {
          "cached_tokens": 0,
          "audio_tokens": 0
        },
        "completion_tokens_details": {
          "reasoning_tokens": 0,
          "audio_tokens": 0,
          "accepted_prediction_tokens": 0,
          "rejected_prediction_tokens": 0
        }
      },
      "system_fingerprint": "fp_d04424daa8"
    }
    import requests
    import json   # for getting a structured output with indentation
    
    response = requests.post(
        "https://api.aimlapi.com/v1/responses",
        headers={
            "Content-Type":"application/json", 
    
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"gpt-4o",
            "input":"Hello"  # Insert your question for the model here, instead of Hello   
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      try {
        const response = await fetch('https://api.aimlapi.com/v1/responses', {
          method: 'POST',
          headers: {
            // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
            'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            model: 'gpt-4o',
            input: 'Hello',  // Insert your question here, instead of Hello 
          }),
        });
    
        if (!response.ok) {
          throw new Error(`HTTP error! Status ${response.status}`);
        }
    
        const data = await response.json();
        console.log(JSON.stringify(data, null, 2));
    
      } catch (error) {
        console.error('Error', error);
      }
    }
    
    main();
    {
      "id": "resp_686ba45ce63481a2a4b1fad55d2bea8102a1cc22f1a1bcf1",
      "object": "response",
      "created_at": 1751884892,
      "error": null,
      "incomplete_details": null,
      "instructions": null,
      "max_output_tokens": 512,
      "model": "gpt-4o",
      "output": [
        {
          "id": "rs_686ba463d18481a29dde85cfd7b055bf02a1cc22f1a1bcf1",
          "type": "reasoning",
          "summary": []
        },
        {
          "id": "msg_686ba463d4e081a2b2e2aff962ab00f702a1cc22f1a1bcf1",
          "type": "message",
          "status": "in_progress",
          "content": [
            {
              "type": "output_text",
              "annotations": [],
              "logprobs": [],
              "text": "Hello! How can I help you today?"
            }
          ],
          "role": "assistant"
        }
      ],
      "parallel_tool_calls": true,
      "previous_response_id": null,
      "reasoning": {
        "effort": "medium",
        "summary": null
      },
      "temperature": 1,
      "text": {
        "format": {
          "type": "text"
        }
      },
      "tool_choice": "auto",
      "tools": [],
      "top_p": 1,
      "truncation": "disabled",
      "usage": {
        "input_tokens": 294,
        "input_tokens_details": {
          "cached_tokens": 0
        },
        "output_tokens": 2520,
        "output_tokens_details": {
          "reasoning_tokens": 0
        },
        "total_tokens": 2814
      },
      "metadata": {},
      "output_text": "Hello! How can I help you today?"
    }
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"gpt-4-turbo",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'gpt-4-turbo',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {'id': 'chatcmpl-BKKYo5xJ5uEzm8omnidM097vsMpYd', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': 'Hello! How can I assist you today?', 'refusal': None, 'annotations': []}}], 'created': 1744185294, 'model': 'gpt-4-turbo-2024-04-09', 'usage': {'prompt_tokens': 168, 'completion_tokens': 630, 'total_tokens': 798, 'prompt_tokens_details': {'cached_tokens': 0, 'audio_tokens': 0}, 'completion_tokens_details': {'reasoning_tokens': 0, 'audio_tokens': 0, 'accepted_prediction_tokens': 0, 'rejected_prediction_tokens': 0}}, 'system_fingerprint': 'fp_101a39fff3'}
    import requests
    import json   # for getting a structured output with indentation
    
    response = requests.post(
        "https://api.aimlapi.com/v1/responses",
        headers={
            "Content-Type":"application/json", 
    
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"gpt-4-turbo",
            "input":"Hello"  # Insert your question for the model here, instead of Hello   
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      try {
        const response = await fetch('https://api.aimlapi.com/v1/responses', {
          method: 'POST',
          headers: {
            // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
            'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            model: 'gpt-4-turbo',
            input: 'Hello',  // Insert your question here, instead of Hello 
          }),
        });
    
        if (!response.ok) {
          throw new Error(`HTTP error! Status ${response.status}`);
        }
    
        const data = await response.json();
        console.log(JSON.stringify(data, null, 2));
    
      } catch (error) {
        console.error('Error', error);
      }
    }
    
    main();
    {
      "id": "resp_686ba45ce63481a2a4b1fad55d2bea8102a1cc22f1a1bcf1",
      "object": "response",
      "created_at": 1751884892,
      "error": null,
      "incomplete_details": null,
      "instructions": null,
      "max_output_tokens": 512,
      "model": "gpt-4-turbo",
      "output": [
        {
          "id": "rs_686ba463d18481a29dde85cfd7b055bf02a1cc22f1a1bcf1",
          "type": "reasoning",
          "summary": []
        },
        {
          "id": "msg_686ba463d4e081a2b2e2aff962ab00f702a1cc22f1a1bcf1",
          "type": "message",
          "status": "in_progress",
          "content": [
            {
              "type": "output_text",
              "annotations": [],
              "logprobs": [],
              "text": "Hello! How can I help you today?"
            }
          ],
          "role": "assistant"
        }
      ],
      "parallel_tool_calls": true,
      "previous_response_id": null,
      "reasoning": {
        "effort": "medium",
        "summary": null
      },
      "temperature": 1,
      "text": {
        "format": {
          "type": "text"
        }
      },
      "tool_choice": "auto",
      "tools": [],
      "top_p": 1,
      "truncation": "disabled",
      "usage": {
        "input_tokens": 294,
        "input_tokens_details": {
          "cached_tokens": 0
        },
        "output_tokens": 2520,
        "output_tokens_details": {
          "reasoning_tokens": 0
        },
        "total_tokens": 2814
      },
      "metadata": {},
      "output_text": "Hello! How can I help you today?"
    }

    ▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account.
    ▪️ Insert your question or request into the content field—this is what the model will respond to.

    4️⃣ (Optional) Adjust other parameters if needed

    Only model and messages are required parameters for this model (and we've already filled them in for you in the example), but you can include optional parameters to adjust the model's behavior, as in the sketch below. Further down you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.
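
    For instance, here is a minimal sketch of the same request with a few of the optional parameters from the schema filled in (the values are illustrative, not recommendations):

    import requests

    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type": "application/json"
        },
        json={
            "model": "gpt-4o",
            "messages": [{"role": "user", "content": "Hello"}],
            # Optional parameters (illustrative values):
            "max_tokens": 256,    # cap the length of the generated reply
            "temperature": 0.2,   # lower values give more deterministic output
            "top_p": 1            # alter this or temperature, but not both
        }
    )
    print(response.json()["choices"][0]["message"]["content"])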

    5️⃣ Run your modified code

    Run the code in your development environment. Response time depends on several factors, but for simple prompts it rarely exceeds a few seconds.

    If you need a more detailed walkthrough for setting up your development environment and making a request step by step — feel free to use our Quickstart guide.

    POST /v1/chat/completions

    Body
    import requests
    import json  # for getting a structured output with indentation 
    
    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"gpt-4",
            "messages":[
                {
                    "role":"user",
                    "content":"Hello"  # insert your prompt here, instead of Hello
                }
            ]
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'gpt-4',
          messages:[
              {
                  role:'user',
                  content: 'Hello'  // insert your prompt here, instead of Hello
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {'id': 'chatcmpl-BKKWkzVpUFHEDbw7MlOsqBIbm9Vi2', 'object': 'chat.completion', 'choices': [{'index': 0, 'finish_reason': 'stop', 'logprobs': None, 'message': {'role': 'assistant', 'content': 'Hello! How can I assist you today?', 'refusal': None, 'annotations': []}}], 'created': 1744185166, 'model': 'gpt-4-0613', 'usage': {'prompt_tokens': 504, 'completion_tokens': 1260, 'total_tokens': 1764, 'prompt_tokens_details': {'cached_tokens': 0, 'audio_tokens': 0}, 'completion_tokens_details': {'reasoning_tokens': 0, 'audio_tokens': 0, 'accepted_prediction_tokens': 0, 'rejected_prediction_tokens': 0}}, 'system_fingerprint': None}
    import requests
    import json   # for getting a structured output with indentation
    
    response = requests.post(
        "https://api.aimlapi.com/v1/responses",
        headers={
            "Content-Type":"application/json", 
    
            # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
            "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type":"application/json"
        },
        json={
            "model":"gpt-4",
            "input":"Hello"  # Insert your question for the model here, instead of Hello   
        }
    )
    
    data = response.json()
    print(json.dumps(data, indent=2, ensure_ascii=False))
    async function main() {
      try {
        const response = await fetch('https://api.aimlapi.com/v1/responses', {
          method: 'POST',
          headers: {
            // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>
            'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            model: 'gpt-4',
            input: 'Hello',  // Insert your question here, instead of Hello 
          }),
        });
    
        if (!response.ok) {
          throw new Error(`HTTP error! Status ${response.status}`);
        }
    
        const data = await response.json();
        console.log(JSON.stringify(data, null, 2));
    
      } catch (error) {
        console.error('Error', error);
      }
    }
    
    main();
    {
      "id": "resp_686ba45ce63481a2a4b1fad55d2bea8102a1cc22f1a1bcf1",
      "object": "response",
      "created_at": 1751884892,
      "error": null,
      "incomplete_details": null,
      "instructions": null,
      "max_output_tokens": 512,
      "model": "gpt-4",
      "output": [
        {
          "id": "rs_686ba463d18481a29dde85cfd7b055bf02a1cc22f1a1bcf1",
          "type": "reasoning",
          "summary": []
        },
        {
          "id": "msg_686ba463d4e081a2b2e2aff962ab00f702a1cc22f1a1bcf1",
          "type": "message",
          "status": "in_progress",
          "content": [
            {
              "type": "output_text",
              "annotations": [],
              "logprobs": [],
              "text": "Hello! How can I help you today?"
            }
          ],
          "role": "assistant"
        }
      ],
      "parallel_tool_calls": true,
      "previous_response_id": null,
      "reasoning": {
        "effort": "medium",
        "summary": null
      },
      "temperature": 1,
      "text": {
        "format": {
          "type": "text"
        }
      },
      "tool_choice": "auto",
      "tools": [],
      "top_p": 1,
      "truncation": "disabled",
      "usage": {
        "input_tokens": 294,
        "input_tokens_details": {
          "cached_tokens": 0
        },
        "output_tokens": 2520,
        "output_tokens_details": {
          "reasoning_tokens": 0
        },
        "total_tokens": 2814
      },
      "metadata": {},
      "output_text": "Hello! How can I help you today?"
    }
    model (string · enum) · Required. Possible values:

    max_completion_tokens (integer · min: 1) · Optional

    An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

    max_tokens (number · min: 1) · Optional

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

    stream (boolean) · Optional · Default: false

    If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

    tool_choice (any of) · Optional

    Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.

    string · enum · Optional

    none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.

    Possible values:
    or

    parallel_tool_calls (boolean) · Optional

    Whether to enable parallel function calling during tool use.

    response_format (one of) · Optional

    An object specifying the format that the model must output.

    or
    or

    temperature (number · max: 2) · Optional

    What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

    top_p (number · min: 0.01 · max: 1) · Optional

    An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

    n (integer | nullable) · Optional

    How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

    stop (any of) · Optional

    Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

    string · Optional
    or
    string[] · Optional
    or
    any | nullable · Optional

    logprobs (boolean | nullable) · Optional

    Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.

    top_logprobs (number | nullable) · Optional

    An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.

    frequency_penalty (number | nullable) · Optional

    Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

    presence_penalty (number | nullable) · Optional

    Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

    seed (integer · min: 1) · Optional

    This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

    min_p (number · min: 0.001 · max: 0.999) · Optional

    A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

    top_k (number) · Optional

    Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

    repetition_penalty (number | nullable) · Optional

    A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

    Responses: 200 Success

    POST /v1/chat/completions
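
    If you set stream to true, the endpoint returns server-sent events instead of a single JSON object. Below is a minimal sketch of reading such a stream with requests; it assumes the common OpenAI-style SSE framing (lines prefixed with "data: ", a "[DONE]" sentinel, and incremental deltas under choices[0].delta), so verify the exact chunk shape against your actual responses:

    import requests
    import json

    with requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type": "application/json"
        },
        json={
            "model": "gpt-4o",
            "messages": [{"role": "user", "content": "Hello"}],
            "stream": True
        },
        stream=True  # let requests yield the body incrementally
    ) as response:
        for line in response.iter_lines():
            if not line:
                continue
            payload = line.decode("utf-8").removeprefix("data: ")
            if payload == "[DONE]":  # sentinel assumed from the OpenAI-style convention
                break
            chunk = json.loads(payload)
            delta = chunk["choices"][0].get("delta", {})
            print(delta.get("content") or "", end="", flush=True)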
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'Qwen/Qwen3-235B-A22B-fp8-tput',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "Qwen/Qwen3-235B-A22B-fp8-tput",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    POST /v1/chat/completions

    Body
    model (string · enum) · Required. Possible values:

    max_tokens (number · min: 1) · Optional

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

    stream (boolean) · Optional · Default: false

    If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

    Responses: 200 Success

    POST /v1/chat/completions
    POST /v1/chat/completions

    Body
    model (string · enum) · Required. Possible values:

    max_tokens (number · min: 1) · Optional

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

    stream (boolean) · Optional · Default: false

    If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

    temperature (number · max: 2) · Optional

    What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

    top_p (number · min: 0.01 · max: 1) · Optional

    An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

    stop (any of) · Optional

    Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

    string · Optional
    or
    string[] · Optional
    or
    any | nullable · Optional

    frequency_penalty (number | nullable) · Optional

    Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

    presence_penalty (number | nullable) · Optional

    Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

    seed (integer · min: 1) · Optional

    This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

    echo (boolean) · Optional

    If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.

    min_p (number · min: 0.001 · max: 0.999) · Optional

    A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

    top_k (number) · Optional

    Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

    repetition_penalty (number | nullable) · Optional

    A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

    n (integer | nullable) · Optional

    How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

    Responses: 200 Success

    POST /v1/chat/completions
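
    To illustrate the sampling controls above, here is a minimal sketch combining stop sequences and a fixed seed (the model ID is taken from the neighboring example; the parameter values are arbitrary, and determinism is best-effort per the seed note):

    import requests

    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type": "application/json"
        },
        json={
            "model": "deepseek/deepseek-thinking-v3.2-exp",
            "messages": [{"role": "user", "content": "List three colors."}],
            "stop": ["\n\n"],   # generation halts before this sequence (up to 4 allowed)
            "seed": 42,         # best-effort deterministic sampling (Beta)
            "temperature": 0.7
        }
    )
    print(response.json()["choices"][0]["message"]["content"])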
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'deepseek/deepseek-thinking-v3.2-exp',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "deepseek/deepseek-thinking-v3.2-exp",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'alibaba-cloud/qwen3-omni-30b-a3b-captioner',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "alibaba-cloud/qwen3-omni-30b-a3b-captioner",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    POST /v1/chat/completions

    Body
    model (string · enum) · Required. Possible values:

    max_tokens (number · min: 1) · Optional

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

    stream (boolean) · Optional · Default: false

    If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

    tool_choice (any of) · Optional

    Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.

    string · enum · Optional

    none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.

    Possible values:
    or

    parallel_tool_calls (boolean) · Optional

    Whether to enable parallel function calling during tool use.

    echo (boolean) · Optional

    If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.

    temperature (number · max: 2) · Optional

    What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

    top_p (number · min: 0.01 · max: 1) · Optional

    An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

    n (integer | nullable) · Optional

    How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

    stop (any of) · Optional

    Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

    string · Optional
    or
    string[] · Optional
    or
    any | nullable · Optional

    logprobs (boolean | nullable) · Optional

    Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.

    top_logprobs (number | nullable) · Optional

    An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.

    frequency_penalty (number | nullable) · Optional

    Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

    presence_penalty (number | nullable) · Optional

    Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

    seed (integer · min: 1) · Optional

    This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

    min_p (number · min: 0.001 · max: 0.999) · Optional

    A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

    top_k (number) · Optional

    Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

    repetition_penalty (number | nullable) · Optional

    A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

    Responses: 200 Success

    POST /v1/chat/completions
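
    The tool_choice entry above shows the object form for forcing a specific function. Here is a minimal sketch of a request that defines a single function tool and forces the model to call it; the get_weather function and its parameter schema are hypothetical stand-ins:

    import requests

    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type": "application/json"
        },
        json={
            "model": "gpt-4o",
            "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
            "tools": [{
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical function name
                    "description": "Get the current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"]
                    }
                }
            }],
            # Force the model to call get_weather instead of replying with text:
            "tool_choice": {"type": "function", "function": {"name": "get_weather"}}
        }
    )
    print(response.json()["choices"][0]["message"]["tool_calls"])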
    POST /v1/chat/completions

    Body
    model (string · enum) · Required. Possible values:

    max_tokens (number · min: 1) · Optional

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

    stream (boolean) · Optional · Default: false

    If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

    tool_choice (any of) · Optional

    Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.

    string · enum · Optional

    none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.

    Possible values:
    or

    parallel_tool_calls (boolean) · Optional

    Whether to enable parallel function calling during tool use.

    response_format (one of) · Optional

    An object specifying the format that the model must output.

    or
    or

    echo (boolean) · Optional

    If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.

    temperature (number · max: 2) · Optional

    What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

    top_p (number · min: 0.01 · max: 1) · Optional

    An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

    n (integer | nullable) · Optional

    How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

    stop (any of) · Optional

    Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

    string · Optional
    or
    string[] · Optional
    or
    any | nullable · Optional

    logprobs (boolean | nullable) · Optional

    Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.

    top_logprobs (number | nullable) · Optional

    An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.

    frequency_penalty (number | nullable) · Optional

    Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

    presence_penalty (number | nullable) · Optional

    Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

    seed (integer · min: 1) · Optional

    This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

    min_p (number · min: 0.001 · max: 0.999) · Optional

    A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

    top_k (number) · Optional

    Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

    repetition_penalty (number | nullable) · Optional

    A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

    Responses: 200 Success

    POST /v1/chat/completions
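
    The schema above describes response_format only as "an object specifying the format that the model must output". A common OpenAI-compatible value is {"type": "json_object"}; the sketch below assumes that variant is accepted here, so check the schema's accepted forms for your model:

    import requests
    import json

    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type": "application/json"
        },
        json={
            "model": "gpt-4o",
            "messages": [{
                "role": "user",
                "content": "Return a JSON object with keys 'city' and 'country' for Paris."
            }],
            # Assumed OpenAI-style JSON mode:
            "response_format": {"type": "json_object"}
        }
    )
    content = response.json()["choices"][0]["message"]["content"]
    print(json.loads(content))  # should parse cleanly when JSON mode is honored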
    POST /v1/chat/completions

    Body
    model (string · enum) · Required. Possible values:

    max_tokens (number · min: 1) · Optional

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

    stream (boolean) · Optional · Default: false

    If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

    tool_choice (any of) · Optional

    Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.

    string · enum · Optional

    none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.

    Possible values:
    or

    parallel_tool_calls (boolean) · Optional

    Whether to enable parallel function calling during tool use.

    temperature (number · max: 2) · Optional

    What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

    top_p (number · min: 0.01 · max: 1) · Optional

    An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

    stop (any of) · Optional

    Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

    string · Optional
    or
    string[] · Optional
    or
    any | nullable · Optional

    frequency_penalty (number | nullable) · Optional

    Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

    presence_penalty (number | nullable) · Optional

    Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

    seed (integer · min: 1) · Optional

    This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

    response_format (one of) · Optional

    An object specifying the format that the model must output.

    or
    or

    Responses: 200 Success

    POST /v1/chat/completions
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'alibaba/qwen3-coder-480b-a35b-instruct',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "alibaba/qwen3-coder-480b-a35b-instruct",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'Qwen/Qwen2.5-Coder-32B-Instruct',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "text",
      "object": "text",
      "created": 1,
      "choices": [
        {
          "index": 1,
          "message": {
            "role": "text",
            "content": "text",
            "refusal": null,
            "annotations": [
              {
                "type": "text",
                "url_citation": {
                  "end_index": 1,
                  "start_index": 1,
                  "title": "text",
                  "url": "text"
                }
              }
            ],
            "audio": {
              "id": "text",
              "data": "text",
              "transcript": "text",
              "expires_at": 1
            },
            "tool_calls": [
              {
                "id": "text",
                "type": "text",
                "function": {
                  "arguments": "text",
                  "name": "text"
                }
              }
            ]
          },
          "finish_reason": "stop",
          "logprobs": {
            "content": [
              {
                "bytes": [
                  1
                ],
                "logprob": 1,
                "token": "text",
                "top_logprobs": [
                  {
                    "bytes": [
                      1
                    ],
                    "logprob": 1,
                    "token": "text"
                  }
                ]
              }
            ],
            "refusal": []
          }
        }
      ],
      "model": "text",
      "usage": {
        "prompt_tokens": 1,
        "completion_tokens": 1,
        "total_tokens": 1,
        "completion_tokens_details": {
          "accepted_prediction_tokens": 1,
          "audio_tokens": 1,
          "reasoning_tokens": 1,
          "rejected_prediction_tokens": 1
        },
        "prompt_tokens_details": {
          "audio_tokens": 1,
          "cached_tokens": 1
        }
      }
    }
    POST /v1/chat/completions

    Body
    model (string · enum) · Required. Possible values:

    max_tokens (number · min: 1) · Optional

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

    stream (boolean) · Optional · Default: false

    If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

    tool_choice (any of) · Optional

    Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.

    string · enum · Optional

    none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.

    Possible values:
    or

    parallel_tool_calls (boolean) · Optional

    Whether to enable parallel function calling during tool use.

    temperature (number · max: 2) · Optional

    What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

    top_p (number · min: 0.01 · max: 1) · Optional

    An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

    n (integer | nullable) · Optional

    How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

    stop (any of) · Optional

    Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

    string · Optional
    or
    string[] · Optional
    or
    any | nullable · Optional

    logprobs (boolean | nullable) · Optional

    Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.

    top_logprobs (number | nullable) · Optional

    An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.

    frequency_penalty (number | nullable) · Optional

    Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

    presence_penalty (number | nullable) · Optional

    Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

    seed (integer · min: 1) · Optional

    This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

    min_p (number · min: 0.001 · max: 0.999) · Optional

    A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

    top_k (number) · Optional

    Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

    repetition_penalty (number | nullable) · Optional

    A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

    Responses: 200 Success

    POST /v1/chat/completions
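
    The placeholder response earlier on this page shows where log probabilities land (choices[0].logprobs.content, one entry per token). A minimal sketch requesting them, using the model ID from the nearby example:

    import requests

    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type": "application/json"
        },
        json={
            "model": "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
            "messages": [{"role": "user", "content": "Hello"}],
            "logprobs": True,   # must be True for top_logprobs, per the schema above
            "top_logprobs": 3,  # up to 20 alternatives per token position
            "max_tokens": 16
        }
    )
    for token_info in response.json()["choices"][0]["logprobs"]["content"]:
        print(token_info["token"], token_info["logprob"])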
    POST /v1/chat/completions

    Body
    model (string · enum) · Required. Possible values:

    max_completion_tokens (integer · min: 1) · Optional

    An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

    max_tokens (number · min: 1) · Optional

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

    stream (boolean) · Optional · Default: false

    If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

    n (integer | nullable) · Optional

    How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

    temperature (number · max: 2) · Optional

    What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

    top_p (number · min: 0.01 · max: 1) · Optional

    An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

    stop (any of) · Optional

    Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

    string · Optional
    or
    string[] · Optional
    or
    any | nullable · Optional

    frequency_penalty (number | nullable) · Optional

    Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

    presence_penalty (number | nullable) · Optional

    Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

    seed (integer · min: 1) · Optional

    This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

    response_format (one of) · Optional

    An object specifying the format that the model must output.

    or
    or

    tool_choice (any of) · Optional

    Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.

    string · enum · Optional

    none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.

    Possible values:
    or

    parallel_tool_calls (boolean) · Optional

    Whether to enable parallel function calling during tool use.

    Responses: 200 Success

    POST /v1/chat/completions
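
    Since this schema also accepts n, here is a minimal sketch requesting two alternative completions in one call (the model ID is a placeholder; remember that you are billed for the tokens of every choice):

    import requests

    response = requests.post(
        "https://api.aimlapi.com/v1/chat/completions",
        headers={
            "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
            "Content-Type": "application/json"
        },
        json={
            "model": "gpt-4o",
            "messages": [{"role": "user", "content": "Suggest a name for a coffee shop."}],
            "n": 2,           # two alternative completions
            "max_tokens": 32
        }
    )
    for choice in response.json()["choices"]:
        print(choice["index"], choice["message"]["content"])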
    POST /v1/chat/completions

    Body
    model (string · enum) · Required. Possible values:

    max_completion_tokens (integer · min: 1) · Optional

    An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

    max_tokens (number · min: 1) · Optional

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

    stream (boolean) · Optional · Default: false

    If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

    tool_choice (any of) · Optional

    Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.

    string · enum · Optional

    none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.

    Possible values:
    or

    parallel_tool_calls (boolean) · Optional

    Whether to enable parallel function calling during tool use.

    temperature (number · max: 2) · Optional

    What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

    top_p (number · min: 0.01 · max: 1) · Optional

    An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

    stop (any of) · Optional

    Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

    string · Optional
    or
    string[] · Optional
    or
    any | nullable · Optional

    frequency_penalty (number | nullable) · Optional

    Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

    presence_penalty (number | nullable) · Optional

    Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

    seed (integer · min: 1) · Optional

    This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

    response_format (one of) · Optional

    An object specifying the format that the model must output.

    or
    or

    repetition_penalty (number | nullable) · Optional

    A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

    logprobs (boolean | nullable) · Optional

    Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.

    top_logprobs (number | nullable) · Optional

    An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.

    Responses: 200 Success

    POST /v1/chat/completions
model · string · enum · Required. Possible values:

max_tokens · number · min: 1 · Optional

The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream · boolean · Optional · Default: false

If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

temperature · number · max: 2 · Optional

What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p · number · min: 0.01 · max: 1 · Optional

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

stop · string, string[], or any (nullable) · Optional

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

frequency_penalty · number | nullable · Optional

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty · number | nullable · Optional

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed · integer · min: 1 · Optional

This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

echo · boolean · Optional

If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.

min_p · number · min: 0.001 · max: 0.999 · Optional

A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

top_k · number · Optional

Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

repetition_penalty · number | nullable · Optional

A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

n · integer | nullable · Optional

How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

Responses: 200 Success · POST /v1/chat/completions
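To make stop and max_tokens concrete, here is a minimal sketch: generation halts at the first matching stop sequence (up to 4 may be passed) and never exceeds the token cap. The model id is illustrative.

    async function stopSequencesExample() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'meta-llama/Llama-3.2-3B-Instruct-Turbo', // illustrative
          messages: [{ role: 'user', content: 'List three fruits, one per line.' }],
          max_tokens: 64,        // hard cap on generated tokens
          stop: ['\n\n', 'END'], // up to 4 sequences; a single string is also accepted
        }),
      });
      const data = await response.json();
      // The returned text will not contain the stop sequence itself.
      console.log(data.choices[0].message.content);
    }

    stopSequencesExample();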
model · string · enum · Required. Possible values:

max_tokens · number · min: 1 · Optional

The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream · boolean · Optional · Default: false

If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

tool_choice · string · enum, or object · Optional

Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.

Possible values:

parallel_tool_calls · boolean · Optional

Whether to enable parallel function calling during tool use.

temperature · number · max: 1 · Optional

What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p · number · min: 0.01 · max: 1 · Optional

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

presence_penalty · number | nullable · Optional

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed · integer · min: 1 · Optional

This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

response_format · one of · Optional

An object specifying the format that the model must output.

Responses: 200 Success · POST /v1/chat/completions
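A sketch of tool calling against this schema: the tools array itself is not listed above, so its OpenAI-style function shape is an assumption, and get_weather is a hypothetical function name.

    async function toolChoiceExample() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'anthropic/claude-sonnet-4.5', // illustrative
          messages: [{ role: 'user', content: 'What is the weather in Paris?' }],
          // Hypothetical tool; the tools schema is assumed to be OpenAI-style.
          tools: [{
            type: 'function',
            function: {
              name: 'get_weather',
              description: 'Get current weather for a city',
              parameters: {
                type: 'object',
                properties: { city: { type: 'string' } },
                required: ['city'],
              },
            },
          }],
          tool_choice: 'auto',       // or 'none' / 'required' / a specific function object
          parallel_tool_calls: true, // allow several calls in one turn
        }),
      });
      const data = await response.json();
      console.log(data.choices[0].message.tool_calls);
    }

    toolChoiceExample();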
model · string · enum · Required. Possible values:

max_tokens · number · min: 1 · Optional

The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream · boolean · Optional · Default: false

If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

temperature · number · max: 2 · Optional

What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p · number · min: 0.01 · max: 1 · Optional

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

stop · string, string[], or any (nullable) · Optional

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

frequency_penalty · number | nullable · Optional

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty · number | nullable · Optional

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed · integer · min: 1 · Optional

This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

echo · boolean · Optional

If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.

min_p · number · min: 0.001 · max: 0.999 · Optional

A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

top_k · number · Optional

Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

repetition_penalty · number | nullable · Optional

A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

n · integer | nullable · Optional

How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

Responses: 200 Success · POST /v1/chat/completions
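The alternative sampling knobs above (min_p, top_k, repetition_penalty) can be combined as in this sketch; the values are arbitrary starting points, not recommendations from the schema, and the model id is illustrative.

    async function advancedSamplingExample() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'mistralai/mistral-nemo', // illustrative
          messages: [{ role: 'user', content: 'Write a two-line poem about rain.' }],
          top_k: 40,               // only sample from the 40 most likely tokens
          min_p: 0.05,             // alternative floor to top_p / top_k
          repetition_penalty: 1.1, // values above 1 discourage repeated sequences
        }),
      });
      const data = await response.json();
      console.log(data.choices[0].message.content);
    }

    advancedSamplingExample();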
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'alibaba/qwen3-max-instruct',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "alibaba/qwen3-max-instruct",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'deepseek/deepseek-reasoner-v3.1-terminus',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "deepseek/deepseek-reasoner-v3.1-terminus",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'minimax/m1',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "minimax/m1",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'deepseek/deepseek-reasoner-v3.1',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "deepseek/deepseek-reasoner-v3.1",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'meta-llama/Llama-3.2-3B-Instruct-Turbo',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "meta-llama/Llama-3.2-3B-Instruct-Turbo",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'google/gemini-2.0-flash',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "google/gemini-2.0-flash",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
POST /v1/chat/completions

Body

model · string · enum · Required. Possible values:

frequency_penalty · number | nullable · Optional

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

logprobs · boolean | nullable · Optional

Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.

top_logprobs · number | nullable · Optional

An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.

max_tokens · number · min: 1 · Optional

The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

max_completion_tokens · integer · min: 1 · Optional

An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

n · integer | nullable · Optional

How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

presence_penalty · number | nullable · Optional

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed · integer · min: 1 · Optional

This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

stream · boolean · Optional · Default: false

If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

top_p · number · min: 0.1 · max: 1 · Optional

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

temperature · number · max: 2 · Optional

What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

stop · string, string[], or any (nullable) · Optional

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

tool_choice · string · enum, or object · Optional

Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.

Possible values:

parallel_tool_calls · boolean · Optional

Whether to enable parallel function calling during tool use.

reasoning_effort · string · enum · Optional

Constrains effort on reasoning for reasoning models. Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.

Possible values:

response_format · one of · Optional

An object specifying the format that the model must output.

Responses: 200 Success · POST /v1/chat/completions
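For reasoning models, reasoning_effort and max_completion_tokens interact: the cap counts reasoning tokens as well as visible output. A sketch follows; whether this particular model honors reasoning_effort is an assumption.

    async function reasoningEffortExample() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'deepseek/deepseek-reasoner-v3.1', // illustrative reasoning model
          messages: [{ role: 'user', content: 'Is 97 prime? Answer briefly.' }],
          reasoning_effort: 'low',     // low | medium | high
          max_completion_tokens: 2048, // bounds visible output + reasoning tokens
        }),
      });
      const data = await response.json();
      console.log(data.choices[0].message.content);
    }

    reasoningEffortExample();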
POST /v1/chat/completions

Body

model · string · enum · Required. Possible values:

max_completion_tokens · integer · min: 1 · Optional

An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens · number · min: 1 · Optional

The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream · boolean · Optional · Default: false

If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

temperature · number · max: 2 · Optional

What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p · number · min: 0.01 · max: 1 · Optional

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

seed · integer · min: 1 · Optional

This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

min_p · number · min: 0.001 · max: 0.999 · Optional

A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

top_k · number · Optional

Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

repetition_penalty · number | nullable · Optional

A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

top_a · number · max: 1 · Optional

Alternate top sampling parameter.

frequency_penalty · number | nullable · Optional

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty · number | nullable · Optional

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

stop · string, string[], or any (nullable) · Optional

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

Responses: 200 Success · POST /v1/chat/completions
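top_a gets only a one-line description above; this sketch simply shows where it sits in the request body. The value 0.2 is an arbitrary illustration, and the model id is illustrative.

    async function topAExample() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'minimax/m1', // illustrative
          messages: [{ role: 'user', content: 'Suggest a project name.' }],
          top_a: 0.2,       // alternate top sampling parameter (max: 1)
          temperature: 0.9, // paired with a higher temperature for variety
        }),
      });
      const data = await response.json();
      console.log(data.choices[0].message.content);
    }

    topAExample();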
POST /v1/chat/completions

Body

model · string · enum · Required. Possible values:

max_tokens · number · min: 1 · Optional

The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream · boolean · Optional · Default: false

If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

echo · boolean · Optional

If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.

temperature · number · max: 2 · Optional

What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p · number · min: 0.01 · max: 1 · Optional

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

n · integer | nullable · Optional

How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

stop · string, string[], or any (nullable) · Optional

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

logprobs · boolean | nullable · Optional

Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.

top_logprobs · number | nullable · Optional

An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.

frequency_penalty · number | nullable · Optional

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty · number | nullable · Optional

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed · integer · min: 1 · Optional

This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

min_p · number · min: 0.001 · max: 0.999 · Optional

A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

top_k · number · Optional

Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

repetition_penalty · number | nullable · Optional

A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

Responses: 200 Success · POST /v1/chat/completions
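To read token-level probabilities from this endpoint, set logprobs and top_logprobs together, as in the sketch below; the exact shape of choices[0].logprobs in the response is assumed to follow the OpenAI convention.

    async function logprobsExample() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'google/gemma-3-4b-it', // illustrative
          messages: [{ role: 'user', content: 'Say yes or no: is water wet?' }],
          logprobs: true,  // required for top_logprobs to take effect
          top_logprobs: 5, // 0..20 alternatives per output token
          max_tokens: 8,
        }),
      });
      const data = await response.json();
      // Response shape assumed OpenAI-style: per-token logprob entries.
      console.log(JSON.stringify(data.choices[0].logprobs, null, 2));
    }

    logprobsExample();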
POST /v1/chat/completions

Body

model · string · enum · Required. Possible values:

frequency_penalty · number | nullable · Optional

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

logprobs · boolean | nullable · Optional

Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.

top_logprobs · number | nullable · Optional

An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.

max_tokens · number · min: 1 · Optional

The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

max_completion_tokens · integer · min: 1 · Optional

An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

n · integer | nullable · Optional

How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

presence_penalty · number | nullable · Optional

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed · integer · min: 1 · Optional

This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

stream · boolean · Optional · Default: false

If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

top_p · number · min: 0.1 · max: 1 · Optional

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

temperature · number · max: 2 · Optional

What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

stop · string, string[], or any (nullable) · Optional

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

tool_choice · string · enum, or object · Optional

Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.

Possible values:

parallel_tool_calls · boolean · Optional

Whether to enable parallel function calling during tool use.

reasoning_effort · string · enum · Optional

Constrains effort on reasoning for reasoning models. Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.

Possible values:

response_format · one of · Optional

An object specifying the format that the model must output.

Responses: 200 Success · POST /v1/chat/completions
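Since stream defaults to false throughout this reference, here is a sketch of consuming the server-sent events when it is enabled. It assumes OpenAI-style data: lines terminated by [DONE], and Node 18+, where the fetch response body is an async-iterable stream.

    async function streamingExample() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'google/gemini-2.0-flash', // illustrative
          messages: [{ role: 'user', content: 'Tell me a short story.' }],
          stream: true, // deliver tokens as server-sent events
        }),
      });

      const decoder = new TextDecoder();
      let buffer = '';
      for await (const chunk of response.body) {         // Node 18+: async-iterable body
        buffer += decoder.decode(chunk, { stream: true });
        const lines = buffer.split('\n');
        buffer = lines.pop();                            // keep any partial line
        for (const line of lines) {
          if (!line.startsWith('data: ')) continue;
          const payload = line.slice(6).trim();
          if (payload === '[DONE]') return;              // assumed terminator
          const delta = JSON.parse(payload).choices?.[0]?.delta?.content;
          if (delta) process.stdout.write(delta);
        }
      }
    }

    streamingExample();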
POST /v1/chat/completions

Body

model · string · enum · Required. Possible values:

max_completion_tokens · integer · min: 1 · Optional

An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens · number · min: 1 · Optional

The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream · boolean · Optional · Default: false

If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

tool_choice · string · enum, or object · Optional

Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.

Possible values:

parallel_tool_calls · boolean · Optional

Whether to enable parallel function calling during tool use.

temperature · number · max: 2 · Optional

What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p · number · min: 0.01 · max: 1 · Optional

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

stop · string, string[], or any (nullable) · Optional

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

frequency_penalty · number | nullable · Optional

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty · number | nullable · Optional

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed · integer · min: 1 · Optional

This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

response_format · one of · Optional

An object specifying the format that the model must output.

repetition_penalty · number | nullable · Optional

A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

Responses: 200 Success · POST /v1/chat/completions
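The schema above documents the object form of tool_choice for forcing a single tool. A sketch follows; lookup_order is a hypothetical function, and the tools schema is assumed to be OpenAI-style.

    async function forcedToolExample() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'anthropic/claude-haiku-4.5', // illustrative
          messages: [{ role: 'user', content: 'Where is order #123?' }],
          tools: [{
            type: 'function',
            function: {
              name: 'lookup_order', // hypothetical
              description: 'Look up an order by id',
              parameters: {
                type: 'object',
                properties: { order_id: { type: 'string' } },
                required: ['order_id'],
              },
            },
          }],
          // Forces the model to call this tool rather than reply in text.
          tool_choice: { type: 'function', function: { name: 'lookup_order' } },
        }),
      });
      const data = await response.json();
      console.log(data.choices[0].message.tool_calls);
    }

    forcedToolExample();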
POST /v1/chat/completions

Body

model · string · enum · Required. Possible values:

max_completion_tokens · integer · min: 1 · Optional

An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens · number · min: 1 · Optional

The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream · boolean · Optional · Default: false

If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

temperature · number · max: 2 · Optional

What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p · number · min: 0.01 · max: 1 · Optional

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

seed · integer · min: 1 · Optional

This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

frequency_penalty · number | nullable · Optional

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty · number | nullable · Optional

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

stop · string, string[], or any (nullable) · Optional

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

response_format · one of · Optional

An object specifying the format that the model must output.

min_p · number · min: 0.001 · max: 0.999 · Optional

A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

top_k · number · Optional

Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

repetition_penalty · number | nullable · Optional

A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

top_a · number · max: 1 · Optional

Alternate top sampling parameter.

Responses: 200 Success · POST /v1/chat/completions
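Because seed is best-effort (Beta), a quick way to check reproducibility is to issue the same request twice and compare, as in this sketch; the model id is illustrative.

    async function ask(seed) {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'deepseek/deepseek-reasoner-v3.1-terminus', // illustrative
          messages: [{ role: 'user', content: 'Name one prime number.' }],
          seed,             // identical seed + parameters => best-effort identical output
          temperature: 0.7,
        }),
      });
      const data = await response.json();
      return data.choices[0].message.content;
    }

    async function compareSeeds() {
      const a = await ask(7);
      const b = await ask(7);
      console.log(a === b ? 'Outputs matched.' : 'Outputs differed (seed is best-effort).');
    }

    compareSeeds();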
POST /v1/chat/completions

Body

model · string · enum · Required. Possible values:

max_tokens · number · min: 1 · Optional

The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream · boolean · Optional · Default: false

If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

temperature · number · max: 2 · Optional

What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p · number · min: 0.01 · max: 1 · Optional

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

stop · string, string[], or any (nullable) · Optional

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

frequency_penalty · number | nullable · Optional

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty · number | nullable · Optional

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed · integer · min: 1 · Optional

This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

echo · boolean · Optional

If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.

min_p · number · min: 0.001 · max: 0.999 · Optional

A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

top_k · number · Optional

Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

repetition_penalty · number | nullable · Optional

A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

n · integer | nullable · Optional

How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

Responses: 200 Success · POST /v1/chat/completions
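A sketch of requesting several candidate completions with n (note the warning above: you are billed for the generated tokens of every choice); the model id is illustrative.

    async function multipleChoicesExample() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'meta-llama/Meta-Llama-3-8B-Instruct-Lite', // illustrative
          messages: [{ role: 'user', content: 'Give a tagline for a coffee shop.' }],
          n: 3,             // three candidates; billed for all generated tokens
          temperature: 1.0, // some randomness so the candidates differ
        }),
      });
      const data = await response.json();
      data.choices.forEach((choice, i) => console.log(`#${i}: ${choice.message.content}`));
    }

    multipleChoicesExample();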
POST /v1/chat/completions

Body

model · string · enum · Required. Possible values:

max_completion_tokens · integer · min: 1 · Optional

An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens · number · min: 1 · Optional

The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream · boolean · Optional · Default: false

If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

n · integer | nullable · Optional

How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

temperature · number · max: 2 · Optional

What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p · number · min: 0.01 · max: 1 · Optional

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

stop · string, string[], or any (nullable) · Optional

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

frequency_penalty · number | nullable · Optional

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty · number | nullable · Optional

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed · integer · min: 1 · Optional

This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

response_format · one of · Optional

An object specifying the format that the model must output.

Responses: 200 Success · POST /v1/chat/completions
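frequency_penalty and presence_penalty pull in different directions, as described above: one dampens verbatim repetition, the other nudges the model toward new topics. A sketch with arbitrary values:

    async function penaltiesExample() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'alibaba/qwen3-max-instruct', // illustrative
          messages: [{ role: 'user', content: 'Brainstorm blog topics about tea.' }],
          frequency_penalty: 0.5, // -2.0..2.0: discourage repeating frequent tokens
          presence_penalty: 0.6,  // encourage moving on to new topics
        }),
      });
      const data = await response.json();
      console.log(data.choices[0].message.content);
    }

    penaltiesExample();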
model · string · enum · Required. Possible values:

max_completion_tokens · integer · min: 1 · Optional

An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens · number · min: 1 · Optional

The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream · boolean · Optional · Default: false

If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

temperature · number · max: 2 · Optional

What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p · number · min: 0.01 · max: 1 · Optional

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

seed · integer · min: 1 · Optional

This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

min_p · number · min: 0.001 · max: 0.999 · Optional

A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

top_k · number · Optional

Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

repetition_penalty · number | nullable · Optional

A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

top_a · number · max: 1 · Optional

Alternate top sampling parameter.

frequency_penalty · number | nullable · Optional

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty · number | nullable · Optional

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

tool_choice · string · enum, or object · Optional

Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.

Possible values:

parallel_tool_calls · boolean · Optional

Whether to enable parallel function calling during tool use.

stop · string, string[], or any (nullable) · Optional

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

response_format · one of · Optional

An object specifying the format that the model must output.

Responses: 200 Success · POST /v1/chat/completions
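The bare fetch examples elsewhere on this page assume a 200 response; in practice it helps to check response.ok before parsing. A small wrapper sketch (the error body format is not specified in this reference, so it is surfaced as plain text):

    async function chatCompletion(body) {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify(body),
      });
      if (!response.ok) {
        // Error payload shape is unspecified in this reference; surface it raw.
        throw new Error(`HTTP ${response.status}: ${await response.text()}`);
      }
      return response.json();
    }

    chatCompletion({
      model: 'google/gemini-2.0-flash-exp', // illustrative
      messages: [{ role: 'user', content: 'Hello' }],
    }).then((data) => console.log(data.choices[0].message.content))
      .catch((err) => console.error(err.message));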
model · string · enum · Required. Possible values:

frequency_penalty · number | nullable · Optional

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

logprobs · boolean | nullable · Optional

Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.

top_logprobs · number | nullable · Optional

An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.

max_tokens · number · min: 1 · Optional

The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

max_completion_tokens · integer · min: 1 · Optional

An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

n · integer | nullable · Optional

How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

presence_penalty · number | nullable · Optional

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed · integer · min: 1 · Optional

This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

stream · boolean · Optional · Default: false

If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

top_p · number · min: 0.1 · max: 1 · Optional

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

temperature · number · max: 2 · Optional

What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

stop · string, string[], or any (nullable) · Optional

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

tool_choice · string · enum, or object · Optional

Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.

Possible values:

parallel_tool_calls · boolean · Optional

Whether to enable parallel function calling during tool use.

reasoning_effort · string · enum · Optional

Constrains effort on reasoning for reasoning models. Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.

Possible values:

response_format · one of · Optional

An object specifying the format that the model must output.

Responses: 200 Success · POST /v1/chat/completions
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'google/gemini-2.0-flash-exp',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "google/gemini-2.0-flash-exp",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'mistralai/mistral-nemo',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "mistralai/mistral-nemo",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'anthropic/claude-haiku-4.5',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "anthropic/claude-haiku-4.5",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'anthropic/claude-3-opus',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "anthropic/claude-3-opus",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'google/gemma-3-4b-it',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "google/gemma-3-4b-it",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'meta-llama/Meta-Llama-3-8B-Instruct-Lite',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "meta-llama/Meta-Llama-3-8B-Instruct-Lite",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'anthropic/claude-sonnet-4.5',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "anthropic/claude-sonnet-4.5",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'alibaba-cloud/qwen3-next-80b-a3b-thinking',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "alibaba-cloud/qwen3-next-80b-a3b-thinking",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'cohere/command-a',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "cohere/command-a",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'deepseek/deepseek-r1',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "deepseek/deepseek-r1",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
POST /v1/chat/completions

Body

model (string · enum, required)

    Possible values:

max_completion_tokens (integer · min: 1, optional)

    An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens (number · min: 1, optional)

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream (boolean, optional)

    If set to true, the model response data will be streamed to the client as it is generated using server-sent events.

    Default: false

temperature (number · max: 2, optional)

    What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p (number · min: 0.01 · max: 1, optional)

    An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

seed (integer · min: 1, optional)

    This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

min_p (number · min: 0.001 · max: 0.999, optional)

    A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

top_k (number, optional)

    Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

repetition_penalty (number | nullable, optional)

    A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

top_a (number · max: 1, optional)

    Alternate top sampling parameter.

frequency_penalty (number | nullable, optional)

    Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty (number | nullable, optional)

    Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

response_format (one of, optional)

    An object specifying the format that the model must output.

Responses

    200: Success
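As a concrete illustration, the sampling controls above can be passed alongside the standard fields. A minimal sketch in the same fetch style used throughout this page; the parameter values are illustrative, not recommendations:

    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'mistralai/mistral-nemo',
          messages: [{ role: 'user', content: 'Hello' }],
          max_tokens: 256,  // cap on generated tokens, useful for cost control
          temperature: 0.2, // tune this or top_p, not both (see note above)
          seed: 42,         // Beta: best-effort deterministic sampling
        }),
      });

      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }

    main();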
POST /v1/chat/completions

Body

model (string · enum, required)

    Possible values:

max_completion_tokens (integer · min: 1, optional)

    An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens (number · min: 1, optional)

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream (boolean, optional)

    If set to true, the model response data will be streamed to the client as it is generated using server-sent events.

    Default: false

temperature (number · max: 2, optional)

    What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p (number · min: 0.01 · max: 1, optional)

    An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

seed (integer · min: 1, optional)

    This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

frequency_penalty (number | nullable, optional)

    Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty (number | nullable, optional)

    Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

stop (any of, optional)

    Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

    One of: string, string[], or any | nullable.

response_format (one of, optional)

    An object specifying the format that the model must output.

min_p (number · min: 0.001 · max: 0.999, optional)

    A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

top_k (number, optional)

    Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

repetition_penalty (number | nullable, optional)

    A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

top_a (number · max: 1, optional)

    Alternate top sampling parameter.

Responses

    200: Success
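To see stop in practice: it accepts a single string or an array of up to 4 strings. A minimal sketch; the stop value is purely illustrative:

    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'deepseek/deepseek-r1',
          messages: [{ role: 'user', content: 'Count from 1 to 10, one number per line.' }],
          stop: ['5'], // generation halts when "5" would be emitted; it is not returned
        }),
      });

      const data = await response.json();
      console.log(data.choices[0].message.content);
    }

    main();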
POST /v1/chat/completions

Body

model (string · enum, required)

    Possible values:

max_tokens (number · min: 1, optional)

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream (boolean, optional)

    If set to true, the model response data will be streamed to the client as it is generated using server-sent events.

    Default: false

tool_choice (any of, optional)

    Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.

    One of: string · enum (possible values: none, auto, required), or an object selecting a specific tool.

parallel_tool_calls (boolean, optional)

    Whether to enable parallel function calling during tool use.

temperature (number · max: 1, optional)

    What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p (number · min: 0.01 · max: 1, optional)

    An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

presence_penalty (number | nullable, optional)

    Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed (integer · min: 1, optional)

    This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

response_format (one of, optional)

    An object specifying the format that the model must output.

mask_sensitive_info (boolean, optional)

    Mask (replace with ***) content in the output that involves private information, including but not limited to email, domain, link, ID number, home address, etc. Defaults to false, i.e. masking is disabled.

    Default: false

Responses

    200: Success
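For instance, masking can be enabled explicitly. A minimal sketch, assuming MiniMax-Text-01 (listed elsewhere on this page) is one of the models that accepts this body:

    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'MiniMax-Text-01',
          messages: [{ role: 'user', content: 'Draft a reply to jane@example.com about her order.' }],
          mask_sensitive_info: true, // emails, links, IDs, addresses in the output become ***
        }),
      });

      const data = await response.json();
      console.log(data.choices[0].message.content);
    }

    main();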
POST /v1/chat/completions

Body

model (string · enum, required)

    Possible values:

max_completion_tokens (integer · min: 1, optional)

    An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens (number · min: 1, optional)

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream (boolean, optional)

    If set to true, the model response data will be streamed to the client as it is generated using server-sent events.

    Default: false

tool_choice (any of, optional)

    Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.

    One of: string · enum (possible values: none, auto, required), or an object selecting a specific tool.

parallel_tool_calls (boolean, optional)

    Whether to enable parallel function calling during tool use.

n (integer | nullable, optional)

    How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

frequency_penalty (number | nullable, optional)

    Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty (number | nullable, optional)

    Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed (integer · min: 1, optional)

    This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

temperature (number · max: 2, optional)

    What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p (number · min: 0.01 · max: 1, optional)

    An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

Responses

    200: Success
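The tool_choice values are easiest to see next to a tools array. The sketch below assumes the OpenAI-compatible tools schema (not expanded in this reference) and uses a hypothetical get_weather function:

    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'anthropic/claude-sonnet-4.5',
          messages: [{ role: 'user', content: 'What is the weather in Paris?' }],
          tools: [
            {
              type: 'function',
              function: {
                name: 'get_weather', // hypothetical function, for illustration only
                description: 'Get the current weather for a city',
                parameters: {
                  type: 'object',
                  properties: { city: { type: 'string' } },
                  required: ['city'],
                },
              },
            },
          ],
          tool_choice: 'auto', // or 'none' / 'required',
          // or force this tool: { type: 'function', function: { name: 'get_weather' } }
          parallel_tool_calls: false, // disable parallel function calling
        }),
      });

      const data = await response.json();
      console.log(JSON.stringify(data.choices[0].message.tool_calls, null, 2));
    }

    main();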
POST /v1/chat/completions

Body

model (string · enum, required)

    Possible values:

max_tokens (number · min: 1, optional)

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream (boolean, optional)

    If set to true, the model response data will be streamed to the client as it is generated using server-sent events.

    Default: false

tool_choice (any of, optional)

    Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.

    One of: string · enum (possible values: none, auto, required), or an object selecting a specific tool.

parallel_tool_calls (boolean, optional)

    Whether to enable parallel function calling during tool use.

temperature (number · max: 2, optional)

    What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p (number · min: 0.01 · max: 1, optional)

    An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

stop (any of, optional)

    Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

    One of: string, string[], or any | nullable.

frequency_penalty (number | nullable, optional)

    Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty (number | nullable, optional)

    Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed (integer · min: 1, optional)

    This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

response_format (one of, optional)

    An object specifying the format that the model must output.

logprobs (boolean | nullable, optional)

    Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message.

top_logprobs (number | nullable, optional)

    An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.

Responses

    200: Success
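logprobs and top_logprobs work together. A quick sketch, assuming the selected model supports log probabilities; values are illustrative:

    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'meta-llama/Meta-Llama-3-8B-Instruct-Lite',
          messages: [{ role: 'user', content: 'Hello' }],
          logprobs: true,  // required for top_logprobs to take effect
          top_logprobs: 5, // up to 20 alternatives per token position
        }),
      });

      const data = await response.json();
      console.log(JSON.stringify(data.choices[0].logprobs, null, 2));
    }

    main();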
POST /v1/chat/completions

Body

model (string · enum, required)

    Possible values:

max_completion_tokens (integer · min: 1, optional)

    An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens (number · min: 1, optional)

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream (boolean, optional)

    If set to true, the model response data will be streamed to the client as it is generated using server-sent events.

    Default: false

temperature (number · max: 2, optional)

    What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p (number · min: 0.01 · max: 1, optional)

    An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

seed (integer · min: 1, optional)

    This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

min_p (number · min: 0.001 · max: 0.999, optional)

    A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

top_k (number, optional)

    Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

repetition_penalty (number | nullable, optional)

    A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

top_a (number · max: 1, optional)

    Alternate top sampling parameter.

tool_choice (any of, optional)

    Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.

    One of: string · enum (possible values: none, auto, required), or an object selecting a specific tool.

parallel_tool_calls (boolean, optional)

    Whether to enable parallel function calling during tool use.

reasoning_effort (string · enum, optional)

    Constrains effort on reasoning for reasoning models. Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.

    Possible values: low, medium, high

Responses

    200: Success
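reasoning_effort only applies to reasoning models. A minimal sketch using the thinking model shown earlier on this page; the token limit is illustrative:

    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'alibaba-cloud/qwen3-next-80b-a3b-thinking',
          messages: [{ role: 'user', content: 'Hello' }],
          reasoning_effort: 'low',     // low | medium | high; lower = faster, fewer reasoning tokens
          max_completion_tokens: 1024, // this bound covers reasoning tokens plus visible output
        }),
      });

      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }

    main();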
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'qwen-max',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "qwen-max",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'google/gemini-2.5-flash-lite-preview',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "google/gemini-2.5-flash-lite-preview",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'deepseek/deepseek-prover-v2',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "deepseek/deepseek-prover-v2",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'anthracite-org/magnum-v4-72b',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "anthracite-org/magnum-v4-72b",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'MiniMax-Text-01',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "MiniMax-Text-01",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'gpt-4o-audio-preview',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
          modalities: ['text', 'audio'],
          audio: { voice: 'alloy', format: 'pcm16' },
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "gpt-4o-audio-preview",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
POST /v1/chat/completions

Body

model (string · enum, required)

    Possible values:

max_completion_tokens (integer · min: 1, optional)

    An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens (number · min: 1, optional)

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream (boolean, optional)

    If set to true, the model response data will be streamed to the client as it is generated using server-sent events.

    Default: false

temperature (number · max: 2, optional)

    What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p (number · min: 0.01 · max: 1, optional)

    An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

stop (any of, optional)

    Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

    One of: string, string[], or any | nullable.

frequency_penalty (number | nullable, optional)

    Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty (number | nullable, optional)

    Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed (integer · min: 1, optional)

    This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

echo (boolean, optional)

    If true, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.

min_p (number · min: 0.001 · max: 0.999, optional)

    A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

top_k (number, optional)

    Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

repetition_penalty (number | nullable, optional)

    A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

top_a (number · max: 1, optional)

    Alternate top sampling parameter.

response_format (one of, optional)

    An object specifying the format that the model must output.

Responses

    200: Success
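Setting stream: true switches the response to server-sent events. A minimal sketch that just prints the raw SSE chunks; a real client would parse the data: lines and stop at [DONE]. It assumes a runtime such as Node 18+ where response.body is a web ReadableStream:

    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'google/gemini-2.5-flash-lite-preview',
          messages: [{ role: 'user', content: 'Hello' }],
          stream: true, // response arrives incrementally as server-sent events
        }),
      });

      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        console.log(decoder.decode(value)); // raw "data: {...}" SSE lines
      }
    }

    main();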
POST /v1/chat/completions

Body

model (string · enum, required)

    Possible values:

max_completion_tokens (integer · min: 1, optional)

    An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens (number · min: 1, optional)

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream (boolean, optional)

    If set to true, the model response data will be streamed to the client as it is generated using server-sent events.

    Default: false

tool_choice (any of, optional)

    Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.

    One of: string · enum (possible values: none, auto, required), or an object selecting a specific tool.

parallel_tool_calls (boolean, optional)

    Whether to enable parallel function calling during tool use.

temperature (number · max: 2, optional)

    What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p (number · min: 0.01 · max: 1, optional)

    An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

stop (any of, optional)

    Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

    One of: string, string[], or any | nullable.

frequency_penalty (number | nullable, optional)

    Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty (number | nullable, optional)

    Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed (integer · min: 1, optional)

    This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

response_format (one of, optional)

    An object specifying the format that the model must output.

repetition_penalty (number | nullable, optional)

    A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

Responses

    200: Success
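The response_format variants are not expanded in this reference. The sketch below assumes the common OpenAI-compatible { type: 'json_object' } shape, which may not apply to every model listed here:

    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'anthropic/claude-haiku-4.5',
          messages: [{ role: 'user', content: 'List three colors as a JSON object with a "colors" array.' }],
          response_format: { type: 'json_object' }, // assumed OpenAI-compatible variant
        }),
      });

      const data = await response.json();
      console.log(JSON.parse(data.choices[0].message.content));
    }

    main();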
POST /v1/chat/completions

Body

model (string · enum, required)

    Possible values:

max_tokens (number · min: 1, optional)

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream (boolean, optional)

    If set to true, the model response data will be streamed to the client as it is generated using server-sent events.

    Default: false

tool_choice (any of, optional)

    Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.

    One of: string · enum (possible values: none, auto, required), or an object selecting a specific tool.

parallel_tool_calls (boolean, optional)

    Whether to enable parallel function calling during tool use.

temperature (number · max: 2, optional)

    What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p (number · min: 0.01 · max: 1, optional)

    An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

stop (any of, optional)

    Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

    One of: string, string[], or any | nullable.

frequency_penalty (number | nullable, optional)

    Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty (number | nullable, optional)

    Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed (integer · min: 1, optional)

    This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

response_format (one of, optional)

    An object specifying the format that the model must output.

n (integer | nullable, optional)

    How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

logprobs (boolean | nullable, optional)

    Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message.

top_logprobs (number | nullable, optional)

    An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.

Responses

    200: Success
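n returns several alternative completions in the choices array; note the cost warning above. A short sketch with illustrative values:

    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'google/gemma-3-4b-it',
          messages: [{ role: 'user', content: 'Suggest a name for a coffee shop.' }],
          n: 3,             // three completions; you pay for generated tokens across all of them
          temperature: 0.9, // some randomness so the choices actually differ
        }),
      });

      const data = await response.json();
      for (const choice of data.choices) {
        console.log(choice.index, choice.message.content);
      }
    }

    main();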
POST /v1/chat/completions

Body

model (string · enum, required)

    Possible values:

max_tokens (number · min: 1, optional)

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream (boolean, optional)

    If set to true, the model response data will be streamed to the client as it is generated using server-sent events.

    Default: false

temperature (number · max: 2, optional)

    What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p (number · min: 0.01 · max: 1, optional)

    An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

stop (any of, optional)

    Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

    One of: string, string[], or any | nullable.

frequency_penalty (number | nullable, optional)

    Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

seed (integer · min: 1, optional)

    This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

presence_penalty (number | nullable, optional)

    Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

echo (boolean, optional)

    If true, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.

min_p (number · min: 0.001 · max: 0.999, optional)

    A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

top_k (number, optional)

    Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

top_a (number · max: 1, optional)

    Alternate top sampling parameter.

repetition_penalty (number | nullable, optional)

    A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

Responses

    200: Success
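The alternative sampling controls can be used in place of temperature/top_p. A sketch with purely illustrative values:

    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'anthracite-org/magnum-v4-72b',
          messages: [{ role: 'user', content: 'Hello' }],
          top_k: 40,               // sample only from the 40 most likely tokens
          repetition_penalty: 1.1, // values above 1 reduce repeated sequences
          // min_p: 0.05,          // could be used instead of top_p / top_k
        }),
      });

      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }

    main();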
POST /v1/chat/completions

Body

model (string · enum, required)

    Possible values:

max_completion_tokens (integer · min: 1, optional)

    An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens (number · min: 1, optional)

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream (boolean, optional)

    If set to true, the model response data will be streamed to the client as it is generated using server-sent events.

    Default: false

temperature (number · max: 2, optional)

    What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p (number · min: 0.01 · max: 1, optional)

    An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

stop (any of, optional)

    Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

    One of: string, string[], or any | nullable.

frequency_penalty (number | nullable, optional)

    Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty (number | nullable, optional)

    Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed (integer · min: 1, optional)

    This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

echo (boolean, optional)

    If true, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.

min_p (number · min: 0.001 · max: 0.999, optional)

    A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

top_k (number, optional)

    Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

repetition_penalty (number | nullable, optional)

    A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

top_a (number · max: 1, optional)

    Alternate top sampling parameter.

response_format (one of, optional)

    An object specifying the format that the model must output.

Responses

    200: Success
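echo is mainly useful when debugging prompts. A minimal sketch; whether a given model honors this flag is not specified in this reference:

    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'deepseek/deepseek-prover-v2',
          messages: [{ role: 'user', content: 'Hello' }],
          echo: true, // ask the API to include the prompt in the response
        }),
      });

      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }

    main();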
POST /v1/chat/completions

Body

model (string · enum, required)

    Possible values:

max_completion_tokens (integer · min: 1, optional)

    An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens (number · min: 1, optional)

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream (boolean, optional)

    If set to true, the model response data will be streamed to the client as it is generated using server-sent events.

    Default: false

temperature (number · max: 2, optional)

    What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p (number · min: 0.01 · max: 1, optional)

    An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

seed (integer · min: 1, optional)

    This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

min_p (number · min: 0.001 · max: 0.999, optional)

    A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

top_k (number, optional)

    Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

repetition_penalty (number | nullable, optional)

    A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

top_a (number · max: 1, optional)

    Alternate top sampling parameter.

stop (any of, optional)

    Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

    One of: string, string[], or any | nullable.

Responses

    200: Success
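seed is best-effort (Beta): repeating an identical request should usually reproduce the output. A sketch that sends the same request twice and compares the results, using a small helper ask() introduced here for illustration:

    async function ask() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'qwen-max',
          messages: [{ role: 'user', content: 'Suggest a team name.' }],
          seed: 1234, // same seed + same parameters => best-effort same result
        }),
      });
      const data = await response.json();
      return data.choices[0].message.content;
    }

    async function main() {
      const first = await ask();
      const second = await ask();
      console.log(first === second ? 'Outputs match' : 'Outputs differ (seed is best-effort)');
    }

    main();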
model · string · enum · Required · Possible values:

max_completion_tokens · integer · min: 1 · Optional
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens · number · min: 1 · Optional
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream · boolean · Optional · Default: false
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

temperature · number · max: 2 · Optional
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p · number · min: 0.01 · max: 1 · Optional
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

seed · integer · min: 1 · Optional
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

min_p · number · min: 0.001 · max: 0.999 · Optional
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

top_k · number · Optional
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

repetition_penalty · number | nullable · Optional
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

top_a · number · max: 1 · Optional
Alternate top sampling parameter.

frequency_penalty · number | nullable · Optional
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty · number | nullable · Optional
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

tool_choice · any of · Optional
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
string · enum · Optional · Possible values:
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
or

parallel_tool_calls · boolean · Optional
Whether to enable parallel function calling during tool use.

stop · any of · Optional
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
string · Optional
or
string[] · Optional
or
any | nullable · Optional

response_format · one of · Optional
An object specifying the format that the model must output.
or
or

Responses: 200 Success
POST /v1/chat/completions
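Because stream: true delivers the response as server-sent events, the client must read the body incrementally instead of calling response.json(). A minimal Node.js sketch, assuming the OpenAI-style `data: {...}` SSE framing with delta chunks ('<MODEL_ID>' is a placeholder; partial-line buffering across chunks is omitted for brevity):

async function streamChat() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: '<MODEL_ID>',
      messages: [{ role: 'user', content: 'Hello' }],
      stream: true,
    }),
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // Each chunk may carry one or more `data: ...` SSE lines.
    for (const line of decoder.decode(value, { stream: true }).split('\n')) {
      if (!line.startsWith('data: ') || line === 'data: [DONE]') continue;
      // Assumed OpenAI-compatible chunk shape: choices[0].delta.content.
      const delta = JSON.parse(line.slice(6)).choices?.[0]?.delta?.content;
      if (delta) process.stdout.write(delta);
    }
  }
}

streamChat();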
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'deepseek/deepseek-non-thinking-v3.2-exp',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "deepseek/deepseek-non-thinking-v3.2-exp",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'google/gemma-3n-e4b-it',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "google/gemma-3n-e4b-it",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'mistralai/mistral-tiny',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "mistralai/mistral-tiny",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'deepseek/deepseek-chat-v3.1',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "deepseek/deepseek-chat-v3.1",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'alibaba/qwen3-235b-a22b-thinking-2507',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "alibaba/qwen3-235b-a22b-thinking-2507",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'qwen-plus',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "qwen-plus",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'nvidia/nemotron-nano-12b-v2-vl',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "nvidia/nemotron-nano-12b-v2-vl",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
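The examples above assume a successful call. In practice it is worth checking the HTTP status before parsing, since fetch does not throw on 4xx/5xx responses. A defensive variant of the same request pattern ('<MODEL_ID>' is a placeholder; the usage fields match the sample responses above):

async function safeChat() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: '<MODEL_ID>',
      messages: [{ role: 'user', content: 'Hello' }],
    }),
  });

  if (!response.ok) {
    // Surface the error payload instead of failing later on an unexpected shape.
    throw new Error(`HTTP ${response.status}: ${await response.text()}`);
  }

  const data = await response.json();
  console.log(data.choices[0].message.content);
  console.log('Tokens billed:', data.usage.total_tokens);
}

safeChat();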
POST /v1/chat/completions
Body

model · string · enum · Required · Possible values:

frequency_penalty · number | nullable · Optional
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

logprobs · boolean | nullable · Optional
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.

top_logprobs · number | nullable · Optional
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.

max_tokens · number · min: 1 · Optional
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

max_completion_tokens · integer · min: 1 · Optional
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

n · integer | nullable · Optional
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

presence_penalty · number | nullable · Optional
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed · integer · min: 1 · Optional
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

stream · boolean · Optional · Default: false
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

top_p · number · min: 0.1 · max: 1 · Optional
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

temperature · number · max: 2 · Optional
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

stop · any of · Optional
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
string · Optional
or
string[] · Optional
or
any | nullable · Optional

tool_choice · any of · Optional
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
string · enum · Optional · Possible values:
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
or

parallel_tool_calls · boolean · Optional
Whether to enable parallel function calling during tool use.

reasoning_effort · string · enum · Optional · Possible values:
Constrains effort on reasoning for reasoning models. Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.

response_format · one of · Optional
An object specifying the format that the model must output.
or
or

Responses: 200 Success
POST /v1/chat/completions
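For models that expose reasoning_effort, lowering it trades reasoning depth for latency and token spend. A minimal sketch, assuming the selected model supports the parameter ('<REASONING_MODEL_ID>' is a placeholder):

async function lowEffortChat() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: '<REASONING_MODEL_ID>', // placeholder: a reasoning-capable model
      messages: [{ role: 'user', content: 'Is 9.11 greater than 9.9?' }],
      reasoning_effort: 'low',       // one of: low, medium, high
      max_completion_tokens: 512,    // bounds visible output plus reasoning tokens
    }),
  });
  console.log((await response.json()).choices[0].message.content);
}

lowEffortChat();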
POST /v1/chat/completions
Body

model · string · enum · Required · Possible values:

max_tokens · number · min: 1 · Optional
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream · boolean · Optional · Default: false
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

tool_choice · any of · Optional
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
string · enum · Optional · Possible values:
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
or

parallel_tool_calls · boolean · Optional
Whether to enable parallel function calling during tool use.

echo · boolean · Optional
If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.

temperature · number · max: 2 · Optional
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p · number · min: 0.01 · max: 1 · Optional
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

n · integer | nullable · Optional
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

stop · any of · Optional
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
string · Optional
or
string[] · Optional
or
any | nullable · Optional

logprobs · boolean | nullable · Optional
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.

top_logprobs · number | nullable · Optional
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.

frequency_penalty · number | nullable · Optional
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty · number | nullable · Optional
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed · integer · min: 1 · Optional
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

min_p · number · min: 0.001 · max: 0.999 · Optional
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

top_k · number · Optional
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

repetition_penalty · number | nullable · Optional
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

Responses: 200 Success
POST /v1/chat/completions
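Setting n above 1 returns several independent completions in one response; you are billed for the generated tokens of all of them. A minimal sketch ('<MODEL_ID>' is a placeholder):

async function multipleChoices() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: '<MODEL_ID>',
      messages: [{ role: 'user', content: 'Suggest a name for a coffee shop.' }],
      n: 3,           // three alternatives; all generated tokens are billed
      temperature: 1, // some randomness so the alternatives differ
    }),
  });
  const data = await response.json();
  // Each alternative arrives as its own entry in `choices`, indexed 0..n-1.
  for (const choice of data.choices) {
    console.log(`#${choice.index}: ${choice.message.content}`);
  }
}

multipleChoices();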
POST /v1/chat/completions
Body

model · string · enum · Required · Possible values:

max_completion_tokens · integer · min: 1 · Optional
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens · number · min: 1 · Optional
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream · boolean · Optional · Default: false
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

tool_choice · any of · Optional
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
string · enum · Optional · Possible values:
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
or

parallel_tool_calls · boolean · Optional
Whether to enable parallel function calling during tool use.

temperature · number · max: 2 · Optional
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p · number · min: 0.01 · max: 1 · Optional
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

stop · any of · Optional
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
string · Optional
or
string[] · Optional
or
any | nullable · Optional

frequency_penalty · number | nullable · Optional
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty · number | nullable · Optional
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed · integer · min: 1 · Optional
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

response_format · one of · Optional
An object specifying the format that the model must output.
or
or

repetition_penalty · number | nullable · Optional
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

Responses: 200 Success
POST /v1/chat/completions
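The concrete response_format variants are not expanded above; the sketch below assumes the OpenAI-compatible {"type": "json_object"} form. If the variant differs for your model, adjust accordingly:

async function jsonModeChat() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: '<MODEL_ID>',
      // Mention JSON in the prompt as well; format enforcement alone may not suffice.
      messages: [{ role: 'user', content: 'List two fruits as JSON with a "fruits" array.' }],
      response_format: { type: 'json_object' }, // assumed OpenAI-compatible variant
    }),
  });
  const data = await response.json();
  console.log(JSON.parse(data.choices[0].message.content));
}

jsonModeChat();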
POST /v1/chat/completions
Body

model · string · enum · Required · Possible values:

max_completion_tokens · integer · min: 1 · Optional
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens · number · min: 1 · Optional
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream · boolean · Optional · Default: false
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

temperature · number · max: 2 · Optional
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p · number · min: 0.01 · max: 1 · Optional
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

seed · integer · min: 1 · Optional
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

min_p · number · min: 0.001 · max: 0.999 · Optional
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

top_k · number · Optional
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

repetition_penalty · number | nullable · Optional
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

top_a · number · max: 1 · Optional
Alternate top sampling parameter.

frequency_penalty · number | nullable · Optional
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty · number | nullable · Optional
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

tool_choice · any of · Optional
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
string · enum · Optional · Possible values:
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
or

parallel_tool_calls · boolean · Optional
Whether to enable parallel function calling during tool use.

stop · any of · Optional
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
string · Optional
or
string[] · Optional
or
any | nullable · Optional

logprobs · boolean | nullable · Optional
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.

top_logprobs · number | nullable · Optional
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.

response_format · one of · Optional
An object specifying the format that the model must output.
or
or

Responses: 200 Success
POST /v1/chat/completions
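tool_choice only steers tool selection; the tools themselves are declared in an accompanying tools array, which is not expanded in the schema above and is assumed here to follow the OpenAI function-calling shape. get_weather is a hypothetical function implemented by your application:

async function toolCall() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: '<MODEL_ID>',
      messages: [{ role: 'user', content: "What's the weather in Paris?" }],
      tools: [{
        type: 'function',
        function: {
          name: 'get_weather', // hypothetical tool name
          description: 'Get the current weather for a city',
          parameters: {
            type: 'object',
            properties: { city: { type: 'string' } },
            required: ['city'],
          },
        },
      }],
      tool_choice: 'auto',        // let the model decide whether to call it
      parallel_tool_calls: false, // at most one call per turn
    }),
  });
  const message = (await response.json()).choices[0].message;
  // When the model opts to call a tool, the arguments arrive as a JSON string.
  console.log(message.tool_calls ?? message.content);
}

toolCall();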
POST /v1/chat/completions
Body

model · string · enum · Required · Possible values:

max_completion_tokens · integer · min: 1 · Optional
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens · number · min: 1 · Optional
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream · boolean · Optional · Default: false
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

temperature · number · max: 2 · Optional
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p · number · min: 0.01 · max: 1 · Optional
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

seed · integer · min: 1 · Optional
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

tool_choice · any of · Optional
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
string · enum · Optional · Possible values:
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
or

parallel_tool_calls · boolean · Optional
Whether to enable parallel function calling during tool use.

frequency_penalty · number | nullable · Optional
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty · number | nullable · Optional
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

stop · any of · Optional
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
string · Optional
or
string[] · Optional
or
any | nullable · Optional

response_format · one of · Optional
An object specifying the format that the model must output.
or
or

Responses: 200 Success
POST /v1/chat/completions
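A reproducibility sketch for seed: the same prompt, parameters, and seed should, on a best-effort basis per the Beta note above, yield the same completion across requests ('<MODEL_ID>' is a placeholder):

async function seededChat() {
  const headers = {
    'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
    'Content-Type': 'application/json',
  };
  const body = JSON.stringify({
    model: '<MODEL_ID>',
    messages: [{ role: 'user', content: 'Pick a random animal.' }],
    seed: 42,
    temperature: 1,
  });
  // Two identical requests; with a fixed seed the outputs should match.
  for (let i = 0; i < 2; i++) {
    const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
      method: 'POST', headers, body,
    });
    console.log((await response.json()).choices[0].message.content);
  }
}

seededChat();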
model · string · enum · Required · Possible values:

frequency_penalty · number | nullable · Optional
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

logprobs · boolean | nullable · Optional
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.

top_logprobs · number | nullable · Optional
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.

max_tokens · number · min: 1 · Optional
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

max_completion_tokens · integer · min: 1 · Optional
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

n · integer | nullable · Optional
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

presence_penalty · number | nullable · Optional
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed · integer · min: 1 · Optional
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

stream · boolean · Optional · Default: false
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

top_p · number · min: 0.1 · max: 1 · Optional
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

temperature · number · max: 2 · Optional
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

stop · any of · Optional
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
string · Optional
or
string[] · Optional
or
any | nullable · Optional

tool_choice · any of · Optional
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
string · enum · Optional · Possible values:
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
or

parallel_tool_calls · boolean · Optional
Whether to enable parallel function calling during tool use.

reasoning_effort · string · enum · Optional · Possible values:
Constrains effort on reasoning for reasoning models. Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.

response_format · one of · Optional
An object specifying the format that the model must output.
or
or

Responses: 200 Success
POST /v1/chat/completions
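To inspect token-level probabilities, enable logprobs and optionally request alternatives with top_logprobs. A minimal sketch; the per-token shape (choices[0].logprobs.content) is assumed OpenAI-compatible:

async function logprobChat() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: '<MODEL_ID>',
      messages: [{ role: 'user', content: 'Hello' }],
      logprobs: true,  // required for top_logprobs to take effect
      top_logprobs: 3, // 0-20 alternatives per token position
      max_tokens: 16,
    }),
  });
  const choice = (await response.json()).choices[0];
  // Assumed shape: one entry per generated token under logprobs.content.
  for (const token of choice.logprobs?.content ?? []) {
    console.log(token.token, token.logprob);
  }
}

logprobChat();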
model · string · enum · Required · Possible values:

max_tokens · number · min: 1 · Optional
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream · boolean · Optional · Default: false
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

tool_choice · any of · Optional
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
string · enum · Optional · Possible values:
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
or

parallel_tool_calls · boolean · Optional
Whether to enable parallel function calling during tool use.

echo · boolean · Optional
If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.

temperature · number · max: 2 · Optional
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p · number · min: 0.01 · max: 1 · Optional
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

n · integer | nullable · Optional
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

stop · any of · Optional
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
string · Optional
or
string[] · Optional
or
any | nullable · Optional

logprobs · boolean | nullable · Optional
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.

top_logprobs · number | nullable · Optional
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.

frequency_penalty · number | nullable · Optional
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty · number | nullable · Optional
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed · integer · min: 1 · Optional
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

min_p · number · min: 0.001 · max: 0.999 · Optional
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

top_k · number · Optional
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

repetition_penalty · number | nullable · Optional
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

Responses: 200 Success
POST /v1/chat/completions
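A sketch of echo, which this model group documents: the prompt is returned together with the completion, and combining it with logprobs can additionally expose prompt logprobs. Support varies by model, so treat this as an assumption to verify:

async function echoChat() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: '<MODEL_ID>',
      messages: [{ role: 'user', content: 'Hello' }],
      echo: true,     // response includes the prompt text
      logprobs: true, // with echo, prompt logprobs may also be returned
    }),
  });
  console.log(JSON.stringify(await response.json(), null, 2));
}

echoChat();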
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'meta-llama/llama-3.3-70b-versatile',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "meta-llama/llama-3.3-70b-versatile",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'anthropic/claude-3.7-sonnet',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "anthropic/claude-3.7-sonnet",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'meta-llama/Llama-3.3-70B-Instruct-Turbo',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'anthropic/claude-3-5-haiku',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "anthropic/claude-3-5-haiku",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'alibaba/qwen3-vl-32b-instruct',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "alibaba/qwen3-vl-32b-instruct",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'nvidia/llama-3.1-nemotron-70b-instruct',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "nvidia/llama-3.1-nemotron-70b-instruct",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
model · string · enum · Required · Possible values:

max_completion_tokens · integer · min: 1 · Optional
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens · number · min: 1 · Optional
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream · boolean · Optional · Default: false
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

temperature · number · max: 2 · Optional
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p · number · min: 0.01 · max: 1 · Optional
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

seed · integer · min: 1 · Optional
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

min_p · number · min: 0.001 · max: 0.999 · Optional
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

top_k · number · Optional
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

repetition_penalty · number | nullable · Optional
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

top_a · number · max: 1 · Optional
Alternate top sampling parameter.

frequency_penalty · number | nullable · Optional
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty · number | nullable · Optional
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

tool_choice · any of · Optional
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.
string · enum · Optional · Possible values:
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
or

parallel_tool_calls · boolean · Optional
Whether to enable parallel function calling during tool use.

stop · any of · Optional
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
string · Optional
or
string[] · Optional
or
any | nullable · Optional

logprobs · boolean | nullable · Optional
Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.

top_logprobs · number | nullable · Optional
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.

response_format · one of · Optional
An object specifying the format that the model must output.
or
or

Responses: 200 Success
POST /v1/chat/completions
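A minimal sketch of stop sequences: generation halts before the first matching sequence (up to 4 allowed), and the sequence itself is not returned ('<MODEL_ID>' is a placeholder):

async function stopAtDelimiter() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: '<MODEL_ID>',
      messages: [{ role: 'user', content: 'Count from 1 to 10, one number per line.' }],
      stop: ['6', 'STOP'], // output ends before "6" is emitted
    }),
  });
  console.log((await response.json()).choices[0].message.content);
}

stopAtDelimiter();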
    post
    Body
model (string · enum, Required)

Possible values:

max_tokens (number · min: 1, Optional)

The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream (boolean, Optional; default: false)

If set to true, the model response data will be streamed to the client as it is generated, using server-sent events.

tool_choice (string · enum or object, Optional)

Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present; auto is the default if tools are present. Possible string values: none, auto, required.

parallel_tool_calls (boolean, Optional)

Whether to enable parallel function calling during tool use.

echo (boolean, Optional)

If true, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.

temperature (number · max: 2, Optional)

What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p, but not both.

top_p (number · min: 0.01 · max: 1, Optional)

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature, but not both.

n (integer | nullable, Optional)

How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

stop (string, string[], or any | nullable; Optional)

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

logprobs (boolean | nullable, Optional)

Whether to return log probabilities of the output tokens. If true, returns the log probabilities of each output token in the content of message.

top_logprobs (number | nullable, Optional)

An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.

frequency_penalty (number | nullable, Optional)

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty (number | nullable, Optional)

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed (integer · min: 1, Optional)

This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

min_p (number · min: 0.001 · max: 0.999, Optional)

A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

top_k (number, Optional)

Only sample from the top K options for each subsequent token. Used to remove "long tail" low-probability responses. Recommended for advanced use cases only; you usually only need to use temperature.

repetition_penalty (number | nullable, Optional)

A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
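The stop parameter above accepts either a single string or an array of up to 4 strings. A minimal sketch of the array form, with a placeholder model ID; generation halts at whichever sequence appears first, and the matched sequence itself is not returned.

async function main() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: '<MODEL_ID>', // placeholder
      messages: [{ role: 'user', content: 'List the four seasons.' }],
      stop: ['\n\n', 'END'], // a single string such as 'END' also works
    }),
  });

  const data = await response.json();
  console.log(data.choices[0].message.content); // stop sequence not included
}

main();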

Responses
200: Success

POST /v1/chat/completions

Body
model (string · enum, Required)

Possible values:

max_tokens (number · min: 1, Optional)

The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream (boolean, Optional; default: false)

If set to true, the model response data will be streamed to the client as it is generated, using server-sent events.

echo (boolean, Optional)

If true, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.

temperature (number · max: 2, Optional)

What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p, but not both.

top_p (number · min: 0.01 · max: 1, Optional)

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature, but not both.

n (integer | nullable, Optional)

How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

stop (string, string[], or any | nullable; Optional)

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

logprobs (boolean | nullable, Optional)

Whether to return log probabilities of the output tokens. If true, returns the log probabilities of each output token in the content of message.

top_logprobs (number | nullable, Optional)

An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.

frequency_penalty (number | nullable, Optional)

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty (number | nullable, Optional)

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed (integer · min: 1, Optional)

This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

min_p (number · min: 0.001 · max: 0.999, Optional)

A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

top_k (number, Optional)

Only sample from the top K options for each subsequent token. Used to remove "long tail" low-probability responses. Recommended for advanced use cases only; you usually only need to use temperature.

repetition_penalty (number | nullable, Optional)

A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
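Since the reference recommends adjusting temperature or top_p but not both, a typical low-randomness request sets only one of them. A minimal sketch, with a placeholder model ID:

async function main() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: '<MODEL_ID>', // placeholder
      messages: [{ role: 'user', content: 'Summarize nucleus sampling in one sentence.' }],
      temperature: 0.2, // focused, fairly deterministic output
      // top_p deliberately omitted: tune one sampling knob at a time
    }),
  });

  const data = await response.json();
  console.log(data.choices[0].message.content);
}

main();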

Responses
200: Success

POST /v1/chat/completions

Body
model (string · enum, Required)

Possible values:

max_tokens (number · min: 1, Optional)

The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream (boolean, Optional; default: false)

If set to true, the model response data will be streamed to the client as it is generated, using server-sent events.

tool_choice (string · enum or object, Optional)

Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present; auto is the default if tools are present. Possible string values: none, auto, required.

parallel_tool_calls (boolean, Optional)

Whether to enable parallel function calling during tool use.

temperature (number · max: 2, Optional)

What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p, but not both.

top_p (number · min: 0.01 · max: 1, Optional)

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature, but not both.

stop (string, string[], or any | nullable; Optional)

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

frequency_penalty (number | nullable, Optional)

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty (number | nullable, Optional)

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed (integer · min: 1, Optional)

This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

response_format (one of, Optional)

An object specifying the format that the model must output.

logprobs (boolean | nullable, Optional)

Whether to return log probabilities of the output tokens. If true, returns the log probabilities of each output token in the content of message.

top_logprobs (number | nullable, Optional)

An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.
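The object form of tool_choice above can force a specific function call. The sketch below is illustrative only: get_weather is a hypothetical function, and the tools schema follows the request sample shown elsewhere on this page.

async function main() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: '<MODEL_ID>', // placeholder
      messages: [{ role: 'user', content: 'Weather in Paris?' }],
      tools: [
        {
          type: 'function',
          function: {
            name: 'get_weather', // hypothetical function for illustration
            description: 'Get the current weather for a city',
            parameters: {
              type: 'object',
              properties: { city: { type: 'string' } },
              required: ['city'],
            },
          },
        },
      ],
      // Force the model to call get_weather rather than answer directly.
      tool_choice: { type: 'function', function: { name: 'get_weather' } },
    }),
  });

  const data = await response.json();
  console.log(data.choices[0].message.tool_calls);
}

main();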

Responses
200: Success

POST /v1/chat/completions

Body
model (string · enum, Required)

Possible values:

frequency_penalty (number | nullable, Optional)

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

logprobs (boolean | nullable, Optional)

Whether to return log probabilities of the output tokens. If true, returns the log probabilities of each output token in the content of message.

top_logprobs (number | nullable, Optional)

An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.

max_tokens (number · min: 1, Optional)

The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

max_completion_tokens (integer · min: 1, Optional)

An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

n (integer | nullable, Optional)

How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

presence_penalty (number | nullable, Optional)

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed (integer · min: 1, Optional)

This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

stream (boolean, Optional; default: false)

If set to true, the model response data will be streamed to the client as it is generated, using server-sent events.

top_p (number · min: 0.1 · max: 1, Optional)

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature, but not both.

temperature (number · max: 2, Optional)

What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p, but not both.

stop (string, string[], or any | nullable; Optional)

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

tool_choice (string · enum or object, Optional)

Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present; auto is the default if tools are present. Possible string values: none, auto, required.

parallel_tool_calls (boolean, Optional)

Whether to enable parallel function calling during tool use.

reasoning_effort (string · enum, Optional)

Constrains effort on reasoning for reasoning models. Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response. Possible values: low, medium, high.

response_format (one of, Optional)

An object specifying the format that the model must output.
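For the reasoning_effort parameter above, a minimal sketch; the model ID is a placeholder for a reasoning-capable model from this reference.

async function main() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: '<REASONING_MODEL_ID>', // placeholder
      messages: [{ role: 'user', content: 'Is 9.11 greater than 9.9?' }],
      reasoning_effort: 'low', // faster, fewer reasoning tokens; 'medium' and 'high' spend more
      max_completion_tokens: 512, // bounds visible output plus reasoning tokens
    }),
  });

  const data = await response.json();
  console.log(data.choices[0].message.content);
}

main();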
Responses
200: Success

POST /v1/chat/completions

Body
model (string · enum, Required)

Possible values:

max_completion_tokens (integer · min: 1, Optional)

An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens (number · min: 1, Optional)

The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream (boolean, Optional; default: false)

If set to true, the model response data will be streamed to the client as it is generated, using server-sent events.

n (integer | nullable, Optional)

How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

temperature (number · max: 2, Optional)

What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p, but not both.

top_p (number · min: 0.01 · max: 1, Optional)

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature, but not both.

stop (string, string[], or any | nullable; Optional)

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

frequency_penalty (number | nullable, Optional)

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty (number | nullable, Optional)

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed (integer · min: 1, Optional)

This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

response_format (one of, Optional)

An object specifying the format that the model must output.

tool_choice (string · enum or object, Optional)

Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present; auto is the default if tools are present. Possible string values: none, auto, required.

parallel_tool_calls (boolean, Optional)

Whether to enable parallel function calling during tool use.
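The accepted response_format variants are not expanded in this export; the request sample on this page uses { "type": "text" }. Assuming the OpenAI-compatible json_object variant is also accepted for models that support it, a sketch:

async function main() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: '<MODEL_ID>', // placeholder
      messages: [{ role: 'user', content: 'Return a JSON object with keys "city" and "country" for Paris.' }],
      // Assumption: json_object is among the supported formats;
      // { type: 'text' } is the variant shown in the request sample.
      response_format: { type: 'json_object' },
    }),
  });

  const data = await response.json();
  console.log(JSON.parse(data.choices[0].message.content));
}

main();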

Responses
200: Success

POST /v1/chat/completions

Body
model (string · enum, Required)

Possible values:

max_completion_tokens (integer · min: 1, Optional)

An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens (number · min: 1, Optional)

The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream (boolean, Optional; default: false)

If set to true, the model response data will be streamed to the client as it is generated, using server-sent events.

tool_choice (string · enum or object, Optional)

Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present; auto is the default if tools are present. Possible string values: none, auto, required.

parallel_tool_calls (boolean, Optional)

Whether to enable parallel function calling during tool use.

n (integer | nullable, Optional)

How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

frequency_penalty (number | nullable, Optional)

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty (number | nullable, Optional)

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed (integer · min: 1, Optional)

This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

temperature (number · max: 2, Optional)

What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p, but not both.

top_p (number · min: 0.01 · max: 1, Optional)

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature, but not both.
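Because seed is best-effort, the practical way to check determinism is to repeat a request. A minimal sketch, with a placeholder model ID:

async function ask(seed) {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: '<MODEL_ID>', // placeholder
      messages: [{ role: 'user', content: 'Pick a random fruit.' }],
      seed, // same seed + same parameters => same result, best effort (Beta)
      temperature: 1,
    }),
  });
  const data = await response.json();
  return data.choices[0].message.content;
}

async function main() {
  const a = await ask(42);
  const b = await ask(42);
  console.log(a === b ? 'deterministic (this time)' : 'outputs differed');
}

main();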

Responses
200: Success

POST /v1/chat/completions

Body
    POST /v1/chat/completions HTTP/1.1
    Host: api.aimlapi.com
    Content-Type: application/json
    Accept: */*
    Content-Length: 641
    
    {
      "model": "nousresearch/hermes-4-405b",
      "messages": [
        {
          "role": "user",
          "content": "text",
          "name": "text"
        }
      ],
      "max_completion_tokens": 1,
      "max_tokens": 1,
      "stream": false,
      "stream_options": {
        "include_usage": true
      },
      "temperature": 1,
      "top_p": 1,
      "seed": 1,
      "min_p": 1,
      "top_k": 1,
      "repetition_penalty": 1,
      "top_a": 1,
      "frequency_penalty": 1,
      "prediction": {
        "type": "content",
        "content": "text"
      },
      "presence_penalty": 1,
      "tools": [
        {
          "type": "function",
          "function": {
            "description": "text",
            "name": "text",
            "parameters": {
              "ANY_ADDITIONAL_PROPERTY": null
            },
            "strict": true
          }
        }
      ],
      "tool_choice": "none",
      "parallel_tool_calls": true,
      "stop": "text",
      "logprobs": true,
      "top_logprobs": 1,
      "response_format": {
        "type": "text"
      }
    }
model (string · enum, Required)

Possible values:

max_completion_tokens (integer · min: 1, Optional)

An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens (number · min: 1, Optional)

The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream (boolean, Optional; default: false)

If set to true, the model response data will be streamed to the client as it is generated, using server-sent events.

tool_choice (string · enum or object, Optional)

Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present; auto is the default if tools are present. Possible string values: none, auto, required.

parallel_tool_calls (boolean, Optional)

Whether to enable parallel function calling during tool use.

temperature (number · max: 2, Optional)

What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p, but not both.

top_p (number · min: 0.01 · max: 1, Optional)

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature, but not both.

stop (string, string[], or any | nullable; Optional)

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

frequency_penalty (number | nullable, Optional)

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty (number | nullable, Optional)

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed (integer · min: 1, Optional)

This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

response_format (one of, Optional)

An object specifying the format that the model must output.

repetition_penalty (number | nullable, Optional)

A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

logprobs (boolean | nullable, Optional)

Whether to return log probabilities of the output tokens. If true, returns the log probabilities of each output token in the content of message.

top_logprobs (number | nullable, Optional)

An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.
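Since stream delivers the response as server-sent events, the client reads the body incrementally rather than calling response.json(). A minimal Node 18+ sketch with a placeholder model ID; stream_options.include_usage follows the request sample above, and SSE parsing is left out for brevity (raw "data: {...}" lines are printed as they arrive).

async function main() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: '<MODEL_ID>', // placeholder
      messages: [{ role: 'user', content: 'Hello' }],
      stream: true,
      stream_options: { include_usage: true }, // also report token usage, as in the request sample
    }),
  });

  // Decode the server-sent event stream chunk by chunk.
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    process.stdout.write(decoder.decode(value, { stream: true }));
  }
}

main();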

Responses
200: Success

POST /v1/chat/completions
model (string · enum, Required)

Possible values:

max_completion_tokens (integer · min: 1, Optional)

An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens (number · min: 1, Optional)

The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream (boolean, Optional; default: false)

If set to true, the model response data will be streamed to the client as it is generated, using server-sent events.

tool_choice (string · enum or object, Optional)

Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present; auto is the default if tools are present. Possible string values: none, auto, required.

parallel_tool_calls (boolean, Optional)

Whether to enable parallel function calling during tool use.

response_format (one of, Optional)

An object specifying the format that the model must output.

temperature (number · max: 2, Optional)

What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p, but not both.

top_p (number · min: 0.01 · max: 1, Optional)

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature, but not both.

n (integer | nullable, Optional)

How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

stop (string, string[], or any | nullable; Optional)

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

logprobs (boolean | nullable, Optional)

Whether to return log probabilities of the output tokens. If true, returns the log probabilities of each output token in the content of message.

top_logprobs (number | nullable, Optional)

An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.

frequency_penalty (number | nullable, Optional)

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty (number | nullable, Optional)

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed (integer · min: 1, Optional)

This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

min_p (number · min: 0.001 · max: 0.999, Optional)

A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

top_k (number, Optional)

Only sample from the top K options for each subsequent token. Used to remove "long tail" low-probability responses. Recommended for advanced use cases only; you usually only need to use temperature.

repetition_penalty (number | nullable, Optional)

A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
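The two penalty parameters above address different failure modes: frequency_penalty discourages verbatim repetition, presence_penalty nudges the model toward new topics. A minimal sketch, with a placeholder model ID:

async function main() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: '<MODEL_ID>', // placeholder
      messages: [{ role: 'user', content: 'Brainstorm ten startup ideas.' }],
      frequency_penalty: 0.5, // -2.0 to 2.0: penalize frequently repeated tokens
      presence_penalty: 0.5,  // positive values encourage new topics
    }),
  });

  const data = await response.json();
  console.log(data.choices[0].message.content);
}

main();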

Responses
200: Success

POST /v1/chat/completions
model (string · enum, Required)

Possible values:

max_completion_tokens (integer · min: 1, Optional)

An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens (number · min: 1, Optional)

The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream (boolean, Optional; default: false)

If set to true, the model response data will be streamed to the client as it is generated, using server-sent events.

temperature (number · max: 2, Optional)

What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p, but not both.

top_p (number · min: 0.01 · max: 1, Optional)

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature, but not both.

n (integer | nullable, Optional)

How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

stop (string, string[], or any | nullable; Optional)

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

logprobs (boolean | nullable, Optional)

Whether to return log probabilities of the output tokens. If true, returns the log probabilities of each output token in the content of message.

top_logprobs (number | nullable, Optional)

An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.

frequency_penalty (number | nullable, Optional)

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty (number | nullable, Optional)

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed (integer · min: 1, Optional)

This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

min_p (number · min: 0.001 · max: 0.999, Optional)

A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

top_k (number, Optional)

Only sample from the top K options for each subsequent token. Used to remove "long tail" low-probability responses. Recommended for advanced use cases only; you usually only need to use temperature.

repetition_penalty (number | nullable, Optional)

A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
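For the advanced sampling cutoffs above, a minimal sketch; as the reference notes, these are for advanced use cases, and temperature alone usually suffices. The model ID is a placeholder.

async function main() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: '<MODEL_ID>', // placeholder
      messages: [{ role: 'user', content: 'Write one line of dialogue.' }],
      top_k: 40, // consider only the 40 most likely tokens at each step
      // min_p (0.001-0.999) is an alternative cutoff to top_p/top_k;
      // pick one scheme rather than stacking them.
    }),
  });

  const data = await response.json();
  console.log(data.choices[0].message.content);
}

main();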

Responses
200: Success

POST /v1/chat/completions
model (string · enum, Required)

Possible values:

max_completion_tokens (integer · min: 1, Optional)

An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens (number · min: 1, Optional)

The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream (boolean, Optional; default: false)

If set to true, the model response data will be streamed to the client as it is generated, using server-sent events.

n (integer | nullable, Optional)

How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

temperature (number · max: 2, Optional)

What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p, but not both.

top_p (number · min: 0.01 · max: 1, Optional)

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature, but not both.

stop (string, string[], or any | nullable; Optional)

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

frequency_penalty (number | nullable, Optional)

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty (number | nullable, Optional)

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed (integer · min: 1, Optional)

This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

response_format (one of, Optional)

An object specifying the format that the model must output.

tool_choice (string · enum or object, Optional)

Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present; auto is the default if tools are present. Possible string values: none, auto, required.

parallel_tool_calls (boolean, Optional)

Whether to enable parallel function calling during tool use.
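The n parameter above returns multiple alternative completions in a single call; each one appears as a separate entry in choices, and billing covers the generated tokens of all of them. A minimal sketch, with a placeholder model ID:

async function main() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: '<MODEL_ID>', // placeholder
      messages: [{ role: 'user', content: 'Suggest a name for a bakery.' }],
      n: 3, // three alternatives; you are billed for tokens across all choices
    }),
  });

  const data = await response.json();
  for (const choice of data.choices) {
    console.log(choice.index, choice.message.content);
  }
}

main();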

Responses
200: Success

POST /v1/chat/completions
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'alibaba-cloud/qwen3-next-80b-a3b-instruct',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "alibaba-cloud/qwen3-next-80b-a3b-instruct",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'meta-llama/llama-4-maverick',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "meta-llama/llama-4-maverick",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'mistralai/Mixtral-8x7B-Instruct-v0.1',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'google/gemini-2.5-pro',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "google/gemini-2.5-pro",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "nousresearch/hermes-4-405b",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'meta-llama/Llama-3-70b-chat-hf',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "meta-llama/Llama-3-70b-chat-hf",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'qwen-turbo',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "qwen-turbo",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'anthropic/claude-sonnet-4',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "anthropic/claude-sonnet-4",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'google/gemini-2.5-flash',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "google/gemini-2.5-flash",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'gpt-4o-mini-audio-preview',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
          modalities: ['text', 'audio'],
          audio: { voice: 'alloy', format: 'pcm16' },
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "gpt-4o-mini-audio-preview",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
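The examples above assume a 200 response. In practice it is worth checking response.ok before parsing, since an error payload is not a chat completion. This is generic fetch usage, not a documented shape of this API's error body, so the sketch just logs the raw text.

async function main() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: '<MODEL_ID>', // placeholder
      messages: [{ role: 'user', content: 'Hello' }],
    }),
  });

  if (!response.ok) {
    // Error shape is not specified in this export; log the raw body.
    console.error(`HTTP ${response.status}:`, await response.text());
    return;
  }

  const data = await response.json();
  console.log(data.choices[0].message.content);
}

main();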
model (string · enum, Required)

Possible values:

max_completion_tokens (integer · min: 1, Optional)

An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens (number · min: 1, Optional)

The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream (boolean, Optional; default: false)

If set to true, the model response data will be streamed to the client as it is generated, using server-sent events.

tool_choice (string · enum or object, Optional)

Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present; auto is the default if tools are present. Possible string values: none, auto, required.

parallel_tool_calls (boolean, Optional)

Whether to enable parallel function calling during tool use.

n (integer | nullable, Optional)

How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

stop (string, string[], or any | nullable; Optional)

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

logprobs (boolean | nullable, Optional)

Whether to return log probabilities of the output tokens. If true, returns the log probabilities of each output token in the content of message.

top_logprobs (number | nullable, Optional)

An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.

frequency_penalty (number | nullable, Optional)

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty (number | nullable, Optional)

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed (integer · min: 1, Optional)

This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

response_format (one of, Optional)

An object specifying the format that the model must output.

temperature (number · max: 2, Optional)

What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p, but not both.

top_p (number · min: 0.01 · max: 1, Optional)

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature, but not both.
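The two token caps above differ in scope: max_tokens bounds the generated chat completion itself, while max_completion_tokens also counts reasoning tokens for models that produce them. A minimal sketch, with a placeholder model ID:

async function main() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: '<MODEL_ID>', // placeholder
      messages: [{ role: 'user', content: 'Explain recursion briefly.' }],
      // Upper bound on everything generated, visible output plus reasoning tokens:
      max_completion_tokens: 512,
      // For non-reasoning models, max_tokens alone caps the visible completion.
    }),
  });

  const data = await response.json();
  console.log(data.choices[0].message.content);
}

main();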

Responses
200: Success

POST /v1/chat/completions

Body
    modelstring · enumRequiredPossible values:
    frequency_penaltynumber | nullableOptional

    Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

    logprobsboolean | nullableOptional

    Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.

    top_logprobsnumber | nullableOptional

    An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.

    max_tokensnumber · min: 1Optional

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

    max_completion_tokensinteger · min: 1Optional

    An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

n (integer | nullable, Optional): How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

presence_penalty (number | nullable, Optional): Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed (integer · min: 1, Optional): This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

stream (boolean, Optional, default: false): If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
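A sketch of consuming the stream in Node 18+ (placeholder model id). This assumes the OpenAI-compatible SSE framing of data: <json> lines terminated by data: [DONE], and for simplicity assumes events are not split across network chunks:

async function main() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: { 'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>', 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'gpt-4o-mini',   // placeholder model id
      messages: [{ role: 'user', content: 'Tell me a short story' }],
      stream: true,           // server-sent events instead of one JSON body
    }),
  });
  const decoder = new TextDecoder();
  for await (const chunk of response.body) {
    for (const line of decoder.decode(chunk, { stream: true }).split('\n')) {
      if (!line.startsWith('data: ') || line === 'data: [DONE]') continue;
      const delta = JSON.parse(line.slice(6)).choices[0].delta;
      if (delta.content) process.stdout.write(delta.content);
    }
  }
}
main();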
top_p (number · min: 0.1 · max: 1, Optional): An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

temperature (number · max: 2, Optional): What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

stop (any of: string, string[], or any | nullable; Optional): Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

tool_choice (string · enum or object, Optional): Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present. Possible values: none, auto, required.

parallel_tool_calls (boolean, Optional): Whether to enable parallel function calling during tool use.
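A sketch of forcing a specific function call (placeholder model id; get_weather and its schema are hypothetical, made up for illustration):

async function main() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: { 'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>', 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'gpt-4o-mini',   // placeholder model id
      messages: [{ role: 'user', content: 'Weather in Paris?' }],
      tools: [{
        type: 'function',
        function: {
          name: 'get_weather',   // hypothetical function
          description: 'Get current weather for a city',
          parameters: {
            type: 'object',
            properties: { city: { type: 'string' } },
            required: ['city'],
          },
        },
      }],
      tool_choice: { type: 'function', function: { name: 'get_weather' } },  // force this tool
      parallel_tool_calls: false,   // at most one call per turn
    }),
  });
  const data = await response.json();
  console.log(data.choices[0].message.tool_calls);   // arguments to execute client-side
}
main();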

reasoning_effort (string · enum, Optional): Constrains effort on reasoning for reasoning models. Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response. Possible values: low, medium, high.
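For example (the reasoning-model id below is a placeholder), trading answer latency against reasoning depth:

async function main() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: { 'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>', 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'o3-mini',              // placeholder reasoning-model id
      messages: [{ role: 'user', content: 'Prove that sqrt(2) is irrational' }],
      reasoning_effort: 'low',       // low | medium | high
      max_completion_tokens: 1024,   // bounds visible output plus reasoning tokens
    }),
  });
  console.log((await response.json()).choices[0].message.content);
}
main();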
response_format (one of, Optional): An object specifying the format that the model must output.

Responses: 200 Success

POST /v1/chat/completions
Body:

model (string · enum, Required). Possible values:
max_tokens (number · min: 1, Optional): The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream (boolean, Optional, default: false): If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

tool_choice (string · enum or object, Optional): Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present. Possible values: none, auto, required.

parallel_tool_calls (boolean, Optional): Whether to enable parallel function calling during tool use.

response_format (one of, Optional): An object specifying the format that the model must output.

echo (boolean, Optional): If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.
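A sketch (placeholder model id taken from this page) of echoing the prompt back along with its logprobs:

async function main() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: { 'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>', 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'Qwen/Qwen2.5-7B-Instruct-Turbo',   // placeholder model id
      messages: [{ role: 'user', content: 'Hello' }],
      echo: true,       // response includes the prompt
      logprobs: true,   // with echo, prompt logprobs are returned too
    }),
  });
  console.log(JSON.stringify(await response.json(), null, 2));
}
main();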

temperature (number · max: 2, Optional): What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p (number · min: 0.01 · max: 1, Optional): An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

n (integer | nullable, Optional): How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

stop (any of: string, string[], or any | nullable; Optional): Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

logprobs (boolean | nullable, Optional): Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.

top_logprobs (number | nullable, Optional): An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.

frequency_penalty (number | nullable, Optional): Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty (number | nullable, Optional): Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed (integer · min: 1, Optional): This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

min_p (number · min: 0.001 · max: 0.999, Optional): A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

top_k (number, Optional): Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

repetition_penalty (number | nullable, Optional): A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
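For instance (placeholder model id), trimming the low-probability tail with top_k, or with min_p as an alternative cutoff, plus a mild repetition penalty:

async function main() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: { 'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>', 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'Qwen/Qwen2.5-7B-Instruct-Turbo',   // placeholder model id
      messages: [{ role: 'user', content: 'Write a haiku' }],
      top_k: 40,                 // sample only from the 40 most likely tokens
      min_p: 0.05,               // alternative probability cutoff to top_p/top_k
      repetition_penalty: 1.1,   // values above 1 discourage repeated sequences
    }),
  });
  console.log((await response.json()).choices[0].message.content);
}
main();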

Responses: 200 Success

POST /v1/chat/completions
Body:

model (string · enum, Required). Possible values:
max_tokens (number · min: 1, Optional): The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream (boolean, Optional, default: false): If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

tool_choice (string · enum or object, Optional): Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present. Possible values: none, auto, required.

parallel_tool_calls (boolean, Optional): Whether to enable parallel function calling during tool use.

temperature (number · max: 1, Optional): What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p (number · min: 0.01 · max: 1, Optional): An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

presence_penalty (number | nullable, Optional): Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed (integer · min: 1, Optional): This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

response_format (one of, Optional): An object specifying the format that the model must output.

Responses: 200 Success

POST /v1/chat/completions
Body:

model (string · enum, Required). Possible values:
max_completion_tokens (integer · min: 1, Optional): An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens (number · min: 1, Optional): The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream (boolean, Optional, default: false): If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

n (integer | nullable, Optional): How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

temperature (number · max: 2, Optional): What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p (number · min: 0.01 · max: 1, Optional): An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

stop (any of: string, string[], or any | nullable; Optional): Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

frequency_penalty (number | nullable, Optional): Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty (number | nullable, Optional): Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed (integer · min: 1, Optional): This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

response_format (one of, Optional): An object specifying the format that the model must output.

tool_choice (string · enum or object, Optional): Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present. Possible values: none, auto, required.

parallel_tool_calls (boolean, Optional): Whether to enable parallel function calling during tool use.

Responses: 200 Success

POST /v1/chat/completions
Body:

model (string · enum, Required). Possible values:
max_completion_tokens (integer · min: 1, Optional): An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens (number · min: 1, Optional): The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream (boolean, Optional, default: false): If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

tool_choice (string · enum or object, Optional): Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present. Possible values: none, auto, required.

parallel_tool_calls (boolean, Optional): Whether to enable parallel function calling during tool use.

n (integer | nullable, Optional): How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

temperature (number · max: 2, Optional): What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p (number · min: 0.01 · max: 1, Optional): An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

stop (any of: string, string[], or any | nullable; Optional): Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

frequency_penalty (number | nullable, Optional): Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty (number | nullable, Optional): Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

response_format (one of, Optional): An object specifying the format that the model must output.

Responses: 200 Success

POST /v1/chat/completions
Body:

model (string · enum, Required). Possible values:
frequency_penalty (number | nullable, Optional): Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

logprobs (boolean | nullable, Optional): Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.

top_logprobs (number | nullable, Optional): An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.

max_tokens (number · min: 1, Optional): The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

max_completion_tokens (integer · min: 1, Optional): An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

n (integer | nullable, Optional): How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

presence_penalty (number | nullable, Optional): Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed (integer · min: 1, Optional): This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

stream (boolean, Optional, default: false): If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

top_p (number · min: 0.1 · max: 1, Optional): An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

temperature (number · max: 2, Optional): What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

stop (any of: string, string[], or any | nullable; Optional): Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

tool_choice (string · enum or object, Optional): Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present. Possible values: none, auto, required.

parallel_tool_calls (boolean, Optional): Whether to enable parallel function calling during tool use.

reasoning_effort (string · enum, Optional): Constrains effort on reasoning for reasoning models. Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response. Possible values: low, medium, high.

response_format (one of, Optional): An object specifying the format that the model must output.

Responses: 200 Success

POST /v1/chat/completions
// Send a minimal chat completion request and print the JSON response.
async function main() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'gpt-4-0125-preview',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "gpt-4-0125-preview",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
model (string · enum, Required). Possible values:

max_completion_tokens (integer · min: 1, Optional): An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens (number · min: 1, Optional): The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream (boolean, Optional, default: false): If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

temperature (number · max: 2, Optional): What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p (number · min: 0.01 · max: 1, Optional): An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

stop (any of: string, string[], or any | nullable; Optional): Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

frequency_penalty (number | nullable, Optional): Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty (number | nullable, Optional): Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed (integer · min: 1, Optional): This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

echo (boolean, Optional): If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.

min_p (number · min: 0.001 · max: 0.999, Optional): A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

top_k (number, Optional): Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

repetition_penalty (number | nullable, Optional): A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

top_a (number · max: 1, Optional): Alternate top sampling parameter.
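top_a is only named above, not specified; a usage sketch (placeholder model id, and the exact sampling semantics are not detailed on this page):

async function main() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: { 'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>', 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'Qwen/Qwen2.5-7B-Instruct-Turbo',   // placeholder model id
      messages: [{ role: 'user', content: 'Hello' }],
      top_a: 0.2,   // alternate top sampling cutoff; semantics not detailed here
    }),
  });
  console.log((await response.json()).choices[0].message.content);
}
main();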

response_format (one of, Optional): An object specifying the format that the model must output.

Responses: 200 Success

POST /v1/chat/completions
Body:

model (string · enum, Required). Possible values:
frequency_penalty (number | nullable, Optional): Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

logprobs (boolean | nullable, Optional): Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.

top_logprobs (number | nullable, Optional): An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.

max_tokens (number · min: 1, Optional): The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

max_completion_tokens (integer · min: 1, Optional): An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

n (integer | nullable, Optional): How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

presence_penalty (number | nullable, Optional): Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed (integer · min: 1, Optional): This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

stream (boolean, Optional, default: false): If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

top_p (number · min: 0.1 · max: 1, Optional): An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

temperature (number · max: 2, Optional): What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

stop (any of: string, string[], or any | nullable; Optional): Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

tool_choice (string · enum or object, Optional): Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present. Possible values: none, auto, required.

parallel_tool_calls (boolean, Optional): Whether to enable parallel function calling during tool use.

reasoning_effort (string · enum, Optional): Constrains effort on reasoning for reasoning models. Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response. Possible values: low, medium, high.

response_format (one of, Optional): An object specifying the format that the model must output.

Responses: 200 Success

POST /v1/chat/completions
Body:

model (string · enum, Required). Possible values:
max_completion_tokens (integer · min: 1, Optional): An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens (number · min: 1, Optional): The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream (boolean, Optional, default: false): If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

tool_choice (string · enum or object, Optional): Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present. Possible values: none, auto, required.

parallel_tool_calls (boolean, Optional): Whether to enable parallel function calling during tool use.

temperature (number · max: 2, Optional): What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p (number · min: 0.01 · max: 1, Optional): An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

stop (any of: string, string[], or any | nullable; Optional): Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

frequency_penalty (number | nullable, Optional): Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty (number | nullable, Optional): Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed (integer · min: 1, Optional): This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

response_format (one of, Optional): An object specifying the format that the model must output.

repetition_penalty (number | nullable, Optional): A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

logprobs (boolean | nullable, Optional): Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.

top_logprobs (number | nullable, Optional): An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.

Responses: 200 Success

POST /v1/chat/completions
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'deepseek/deepseek-chat',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "deepseek/deepseek-chat",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'anthropic/claude-opus-4',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "anthropic/claude-opus-4",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'alibaba/qwen3-max-preview',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "alibaba/qwen3-max-preview",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'claude-3-haiku',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "claude-3-haiku",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'Qwen/Qwen2.5-7B-Instruct-Turbo',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "Qwen/Qwen2.5-7B-Instruct-Turbo",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'minimax/m2',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "minimax/m2",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'google/gemini-3-pro-preview',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "google/gemini-3-pro-preview",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'moonshot/kimi-k2-turbo-preview',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "moonshot/kimi-k2-turbo-preview",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'anthropic/claude-opus-4.1',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "anthropic/claude-opus-4.1",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
POST /v1/chat/completions
Body:

model (string · enum, Required). Possible values:
max_completion_tokens (integer · min: 1, Optional): An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens (number · min: 1, Optional): The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream (boolean, Optional, default: false): If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

tool_choice (string · enum or object, Optional): Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present. Possible values: none, auto, required.

parallel_tool_calls (boolean, Optional): Whether to enable parallel function calling during tool use.

n (integer | nullable, Optional): How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

stop (any of: string, string[], or any | nullable; Optional): Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

logprobs (boolean | nullable, Optional): Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.

top_logprobs (number | nullable, Optional): An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.

frequency_penalty (number | nullable, Optional): Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty (number | nullable, Optional): Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed (integer · min: 1, Optional): This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

response_format (one of, Optional): An object specifying the format that the model must output.

temperature (number · max: 2, Optional): What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p (number · min: 0.01 · max: 1, Optional): An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

Responses: 200 Success

POST /v1/chat/completions
Body:

model (string · enum, Required). Possible values:
max_completion_tokens (integer · min: 1, Optional): An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens (number · min: 1, Optional): The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream (boolean, Optional, default: false): If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

temperature (number · max: 2, Optional): What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p (number · min: 0.01 · max: 1, Optional): An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

stop (any of: string, string[], or any | nullable; Optional): Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

frequency_penalty (number | nullable, Optional): Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty (number | nullable, Optional): Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed (integer · min: 1, Optional): This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

echo (boolean, Optional): If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.

min_p (number · min: 0.001 · max: 0.999, Optional): A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

top_k (number, Optional): Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

repetition_penalty (number | nullable, Optional): A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

top_a (number · max: 1, Optional): Alternate top sampling parameter.

response_format (one of, Optional): An object specifying the format that the model must output.

Responses: 200 Success

POST /v1/chat/completions
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'deepseek/deepseek-non-reasoner-v3.1-terminus',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "deepseek/deepseek-non-reasoner-v3.1-terminus",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    modelstring · enumRequiredPossible values:
    max_completion_tokensinteger · min: 1Optional

    An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

    max_tokensnumber · min: 1Optional

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

    streambooleanOptional

    If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

    Default: false
    tool_choiceany ofOptional

    Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.

    string · enumOptional

    none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.

    Possible values:
    or
    parallel_tool_callsbooleanOptional

    Whether to enable parallel function calling during tool use.

    ninteger | nullableOptional

    How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

    stopany ofOptional

    Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

Accepted forms: string, string[], or any (nullable); all optional.
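
For example, a sketch that halts generation before any of the listed sequences is emitted:

const body = {
  model: 'deepseek/deepseek-non-reasoner-v3.1-terminus',
  messages: [{ role: 'user', content: 'Count from 1 to 10, one number per line.' }],
  // Up to 4 sequences; the returned text stops before the first match.
  stop: ['7', 'END'],
};
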
logprobs · boolean | nullable · Optional

    Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.

top_logprobs · number | nullable · Optional

    An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.
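
A minimal sketch requesting token log probabilities; top_logprobs only takes effect when logprobs is enabled:

const body = {
  model: 'deepseek/deepseek-non-reasoner-v3.1-terminus',
  messages: [{ role: 'user', content: 'Hello' }],
  logprobs: true,  // required for top_logprobs to apply
  top_logprobs: 5, // the 5 most likely alternatives per position
};
// In the response, inspect data.choices[0].logprobs.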

frequency_penalty · number | nullable · Optional

    Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty · number | nullable · Optional

    Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
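
As a sketch, nudging the model away from repetition and toward new topics with both penalties:

const body = {
  model: 'deepseek/deepseek-non-reasoner-v3.1-terminus',
  messages: [{ role: 'user', content: 'Write a short poem about the sea.' }],
  frequency_penalty: 0.5, // discourage verbatim repetition
  presence_penalty: 0.6,  // encourage new topics
};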

seed · integer · min: 1 · Optional

    This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
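
A sketch of best-effort reproducibility; repeating this exact request should usually return the same completion:

const body = {
  model: 'deepseek/deepseek-non-reasoner-v3.1-terminus',
  messages: [{ role: 'user', content: 'Hello' }],
  seed: 42,         // best-effort determinism (Beta)
  temperature: 0.7, // keep all other parameters identical between runs
};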

temperature · number · max: 2 · Optional

    What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p · number · min: 0.01 · max: 1 · Optional

    An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
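
Following the recommendation above, a sketch that adjusts temperature and leaves top_p at its default (set one or the other, not both):

const body = {
  model: 'deepseek/deepseek-non-reasoner-v3.1-terminus',
  messages: [{ role: 'user', content: 'Hello' }],
  temperature: 0.2, // focused, more deterministic output
  // top_p: 0.1,    // the nucleus-sampling alternative; avoid combining the two
};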

Responses: 200 Success

POST /v1/chat/completions
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'gpt-4',
      messages: [
          {
              role: 'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "gpt-4",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'gpt-4o-mini',
      messages: [
          {
              role: 'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "gpt-4o-mini",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
model · string · enum · Required. Possible values:
max_tokens · number · min: 1 · Optional

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream · boolean · Optional

    If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

    Default: false
tool_choice · any of · Optional

    Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.

string · enum · Optional

    none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.

Possible values: none, auto, required.
or: an object naming a specific tool, e.g. {"type": "function", "function": {"name": "my_function"}}.
parallel_tool_calls · boolean · Optional

    Whether to enable parallel function calling during tool use.

response_format · one of · Optional

    An object specifying the format that the model must output.

echo · boolean · Optional

    If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.
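
A sketch combining echo with logprobs so that prompt logprobs are returned as well; <MODEL_ID> stands for any model from this page's list:

const body = {
  model: '<MODEL_ID>', // placeholder; substitute a supported model ID
  messages: [{ role: 'user', content: 'Hello' }],
  echo: true,     // include the prompt in the response
  logprobs: true, // with echo, prompt logprobs are returned too
};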

temperature · number · max: 2 · Optional

    What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p · number · min: 0.01 · max: 1 · Optional

    An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

n · integer | nullable · Optional

    How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

stop · any of · Optional

    Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

Accepted forms: string, string[], or any (nullable); all optional.
logprobs · boolean | nullable · Optional

    Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.

top_logprobs · number | nullable · Optional

    An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.

frequency_penalty · number | nullable · Optional

    Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty · number | nullable · Optional

    Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed · integer · min: 1 · Optional

    This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

min_p · number · min: 0.001 · max: 0.999 · Optional

    A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

top_k · number · Optional

    Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

repetition_penalty · number | nullable · Optional

    A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
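
A sketch of the alternative sampling controls described above; since min_p is an alternative to top_p and top_k, the example sets it alone and leaves top_k commented out (<MODEL_ID> is a placeholder for a model from this page's list):

const body = {
  model: '<MODEL_ID>',
  messages: [{ role: 'user', content: 'Hello' }],
  min_p: 0.05,             // alternative to top_p and top_k
  // top_k: 40,            // or: keep only the 40 most likely tokens
  repetition_penalty: 1.1, // values above 1 reduce repeated sequences
};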

Responses: 200 Success

POST /v1/chat/completions
model · string · enum · Required. Possible values:
frequency_penalty · number | nullable · Optional

    Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

logprobs · boolean | nullable · Optional

    Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.

top_logprobs · number | nullable · Optional

    An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.

max_tokens · number · min: 1 · Optional

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

max_completion_tokens · integer · min: 1 · Optional

    An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

n · integer | nullable · Optional

    How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

presence_penalty · number | nullable · Optional

    Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed · integer · min: 1 · Optional

    This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

stream · boolean · Optional

    If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

    Default: false
top_p · number · min: 0.1 · max: 1 · Optional

    An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

temperature · number · max: 2 · Optional

    What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

stop · any of · Optional

    Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

Accepted forms: string, string[], or any (nullable); all optional.
tool_choice · any of · Optional

    Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.

string · enum · Optional

    none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.

Possible values: none, auto, required.
or: an object naming a specific tool, e.g. {"type": "function", "function": {"name": "my_function"}}.
parallel_tool_calls · boolean · Optional

    Whether to enable parallel function calling during tool use.

reasoning_effort · string · enum · Optional

    Constrains effort on reasoning for reasoning models. Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.

Possible values: low, medium, high.
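
A sketch of lowering reasoning effort for faster, cheaper responses; <REASONING_MODEL_ID> stands for any reasoning-capable model from the list above:

const body = {
  model: '<REASONING_MODEL_ID>', // placeholder; substitute a reasoning model
  messages: [{ role: 'user', content: 'Hello' }],
  reasoning_effort: 'low',     // one of low, medium, high
  max_completion_tokens: 1024, // bounds visible output plus reasoning tokens
};
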
response_format · one of · Optional

    An object specifying the format that the model must output.

Responses: 200 Success

POST /v1/chat/completions
model · string · enum · Required. Possible values:
max_completion_tokens · integer · min: 1 · Optional

    An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens · number · min: 1 · Optional

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream · boolean · Optional

    If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

    Default: false
temperature · number · max: 2 · Optional

    What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p · number · min: 0.01 · max: 1 · Optional

    An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

seed · integer · min: 1 · Optional

    This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

min_p · number · min: 0.001 · max: 0.999 · Optional

    A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

top_k · number · Optional

    Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

repetition_penalty · number | nullable · Optional

    A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

top_a · number · max: 1 · Optional

    Alternate top sampling parameter.

frequency_penalty · number | nullable · Optional

    Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty · number | nullable · Optional

    Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

tool_choice · any of · Optional

    Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.

string · enum · Optional

    none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.

Possible values: none, auto, required.
or: an object naming a specific tool, e.g. {"type": "function", "function": {"name": "my_function"}}.
parallel_tool_calls · boolean · Optional

    Whether to enable parallel function calling during tool use.

stop · any of · Optional

    Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

Accepted forms: string, string[], or any (nullable); all optional.
response_format · one of · Optional

    An object specifying the format that the model must output.

Responses: 200 Success

POST /v1/chat/completions
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'Qwen/Qwen2.5-72B-Instruct-Turbo',
      messages: [
          {
              role: 'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "Qwen/Qwen2.5-72B-Instruct-Turbo",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'anthropic/claude-opus-4-5',
      messages: [
          {
              role: 'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "anthropic/claude-opus-4-5",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'mistralai/codestral-2501',
      messages: [
          {
              role: 'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "mistralai/codestral-2501",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
model · string · enum · Required. Possible values:
max_completion_tokens · integer · min: 1 · Optional

    An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens · number · min: 1 · Optional

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream · boolean · Optional

    If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

    Default: false
tool_choice · any of · Optional

    Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.

string · enum · Optional

    none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.

Possible values: none, auto, required.
or: an object naming a specific tool, e.g. {"type": "function", "function": {"name": "my_function"}}.
parallel_tool_calls · boolean · Optional

    Whether to enable parallel function calling during tool use.

n · integer | nullable · Optional

    How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

stop · any of · Optional

    Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

Accepted forms: string, string[], or any (nullable); all optional.
logprobs · boolean | nullable · Optional

    Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.

top_logprobs · number | nullable · Optional

    An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.

frequency_penalty · number | nullable · Optional

    Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty · number | nullable · Optional

    Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed · integer · min: 1 · Optional

    This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

response_format · one of · Optional

    An object specifying the format that the model must output.

temperature · number · max: 2 · Optional

    What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p · number · min: 0.01 · max: 1 · Optional

    An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

Responses: 200 Success

POST /v1/chat/completions

Body
model · string · enum · Required. Possible values:
max_completion_tokens · integer · min: 1 · Optional

    An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens · number · min: 1 · Optional

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream · boolean · Optional

    If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

    Default: false
tool_choice · any of · Optional

    Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.

string · enum · Optional

    none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.

Possible values: none, auto, required.
or: an object naming a specific tool, e.g. {"type": "function", "function": {"name": "my_function"}}.
parallel_tool_calls · boolean · Optional

    Whether to enable parallel function calling during tool use.

temperature · number · max: 2 · Optional

    What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p · number · min: 0.01 · max: 1 · Optional

    An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

stop · any of · Optional

    Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

Accepted forms: string, string[], or any (nullable); all optional.
frequency_penalty · number | nullable · Optional

    Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty · number | nullable · Optional

    Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed · integer · min: 1 · Optional

    This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

response_format · one of · Optional

    An object specifying the format that the model must output.

logprobs · boolean | nullable · Optional

    Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.

top_logprobs · number | nullable · Optional

    An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.

repetition_penalty · number | nullable · Optional

    A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

enable_thinking · boolean · Optional

    Specifies whether to use the thinking mode.

    Default: false
thinking_budget · integer · min: 1 · Optional

    The maximum reasoning length, effective only when enable_thinking is set to true.
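
A sketch of enabling thinking mode with a bounded reasoning length; alibaba/qwen3-32b is borrowed from the Qwen samples on this page:

const body = {
  model: 'alibaba/qwen3-32b',
  messages: [{ role: 'user', content: 'Hello' }],
  enable_thinking: true, // turn on thinking mode
  thinking_budget: 1024, // cap the reasoning length, in tokens
};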

Responses: 200 Success

POST /v1/chat/completions
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'gpt-4o',
      messages: [
          {
              role: 'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "gpt-4o",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
model · string · enum · Required. Possible values:
max_completion_tokens · integer · min: 1 · Optional

    An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens · number · min: 1 · Optional

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream · boolean · Optional

    If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

    Default: false
tool_choice · any of · Optional

    Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.

string · enum · Optional

    none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.

Possible values: none, auto, required.
or: an object naming a specific tool, e.g. {"type": "function", "function": {"name": "my_function"}}.
parallel_tool_calls · boolean · Optional

    Whether to enable parallel function calling during tool use.

temperature · number · max: 2 · Optional

    What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p · number · min: 0.01 · max: 1 · Optional

    An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

stop · any of · Optional

    Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

Accepted forms: string, string[], or any (nullable); all optional.
frequency_penalty · number | nullable · Optional

    Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty · number | nullable · Optional

    Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed · integer · min: 1 · Optional

    This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

response_format · one of · Optional

    An object specifying the format that the model must output.

repetition_penalty · number | nullable · Optional

    A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

Responses: 200 Success

POST /v1/chat/completions
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'alibaba/qwen3-vl-32b-thinking',
      messages: [
          {
              role: 'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "alibaba/qwen3-vl-32b-thinking",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'alibaba/qwen3-32b',
      messages: [
          {
              role: 'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "alibaba/qwen3-32b",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
POST /v1/chat/completions

Body
model · string · enum · Required. Possible values:
max_completion_tokens · integer · min: 1 · Optional

    An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens · number · min: 1 · Optional

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream · boolean · Optional

    If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

    Default: false
tool_choice · any of · Optional

    Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.

string · enum · Optional

    none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.

Possible values: none, auto, required.
or: an object naming a specific tool, e.g. {"type": "function", "function": {"name": "my_function"}}.
parallel_tool_calls · boolean · Optional

    Whether to enable parallel function calling during tool use.

response_format · one of · Optional

    An object specifying the format that the model must output.

temperature · number · max: 2 · Optional

    What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p · number · min: 0.01 · max: 1 · Optional

    An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

n · integer | nullable · Optional

    How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

stop · any of · Optional

    Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

Accepted forms: string, string[], or any (nullable); all optional.
logprobs · boolean | nullable · Optional

    Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.

top_logprobs · number | nullable · Optional

    An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.

frequency_penalty · number | nullable · Optional

    Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty · number | nullable · Optional

    Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed · integer · min: 1 · Optional

    This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

min_p · number · min: 0.001 · max: 0.999 · Optional

    A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

top_k · number · Optional

    Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

repetition_penalty · number | nullable · Optional

    A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

Responses: 200 Success

POST /v1/chat/completions

Body
model · string · enum · Required. Possible values:
max_tokens · number · min: 1 · Optional

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream · boolean · Optional

    If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

    Default: false
tool_choice · any of · Optional

    Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.

string · enum · Optional

    none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.

Possible values: none, auto, required.
or: an object naming a specific tool, e.g. {"type": "function", "function": {"name": "my_function"}}.
parallel_tool_calls · boolean · Optional

    Whether to enable parallel function calling during tool use.

temperature · number · max: 2 · Optional

    What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p · number · min: 0.01 · max: 1 · Optional

    An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

stop · any of · Optional

    Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

Accepted forms: string, string[], or any (nullable); all optional.
frequency_penalty · number | nullable · Optional

    Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

seed · integer · min: 1 · Optional

    This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

presence_penalty · number | nullable · Optional

    Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

response_format · one of · Optional

    An object specifying the format that the model must output.

echo · boolean · Optional

    If True, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.

min_p · number · min: 0.001 · max: 0.999 · Optional

    A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

top_k · number · Optional

    Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

top_a · number · max: 1 · Optional

    Alternate top sampling parameter.

repetition_penalty · number | nullable · Optional

    A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

Responses: 200 Success

POST /v1/chat/completions
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'nvidia/nemotron-nano-9b-v2',
      messages: [
          {
              role: 'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "nvidia/nemotron-nano-9b-v2",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'meta-llama/llama-4-scout',
      messages: [
          {
              role: 'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "meta-llama/llama-4-scout",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
model · string · enum · Required. Possible values:
max_completion_tokens · integer · min: 1 · Optional

    An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens · number · min: 1 · Optional

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream · boolean · Optional

    If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

    Default: false
tool_choice · any of · Optional

    Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.

string · enum · Optional

    none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.

Possible values: none, auto, required.
or: an object naming a specific tool, e.g. {"type": "function", "function": {"name": "my_function"}}.
parallel_tool_calls · boolean · Optional

    Whether to enable parallel function calling during tool use.

n · integer | nullable · Optional

    How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

stop · any of · Optional

    Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

Accepted forms: string, string[], or any (nullable); all optional.
logprobs · boolean | nullable · Optional

    Whether to return log probabilities of the output tokens or not. If True, returns the log probabilities of each output token returned in the content of message.

top_logprobs · number | nullable · Optional

    An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to True if this parameter is used.

frequency_penalty · number | nullable · Optional

    Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty · number | nullable · Optional

    Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed · integer · min: 1 · Optional

    This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

response_format · one of · Optional

    An object specifying the format that the model must output.

temperature · number · max: 2 · Optional

    What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p · number · min: 0.01 · max: 1 · Optional

    An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

Responses: 200 Success

POST /v1/chat/completions
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'gpt-4-turbo',
      messages: [
          {
              role: 'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "gpt-4-turbo",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
POST /v1/chat/completions

Body
model · string · enum · Required. Possible values:
max_completion_tokens · integer · min: 1 · Optional

    An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens · number · min: 1 · Optional

    The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream · boolean · Optional

    If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

    Default: false
tool_choice · any of · Optional

    Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.

string · enum · Optional

    none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.

Possible values: none, auto, required.
or: an object naming a specific tool, e.g. {"type": "function", "function": {"name": "my_function"}}.
parallel_tool_calls · boolean · Optional

    Whether to enable parallel function calling during tool use.

n · integer | nullable · Optional

    How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

temperature · number · max: 2 · Optional

    What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p · number · min: 0.01 · max: 1 · Optional

    An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

stop · any of · Optional

    Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

Accepted forms: string, string[], or any (nullable); all optional.
frequency_penalty · number | nullable · Optional

    Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

    presence_penaltynumber | nullableOptional

    Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

    response_formatone ofOptional

    An object specifying the format that the model must output.

    or
    or
    Responses
    200Success
    post
    /v1/chat/completions
    200Success
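
To make tool_choice concrete, here is a minimal sketch that forces the model to call one specific function. The get_weather tool and its schema are hypothetical placeholders for a function of your own; the request shape follows the OpenAI-compatible format described above.

async function forcedToolCall() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: 'What is the weather in Paris?' }],
      // Hypothetical tool definition; replace with your own function schema.
      tools: [{
        type: 'function',
        function: {
          name: 'get_weather',
          description: 'Get the current weather for a city',
          parameters: {
            type: 'object',
            properties: { city: { type: 'string' } },
            required: ['city'],
          },
        },
      }],
      // Forces the model to call get_weather instead of answering in text.
      tool_choice: { type: 'function', function: { name: 'get_weather' } },
    }),
  });

  const data = await response.json();
  // With a forced call, the arguments arrive as a JSON string inside tool_calls.
  console.log(data.choices[0].message.tool_calls);
}

forcedToolCall();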
POST /v1/chat/completions

Body

model (string · enum, Required)

max_tokens (number · min: 1, Optional)
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream (boolean, Optional, default: false)
If set to true, the model response data will be streamed to the client as it is generated using server-sent events.

echo (boolean, Optional)
If true, the response will contain the prompt. Can be used with logprobs to return prompt logprobs.

temperature (number · max: 2, Optional)
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p (number · min: 0.01 · max: 1, Optional)
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

n (integer | nullable, Optional)
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

stop (string or string[] | nullable, Optional)
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

logprobs (boolean | nullable, Optional)
Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message.

top_logprobs (number | nullable, Optional)
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.

frequency_penalty (number | nullable, Optional)
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty (number | nullable, Optional)
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed (integer · min: 1, Optional)
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

min_p (number · min: 0.001 · max: 0.999, Optional)
A number between 0.001 and 0.999 that can be used as an alternative to top_p and top_k.

top_k (number, Optional)
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

repetition_penalty (number | nullable, Optional)
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.

Responses
200 Success
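
Since stream: true switches the response to server-sent events, here is a minimal sketch of consuming the stream with fetch in Node 18+. The assumption that each chunk arrives as OpenAI-style "data: {...}" lines is worth verifying against the API schema for your model.

async function streamChat() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'mistralai/Mistral-7B-Instruct-v0.2',
      messages: [{ role: 'user', content: 'Hello' }],
      stream: true, // request server-sent events instead of a single JSON body
    }),
  });

  // Read raw SSE chunks as they arrive and print them to stdout.
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    process.stdout.write(decoder.decode(value));
  }
}

streamChat();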
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'mistralai/Mistral-7B-Instruct-v0.2',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "mistralai/Mistral-7B-Instruct-v0.2",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'moonshot/kimi-k2-preview',
          messages:[
              {
                  role:'user',
                  content: 'Hello'
              }
          ],
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "id": "chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl",
      "object": "chat.completion",
      "created": 1762343744,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
            "refusal": null,
            "annotations": null,
            "audio": null,
            "tool_calls": null
          },
          "finish_reason": "stop",
          "logprobs": null
        }
      ],
      "model": "moonshot/kimi-k2-preview",
      "usage": {
        "prompt_tokens": 137,
        "completion_tokens": 914,
        "total_tokens": 1051,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
POST /v1/chat/completions

Body

model (string · enum, Required)

max_completion_tokens (integer · min: 1, Optional)
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens (number · min: 1, Optional)
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream (boolean, Optional, default: false)
If set to true, the model response data will be streamed to the client as it is generated using server-sent events.

tool_choice (string · enum or object, Optional)
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present; auto is the default if tools are present.

parallel_tool_calls (boolean, Optional)
Whether to enable parallel function calling during tool use.

n (integer | nullable, Optional)
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

stop (string or string[] | nullable, Optional)
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

logprobs (boolean | nullable, Optional)
Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message.

top_logprobs (number | nullable, Optional)
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.

frequency_penalty (number | nullable, Optional)
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty (number | nullable, Optional)
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

seed (integer · min: 1, Optional)
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

temperature (number · max: 2, Optional)
What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p (number · min: 0.01 · max: 1, Optional)
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

response_format (object, Optional)
An object specifying the format that the model must output.

Responses
200 Success
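
To illustrate the seed parameter described above, the sketch below sends the same request twice with a fixed seed. Since the feature is in Beta, identical outputs are best-effort rather than guaranteed, and the model ID here is just an example.

async function seededRequest() {
  const body = {
    model: 'moonshot/kimi-k2-preview',
    messages: [{ role: 'user', content: 'Name three primary colors.' }],
    seed: 42,        // fixed seed: repeated requests should match (best effort)
    temperature: 0,  // a low temperature further reduces variation
  };

  for (let i = 0; i < 2; i++) {
    const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    });
    const data = await response.json();
    console.log(`Run ${i + 1}:`, data.choices[0].message.content);
  }
}

seededRequest();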

    Text Models (LLM)

    Overview of the capabilities of AIML API text models (LLMs).

    Overview

    The AI/ML API provides access to text-based models, also known as Large Language Models (LLMs), and allows you to interact with them through natural language (that's why a third common name for such models is chat models). These models can be applied to various tasks, enabling the creation of diverse applications using our API. For example, text models can be used to:

  • Create a system that searches your photos using text prompts.

  • Act as a psychological supporter.

  • Play games with you through natural language.

  • Assist you with coding.

  • Perform security assessments (pentests) to find server vulnerabilities.

  • Write documentation for your services.

  • Serve as a grammar corrector for multiple languages with deep context understanding.

  • And much more.

    Specific Capabilities

    There are several capabilities of text models that are worth mentioning separately.

    Completion allows the model to analyze a given text fragment and predict how it might continue based on the probabilities of the next possible tokens or characters. Chat Completion extends this functionality, enabling a simulated dialogue between the user and the model based on predefined roles (e.g., "strict language teacher" and "student"). A detailed description and examples can be found in our Completion and Chat Completion article.


    Chat completion has evolved further into Assistants (preconfigured conversational agents with specific roles) and Threads (a mechanism for maintaining conversation history for context). Examples of this functionality can be found in the Managing Assistants & Threads article.


    Function Calling allows a chat model to invoke external programmatic tools (e.g., a function you have written) while generating a response. A detailed description and examples are available in the Function Calling article.
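
    Below is a minimal sketch of the full round trip, assuming the OpenAI-compatible tool-call flow: the model replies with a tool_calls request, your code runs the real function, and the result goes back as a tool message. The get_time tool is a hypothetical stand-in for a function you have written.

    async function functionCallingRoundTrip() {
      const headers = {
        'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
        'Content-Type': 'application/json',
      };
      const tools = [{
        type: 'function',
        function: {
          name: 'get_time', // hypothetical tool name
          description: 'Get the current UTC time',
          parameters: { type: 'object', properties: {} },
        },
      }];
      const messages = [{ role: 'user', content: 'What time is it?' }];

      // First call: the model may answer with a tool_calls request instead of text.
      let res = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST', headers,
        body: JSON.stringify({ model: 'gpt-4o', messages, tools }),
      });
      const first = (await res.json()).choices[0].message;
      const call = first.tool_calls?.[0];
      if (!call) return console.log(first.content);

      // Run the real function locally and append its result as a "tool" message.
      messages.push(first, {
        role: 'tool',
        tool_call_id: call.id,
        content: new Date().toISOString(), // hypothetical get_time implementation
      });

      // Second call: the model turns the tool result into a natural-language reply.
      res = await fetch('https://api.aimlapi.com/v1/chat/completions', {
        method: 'POST', headers,
        body: JSON.stringify({ model: 'gpt-4o', messages, tools }),
      });
      console.log((await res.json()).choices[0].message.content);
    }

    functionCallingRoundTrip();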

    Endpoint

    All text and chat models use the same endpoint:

    https://api.aimlapi.com/v1/chat/completions

    The parameters may vary (especially for models from different developers), so it’s best to check the API schema on each model’s page for details. Example: o4-mini.

    ✅ Quick Code Example

    We will call the gpt-4o model using the Python programming language and the OpenAI SDK.

    If you need a more detailed explanation of how to call a model's API in code, check out our QUICKSTART section.

    %pip install openai
    import os
    from openai import OpenAI
    
    client = OpenAI(
        base_url="https://api.aimlapi.com/v1",
    
        # Insert your AIML API Key in the quotation marks instead of <YOUR_AIMLAPI_KEY>:
        api_key="<YOUR_AIMLAPI_KEY>",
    )
    
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": "You are an AI assistant who knows everything.",
            },
            {
                "role": "user",
                "content": "Tell me, why is the sky blue?"
            },
        ],
    )
    
    message = response.choices[0].message.content
    print(f"Assistant: {message}")

    By running this code example, we received the following response from the chat model:

    Assistant: The sky appears blue due to a phenomenon called Rayleigh scattering. When sunlight enters Earth's atmosphere, it collides with gas molecules and small particles. Sunlight is made up of different colors, each with different wavelengths. Blue light has a shorter wavelength and is scattered in all directions by the gas molecules in the atmosphere more than other colors with longer wavelengths, such as red or yellow.
    As a result, when you look up at the sky during the day, you see this scattered blue light being dispersed in all directions, making the sky appear blue to our eyes. During sunrise and sunset, the sun's light passes through a greater thickness of Earth's atmosphere, scattering the shorter blue wavelengths out of your line of sight and leaving the longer wavelengths, like red and orange, more dominant, which is why the sky often turns those colors at those times.
    Complete Text Model List
    Model ID + API Reference link | Developer | Context | Model Card
    gpt-3.5-turbo | Open AI | 16,000 | Chat GPT 3.5 Turbo
    gpt-3.5-turbo-0125 | Open AI | 16,000 | Chat GPT-3.5 Turbo 0125
    gpt-3.5-turbo-1106 | Open AI | 16,000 | Chat GPT-3.5 Turbo 1106
    gpt-4o | Open AI | 128,000 | Chat GPT-4o
    gpt-4o-2024-08-06 | Open AI | 128,000 | GPT-4o-2024-08-06
    gpt-4o-2024-05-13 | Open AI | 128,000 | GPT-4o-2024-05-13
    gpt-4o-mini | Open AI | 128,000 | Chat GPT 4o mini
    gpt-4o-mini-2024-07-18 | Open AI | 128,000 | -
    chatgpt-4o-latest | Open AI | 128,000 | -
    gpt-4o-audio-preview | Open AI | 128,000 | GPT-4o Audio Preview
    gpt-4o-mini-audio-preview | Open AI | 128,000 | GPT-4o mini Audio
    gpt-4o-search-preview | Open AI | 128,000 | GPT-4o Search Preview
    gpt-4o-mini-search-preview | Open AI | 128,000 | GPT-4o Mini Search Preview
    gpt-4-turbo | Open AI | 128,000 | Chat GPT 4 Turbo
    gpt-4-turbo-2024-04-09 | Open AI | 128,000 | -
    gpt-4 | Open AI | 8,000 | Chat GPT 4
    gpt-4-0125-preview | Open AI | 8,000 | -
    gpt-4-1106-preview | Open AI | 8,000 | -
    o1 | Open AI | 200,000 | OpenAI o1
    openai/o3-2025-04-16 | Open AI | 200,000 | o3
    o3-mini | Open AI | 200,000 | OpenAI o3 mini
    openai/o3-pro | Open AI | 200,000 | o3-pro
    openai/gpt-4.1-2025-04-14 | Open AI | 1,000,000 | GPT-4.1
    openai/gpt-4.1-mini-2025-04-14 | Open AI | 1,000,000 | GPT-4.1 Mini
    openai/gpt-4.1-nano-2025-04-14 | Open AI | 1,000,000 | GPT-4.1 Nano
    openai/o4-mini-2025-04-16 | Open AI | 200,000 | GPT-o4-mini-2025-04-16
    openai/gpt-oss-20b | Open AI | 128,000 | GPT OSS 20B
    openai/gpt-oss-120b | Open AI | 128,000 | GPT OSS 120B
    openai/gpt-5-2025-08-07 | Open AI | 400,000 | GPT-5
    openai/gpt-5-mini-2025-08-07 | Open AI | 400,000 | GPT-5 Mini
    openai/gpt-5-nano-2025-08-07 | Open AI | 400,000 | GPT-5 Nano
    openai/gpt-5-chat-latest | Open AI | 400,000 | GPT-5 Chat
    openai/gpt-5-1 | Open AI | 128,000 | GPT-5.1
    openai/gpt-5-1-chat-latest | Open AI | 128,000 | GPT-5.1 Chat Latest
    openai/gpt-5-1-codex | Open AI | 400,000 | GPT-5.1 Codex
    openai/gpt-5-1-codex-mini | Open AI | 400,000 | GPT-5.1 Codex Mini
    claude-3-opus-20240229 | Anthropic | 200,000 | Claude 3 Opus
    claude-3-haiku-20240307 | Anthropic | 200,000 | -
    claude-3-5-haiku-20241022 | Anthropic | 200,000 | -
    claude-3-7-sonnet-20250219 | Anthropic | 200,000 | Claude 3.7 Sonnet
    anthropic/claude-opus-4 | Anthropic | 200,000 | Claude 4 Opus
    anthropic/claude-opus-4.1, claude-opus-4-1, claude-opus-4-1-20250805 | Anthropic | 200,000 | Claude Opus 4.1
    anthropic/claude-sonnet-4 | Anthropic | 200,000 | Claude 4 Sonnet
    claude-sonnet-4-5-20250929, anthropic/claude-sonnet-4.5, claude-sonnet-4-5 | Anthropic | 200,000 | Claude 4.5 Sonnet
    anthropic/claude-haiku-4.5, claude-haiku-4-5, claude-haiku-4-5-20251001 | Anthropic | 200,000 | Claude 4.5 Haiku
    anthropic/claude-opus-4-5, claude-opus-4-5, claude-opus-4-5-20251101 | Anthropic | 200,000 | Coming Soon
    Qwen/Qwen2.5-7B-Instruct-Turbo | Alibaba Cloud | 32,000 | Qwen 2.5 7B Instruct Turbo
    Qwen/Qwen2.5-Coder-32B-Instruct | Alibaba Cloud | 131,000 | -
    qwen-max | Alibaba Cloud | 32,000 | Qwen Max
    qwen-max-2025-01-25 | Alibaba Cloud | 32,000 | Qwen Max 2025-01-25
    qwen-plus | Alibaba Cloud | 131,000 | Qwen Plus
    qwen-turbo | Alibaba Cloud | 1,000,000 | Qwen Turbo
    Qwen/Qwen2.5-72B-Instruct-Turbo | Alibaba Cloud | 32,000 | Qwen 2.5 72B Instruct Turbo
    Qwen/QwQ-32B | Alibaba Cloud | 131,000 | QwQ-32B
    Qwen/Qwen3-235B-A22B-fp8-tput | Alibaba Cloud | 32,000 | Qwen 3 235B A22B
    alibaba/qwen3-32b | Alibaba Cloud | 131,000 | Qwen3-32B
    alibaba/qwen3-coder-480b-a35b-instruct | Alibaba Cloud | 262,000 | Qwen3 Coder
    alibaba/qwen3-235b-a22b-thinking-2507 | Alibaba Cloud | 262,000 | Qwen3 235B A22B Thinking
    alibaba/qwen3-next-80b-a3b-instruct | Alibaba Cloud | 262,000 | Qwen3-Next-80B-A3B Instruct
    alibaba/qwen3-next-80b-a3b-thinking | Alibaba Cloud | 262,000 | Qwen3-Next-80B-A3B Thinking
    alibaba/qwen3-max-preview | Alibaba Cloud | 258,000 | Qwen3-Max Preview
    alibaba/qwen3-max-instruct | Alibaba Cloud | 262,000 | Qwen3-Max Instruct
    qwen3-omni-30b-a3b-captioner | Alibaba Cloud | 65,000 | qwen3-omni-30b-a3b-captioner
    alibaba/qwen3-vl-32b-instruct | Alibaba Cloud | 126,000 | Qwen3 VL 32B Instruct
    alibaba/qwen3-vl-32b-thinking | Alibaba Cloud | 126,000 | Qwen3 VL 32B Thinking
    deepseek-chat or deepseek/deepseek-chat or deepseek/deepseek-chat-v3-0324 | DeepSeek | 128,000 | DeepSeek V3
    deepseek/deepseek-r1 or deepseek-reasoner | DeepSeek | 128,000 | DeepSeek R1
    deepseek/deepseek-prover-v2 | DeepSeek | 164,000 | DeepSeek Prover V2
    deepseek/deepseek-chat-v3.1 | DeepSeek | 128,000 | DeepSeek V3.1 Chat
    deepseek/deepseek-reasoner-v3.1 | DeepSeek | 128,000 | DeepSeek V3.1 Reasoner
    deepseek/deepseek-thinking-v3.2-exp | DeepSeek | 128,000 | DeepSeek V3.2-Exp Thinking
    deepseek/deepseek-non-thinking-v3.2-exp | DeepSeek | 128,000 | DeepSeek V3.2-Exp Non-Thinking
    deepseek/deepseek-reasoner-v3.1-terminus | DeepSeek | 128,000 | DeepSeek V3.1 Terminus Reasoning
    deepseek/deepseek-non-reasoner-v3.1-terminus | DeepSeek | 128,000 | DeepSeek V3.1 Terminus Non-Reasoning
    mistralai/Mixtral-8x7B-Instruct-v0.1 | Mistral AI | 64,000 | Mixtral-8x7B Instruct v0.1
    meta-llama/Llama-3.3-70B-Instruct-Turbo | Meta | 128,000 | Meta Llama 3.3 70B Instruct Turbo
    meta-llama/Llama-3.2-3B-Instruct-Turbo | Meta | 131,000 | Llama 3.2 3B Instruct Turbo
    meta-llama/Meta-Llama-3-8B-Instruct-Lite | Meta | 9,000 | Llama 3 8B Instruct Lite
    meta-llama/Llama-3-70b-chat-hf | Meta | 8,000 | Llama 3 70B Instruct Reference
    meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | Meta | 4,000 | Llama 3.1 (405B) Instruct Turbo
    meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | Meta | 128,000 | Llama 3.1 8B Instruct Turbo
    meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo | Meta | 128,000 | Llama 3.1 70B Instruct Turbo
    meta-llama/llama-4-scout | Meta | 1,000,000 | Llama 4 Scout
    meta-llama/llama-4-maverick | Meta | 256,000 | Llama 4 Maverick
    meta-llama/llama-3.3-70b-versatile | Meta | 131,000 | Llama 3.3 70B Versatile
    mistralai/Mistral-7B-Instruct-v0.2 | Mistral AI | 32,000 | Mistral (7B) Instruct v0.2
    mistralai/Mistral-7B-Instruct-v0.1 | Mistral AI | 8,000 | Mistral (7B) Instruct v0.1
    mistralai/Mistral-7B-Instruct-v0.3 | Mistral AI | 32,000 | Mistral (7B) Instruct v0.3
    gemini-2.0-flash-exp | Google | 1,000,000 | Gemini 2.0 Flash Experimental
    gemini-2.0-flash | Google | 1,000,000 | Gemini 2.0 Flash
    google/gemini-2.5-flash-lite-preview | Google | 1,000,000 | -
    google/gemini-2.5-flash | Google | 1,000,000 | Gemini 2.5 Flash
    google/gemini-2.5-pro | Google | 1,000,000 | Gemini 2.5 Pro
    google/gemini-3-pro-preview | Google | 200,000 | Gemini 3 Pro Preview
    google/gemma-3-4b-it | Google | 128,000 | Gemma 3 (4B)
    google/gemma-3-12b-it | Google | 128,000 | Gemma 3 (12B)
    google/gemma-3-27b-it | Google | 128,000 | Gemma 3 (27B)
    google/gemma-3n-e4b-it | Google | 8,192 | Gemma 3n 4B
    mistralai/mistral-tiny | Mistral AI | 32,000 | Mistral Tiny
    mistralai/mistral-nemo | Mistral AI | 128,000 | Mistral Nemo
    anthracite-org/magnum-v4-72b | Anthracite | 32,000 | Magnum v4 72B
    nvidia/llama-3.1-nemotron-70b-instruct | NVIDIA | 128,000 | Llama 3.1 Nemotron 70B Instruct
    nvidia/nemotron-nano-9b-v2 | NVIDIA | 128,000 | Nemotron Nano 9B V2
    nvidia/nemotron-nano-12b-v2-vl | NVIDIA | 128,000 | Nemotron Nano 12B V2 VL
    cohere/command-a | Cohere | 256,000 | Command A
    mistralai/codestral-2501 | Mistral AI | 256,000 | Mistral Codestral-2501
    MiniMax-Text-01 | MiniMax | 1,000,000 | MiniMax-Text-01
    minimax/m1 | MiniMax | 1,000,000 | MiniMax M1
    minimax/m2 | MiniMax | 200,000 | MiniMax M2
    moonshot/kimi-k2-preview | Moonshot | 131,000 | Kimi-K2
    moonshot/kimi-k2-0905-preview | Moonshot | 256,000 | Kimi-K2
    moonshot/kimi-k2-turbo-preview | Moonshot | 256,000 | Kimi K2 Turbo Preview
    nousresearch/hermes-4-405b | NousResearch | 131,000 | -
    perplexity/sonar | Perplexity | 128,000 | Sonar
    perplexity/sonar-pro | Perplexity | 200,000 | Sonar Pro
    x-ai/grok-3-beta | xAI | 131,000 | Grok 3 Beta
    x-ai/grok-3-mini-beta | xAI | 131,000 | Grok 3 Beta Mini
    x-ai/grok-4-07-09 | xAI | 256,000 | Grok 4
    x-ai/grok-code-fast-1 | xAI | 256,000 | Grok Code Fast 1
    x-ai/grok-4-fast-non-reasoning | xAI | 2,000,000 | Grok 4 Fast
    x-ai/grok-4-fast-reasoning | xAI | 2,000,000 | Grok 4 Fast Reasoning
    x-ai/grok-4-1-fast-non-reasoning | xAI | 2,000,000 | Grok 4.1 Fast Non-Reasoning
    x-ai/grok-4-1-fast-reasoning | xAI | 2,000,000 | Grok 4.1 Fast Reasoning
    zhipu/glm-4.5-air | Zhipu | 128,000 | GLM-4.5 Air
    zhipu/glm-4.5 | Zhipu | 128,000 | GLM-4.5
    zhipu/glm-4.6 | Zhipu | 200,000 | GLM-4.6
POST /v1/responses

Body

model (string · enum, Required)

input (string or object, Required)
Text, image, or file inputs to the model, used to generate a response. A plain string is equivalent to a text input with the user role.

max_output_tokens (integer, Optional)
An upper bound for the number of tokens that can be generated for a response, including visible output tokens and reasoning tokens.

previous_response_id (string | nullable, Optional)
The unique ID of the previous response to the model. Use this to create multi-turn conversations.

store (boolean | nullable, Optional, default: false)
Whether to store the generated model response for later retrieval via API.

stream (boolean | nullable, Optional, default: false)
If set to true, the model response data will be streamed to the client as it is generated using server-sent events.

truncation (string · enum, Optional, default: disabled)
The truncation strategy to use for the model response.
• auto: If the context of this response and previous ones exceeds the model's context window size, the model will truncate the response to fit the context window by dropping input items in the middle of the conversation.
• disabled (default): If a model response will exceed the context window size for a model, the request will fail with a 400 error.

tool_choice (string · enum or object, Optional)
How the model should select which tool (or tools) to use when generating a response. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.

Responses
200 Success
    async function main() {
      const response = await fetch('https://api.aimlapi.com/v1/responses', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          "model": "gpt-4o",
          "input": "Hello"
        }),
      });
    
      const data = await response.json();
      console.log(JSON.stringify(data, null, 2));
    }
    
    main();
    {
      "background": false,
      "created_at": 1762343744,
      "error": null,
      "id": "resp_68963fb142d08197b4d3ae3ad852542c054845c6ea84caa2",
      "incomplete_details": null,
      "instructions": null,
      "max_output_tokens": null,
      "metadata": {},
      "model": "gpt-4o",
      "object": "response",
      "output": null,
      "output_text": "Hi! How’s your day going?",
      "parallel_tool_calls": false,
      "previous_response_id": null,
      "prompt": null,
      "reasoning": null,
      "service_tier": null,
      "status": "completed",
      "temperature": null,
      "text": {
        "format": {
          "type": "text"
        }
      },
      "tool_choice": null,
      "tools": null,
      "top_p": null,
      "truncation": null,
      "usage": {
        "input_tokens": 137,
        "input_tokens_details": null,
        "output_tokens": 914,
        "output_tokens_details": null,
        "total_tokens": 1051
      }
    }
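
The previous_response_id parameter lets you chain turns without resending the whole conversation history. Here is a minimal sketch, assuming the stored-response flow described in the Body schema above (the follow-up wording is illustrative):

async function multiTurn() {
  const headers = {
    'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
    'Content-Type': 'application/json',
  };

  // First turn: store the response so it can be referenced later.
  let res = await fetch('https://api.aimlapi.com/v1/responses', {
    method: 'POST', headers,
    body: JSON.stringify({ model: 'gpt-4o', input: 'Hello', store: true }),
  });
  const first = await res.json();

  // Second turn: link to the first response instead of resending the history.
  res = await fetch('https://api.aimlapi.com/v1/responses', {
    method: 'POST', headers,
    body: JSON.stringify({
      model: 'gpt-4o',
      input: 'And how are you today?',
      previous_response_id: first.id,
    }),
  });
  console.log((await res.json()).output_text);
}

multiTurn();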

    All Model IDs

    A full list of available models.

    If you need to select models based on specific parameters for your task, visit the dedicated page on our official website, which offers convenient filtering options. On the selected model’s page, you can find detailed technical and commercial information.

    The section Get Model List via API contains the API reference for the service endpoint that returns the full model list.

    The section Model IDs lists the identifiers of all available and deprecated models, grouped by category. These IDs are used to specify the exact models in your code, like this:
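
    A minimal sketch (the specific model ID is just an example; any ID from the tables below works the same way):

    // The model ID goes into the "model" field of the request body:
    const body = {
      model: 'meta-llama/Llama-3.3-70B-Instruct-Turbo', // exact ID from the list
      messages: [{ role: 'user', content: 'Hello' }],
    };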

    If you already know the model ID, use the page search function (Ctrl+F for Win/Linux, Command+F for Mac) to locate it. The hyperlink will take you directly to the model's API Reference page.

    New Model Request

    Can't find the model you need? Join our community to propose new models for integration into our API offerings. Your contributions help us grow and serve you better.

    Get Model List via API

    Full List of Model IDs

    Text Models (LLM)

    Model ID + API Reference link | Developer | Context | Model Card

    Image Models

    Model ID + API Reference link | Developer | Context | Model Card

    Video Models

    Model ID + API Reference link | Developer | Context | Model Card

    Voice/Speech Models

    Speech-to-Text

    Model ID + API Reference link | Developer | Context | Model Card

    Text-to-Speech

    Model ID | Developer | Context | Model Card

    Voice Chat

    Model ID | Developer | Context | Model Card

    Music Models

    Model ID | Developer | Context | Model Card

    Content Moderation Models

    Model ID + API Reference link | Developer | Context | Model Card

    Vision Models

    Optical Character Recognition (OCR)

    Model ID + API Reference link | Developer | Context | Model Card

    3D-Generating Models

    Model ID + API Reference link | Developer | Context | Model Card

    Embedding Models

    Model ID + API Reference link | Developer | Context | Model Card
    Deprecated / No Longer Supported Models

    These models are no longer available for API or Playground calls. Their description and API reference pages have also been removed from this documentation portal.

    Model ID | Developer | Context | Model Card

The table below lists each model with its API model ID, developer, maximum context length in tokens, and display name. A dash marks a value that is not applicable or not published; where several IDs are aliases of the same model, they are listed together, separated by commas.

| Model ID | Developer | Context length | Model name |
| --- | --- | --- | --- |
| kling-video/v1.5/standard/text-to-video | Kling AI | 128,000 | Kling 1.5 Standard |
| o1-mini, o1-mini-2024-09-12 | OpenAI | 128,000 | OpenAI o1-mini |
| Qwen/Qwen2-72B-Instruct | Alibaba Cloud | 32,000 | Qwen 2 Instruct (72B) |
| claude-3-5-sonnet-20240620 | Anthropic | 200,000 | - |
| claude-3-5-sonnet-20241022 | Anthropic | 200,000 | Claude 3.5 Sonnet 20241022 |
| cohere/command-r-plus | Cohere | 128,000 | Command R+ |
| google/gemma-2-27b-it | Google | 8,000 | Gemma 2 (27b) |
| NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO | Nous Research | 32,000 | - |
| nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Nvidia | 128,000 | Llama 3.1 Nemotron 70B Instruct |
| meta-llama/Llama-3-8b-chat-hf | Meta | 8,000 | Llama 3 8B Instruct Reference |
| meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo | Meta | 131,000 | Llama 3.2 90B Vision Instruct Turbo |
| meta-llama/Llama-Vision-Free | Meta | 128,000 | - |
| meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo | Meta | 131,000 | Llama 3.2 11B Vision Instruct Turbo |
| abab6.5s-chat | MiniMax | 245,000 | - |
| openrouter/horizon-beta | OpenRouter | 256,000 | - |
| openrouter/horizon-alpha | OpenRouter | 256,000 | - |
| wan/v2.1/1.3b/text-to-video | Alibaba Cloud | - | Wan 2.1 |
| o1-preview, o1-preview-2024-09-12 | OpenAI | 128,000 | OpenAI o1-preview |
| claude-3-sonnet-20240229, anthropic/claude-3-sonnet, claude-3-sonnet-latest | Anthropic | 200,000 | Claude 3 Sonnet |
| google/gemini-2.5-pro-preview, google/gemini-2.5-pro-preview-05-06 | Google | 1,000,000 | Gemini Pro 2.5 Preview |
| google/gemini-2.5-flash-preview | Google | 1,000,000 | Gemini 2.5 Flash Preview |
| neversleep/llama-3.1-lumimaid-70b | NeverSleep | 8,000 | Llama 3.1 Lumimaid 70b |
| x-ai/grok-beta | xAI | 131,000 | Grok-2 Beta |
| gpt-4.5-preview | OpenAI | 128,000 | Chat GPT 4.5 preview |
| gemini-1.5-flash | Google | 1,000,000 | Gemini 1.5 Flash |
| gemini-1.5-pro | Google | 1,000,000 | Gemini 1.5 Pro |
| google/gemma-3-1b-it | Google | 128,000 | Gemma 3 (1B) |
| togethercomputer/m2-bert-80M-8k-retrieval | TogetherAI | 8,000 | M2-BERT-Retrieval-8k |
| togethercomputer/m2-bert-80M-2k-retrieval | TogetherAI | 2,000 | M2-BERT-Retrieval-2K |
| Gryphe/MythoMax-L2-13b-Lite | Gryphe | 4,000 | - |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Mistral AI | 64,000 | Mixtral 8x22B Instruct |
| google/gemini-2.5-pro-exp-03-25 | Google | 1,000,000 | - |
| google/gemini-2.0-flash-thinking-exp-01 | Google | 1,000,000 | Gemini 2.0 Flash Thinking Experimental |
| ai21/jamba-1-5-mini | AI21 Labs | 256,000 | Jamba 1.5 Mini |
| textembedding-gecko@001 | Google | 3,000 | - |
| google/gemini-pro, gemini-pro | Google | 32,000 | Gemini 1.0 Pro |
| meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo-128K | Meta | 128,000 | - |
| stabilityai/stable-diffusion-xl-base-1.0 | Stability AI | - | Stable Diffusion XL 1.0 |
| upstage/solar-10.7b-instruct-v1.0 | Upstage | 4,000 | Upstage SOLAR Instruct v1 (11B) |
| meta-llama/Llama-2-13b-chat-hf | Meta | 4,100 | LLaMA-2 Chat (13B) |
| meta-llama/meta-llama-3-70b-instruct-turbo | Meta | 128,000 | - |
| google/gemma-2-9b-it | Google | 8,000 | Gemma 2 (9B) |
| google/gemma-2b-it | Google | 8,000 | Gemma Instruct (2B) |
| Gryphe/MythoMax-L2-13b | Gryphe | 4,000 | MythoMax-L2 (13B) |
| microsoft/WizardLM-2-8x22B | Microsoft | 64,000 | WizardLM-2 (8x22B) |
| Austism/chronos-hermes-13b | Austism | 2,000 | Chronos Hermes 13b |
| databricks/dbrx-instruct | Databricks | 32,000 | DBRX Instruct |
| deepseek-ai/deepseek-llm-67b-chat | DeepSeek | 4,000 | Deepseek-LLM-67b-Chat |
| deepseek-ai/deepseek-coder-33b-instruct | DeepSeek | 16,000 | Deepseek Coder Instruct (33B) |
| Meta-Llama/Llama-2-7b-chat-hf | Meta | 4,000 | LLaMA-2 Chat (7B) |
| Meta-Llama/Meta-Llama-3-70B-Instruct-Lite | Meta | 8,000 | Llama 3 70B Instruct Lite |
| Meta-Llama/Llama-Guard-7b | Meta | 4,000 | Llama Guard (7B) |
| meta-llama/Llama-2-7b-hf | Meta | 4,000 | LLaMA-2 (7B) |
| meta-llama/Llama-3-8b-hf | Meta | 8,000 | Llama-3 (8B) |
| codellama/CodeLlama-70b-hf | Meta | 16,000 | Code Llama (70B) |
| codellama/CodeLlama-7b-Instruct-hf | Meta | 16,000 | Code Llama Instruct (7B) |
| codellama/CodeLlama-13b-Instruct-hf | Meta | 16,000 | Code Llama Instruct (13B) |
| codellama/CodeLlama-70b-Instruct-hf | Meta | 4,000 | Code Llama Instruct (70B) |
| codellama/CodeLlama-70b-Python-hf | Meta | 4,000 | Code Llama Python (70B) |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | Mistral AI | 64,000 | Mixtral 8x22B Instruct |
| gpt-3.5-turbo-16k-0613 | OpenAI | - | - |
| gpt-4-0613 | OpenAI | 128,000 | - |
| Qwen/Qwen-14B-Chat | Alibaba Cloud | 8,000 | Qwen Chat (14B) |
| Qwen/Qwen1.5-0.5B | Alibaba Cloud | 32,000 | Qwen 1.5 (0.5B) |
| Qwen/Qwen1.5-1.8B | Alibaba Cloud | 32,000 | Qwen 1.5 (1.8B) |
| Qwen/Qwen1.5-4B | Alibaba Cloud | 32,000 | Qwen 1.5 (4B) |
| Qwen/Qwen1.5-1.8B-Chat | Alibaba Cloud | 32,000 | Qwen 1.5 Chat (1.8B) |
| Qwen/Qwen1.5-4B-Chat | Alibaba Cloud | 32,000 | Qwen 1.5 Chat (4B) |
| Qwen/Qwen1.5-7B-Chat | Alibaba Cloud | 32,000 | Qwen 1.5 Chat (7B) |
| Qwen/Qwen1.5-14B-Chat | Alibaba Cloud | 32,000 | Qwen 1.5 Chat (14B) |
| qwen/qvq-72b-preview | Alibaba Cloud | 32,000 | QVQ-72B-Preview |
| togethercomputer/guanaco-13b | Tim Dettmers | 2,000 | Guanaco (13B) |
| togethercomputer/guanaco-33b | Tim Dettmers | 2,000 | Guanaco (33B) |
| togethercomputer/guanaco-65b | Tim Dettmers | 2,000 | Guanaco (65B) |
| togethercomputer/mpt-7b-chat | Mosaic ML | 2,000 | MPT-Chat (7B) |
| togethercomputer/mpt-30b-chat | Mosaic ML | 8,000 | MPT-Chat (30B) |
| togethercomputer/RedPajama-INCITE-7B-Instruct | RedPajama | 2,000 | RedPajama-INCITE Instruct (7B) |
| prompthero/openjourney | PromptHero | 77 | Openjourney v4 |
| wavymulder/Analog-Diffusion | wavymulder | 77 | Analog Diffusion |
| - | 01.AI | 4,000 | 01-ai Yi Base (6B) |
| Undi95/Toppy-M-7B | Undi95 | 4,000 | Toppy M (7B) |
| SG161222/Realistic_Vision_V3.0_VAE | Together | 77 | Realistic Vision 3.0 |
| tiiuae/falcon-40b | TII | 2,000 | Falcon (40B) |
| allenai/OLMo-7B | Allen Institute for AI | 2,000 | OLMo-7B |
| bigcode/starcoder | BigCode | 8,000 | StarCoder (16B) |
| HuggingFaceH4/starchat-alpha | Hugging Face | 8,000 | StarCoderChat Alpha (16B) |
| NousResearch/Nous-Hermes-Llama2-70b | NousResearch | 4,000 | Nous Hermes LLaMA-2 (70B) |
| NousResearch/Nous-Hermes-2-Mixtral-8x7B-SFT | NousResearch | 32,000 | Nous Hermes 2 - Mixtral 8x7B-SFT |
| NousResearch/Nous-Hermes-2-Mistral-7B-DPO | NousResearch | 32,000 | Nous Hermes 2 - Mistral DPO (7B) |
| NousResearch/Hermes-2-Theta-Llama-3-70B | NousResearch | 8,000 | Hermes 2 Theta Llama-3 70B |
| defog/sqlcoder | Defog AI | 8,000 | SQLCoder (15B) |
| replit/replit-code-v1-3b | Replit | 2,000 | Replit-Code-v1 (3B) |
| lmsys/vicuna-13b-v1.5 | LMSYS | 4,000 | Vicuna v1.5 (13B) |
| microsoft/phi-2 | Microsoft | 2,000 | Microsoft Phi-2 |
| stabilityai/stablelm-base-alpha-3b | StabilityAI | 4,000 | StableLM Base Alpha 3B |
| runwayml/stable-diffusion-v1-5 | StabilityAI | 77 | Stable Diffusion 1.5 |
| stabilityai/stable-diffusion-2-1 | StabilityAI | 77 | Stable Diffusion 2.1 |
| teknium/OpenHermes-2p5-Mistral-7B | Teknium | 8,000 | OpenHermes-2.5-Mistral (7B) |
| openchat/openchat-3.5-1210 | OpenChat | 8,000 | OpenChat 3.5 (7B) |
| DiscoResearch/DiscoLM-mixtral-8x7b-v2 | Disco Research | 32,000 | DiscoLM Mixtral 8x7b (46.7B) |
| google/flan-t5-xl | Google | 512 | FLAN T5 XL (3B) |
| garage-bAInd/Platypus2-70B-instruct | Garage-bAInd | 4,000 | Platypus2-70B-Instruct |
| EleutherAI/gpt-neox-20b | EleutherAI | 2,000 | GPT Neox 20B |
| gradientai/Llama-3-70B-Instruct-Gradient-1048k | Gradient | 1,048,000 | Llama-3 70B Gradient Instruct 1048k |
| WhereIsAI/UAE-Large-V1 | WhereIsAI | 512 | UAE-Large-V1 |
| zero-one-ai/Yi-34B-Chat | 01.AI | 4,000 | Yi-34B-Chat |
| meta-llama/Meta-Llama-3.1-70B-Reference | Meta | 32,000 | - |
| meta-llama/Meta-Llama-3.1-8B-Reference | Meta | 32,000 | - |
| EleutherAI/llemma_7b | EleutherAI | 32,000 | - |
| huggyllama/llama-30b | Huggyllama | 32,000 | - |
| huggyllama/llama-13b | Huggyllama | 32,000 | - |
| togethercomputer/llama-2-70b | TogetherAI | 32,000 | - |
| togethercomputer/llama-2-13b | TogetherAI | 32,000 | - |
| huggyllama/llama-65b | Huggyllama | 32,000 | - |
| WizardLM/WizardLM-70B-V1.0 | WizardLM | 32,000 | - |
| huggyllama/llama-7b | Huggyllama | 32,000 | - |
| togethercomputer/llama-2-7b | TogetherAI | 32,000 | - |
| NousResearch/Nous-Hermes-13b | NousResearch | 2,000 | - |
| mistralai/Mistral-7B-v0.1 | Mistral AI | 32,000 | Mistral 7B |
| mistralai/Mixtral-8x7B-v0.1 | Mistral AI | 32,000 | Mixtral-8x7B v0.1 |
| - | Suno AI | 32 | Suno AI |
| gpt-3.5-turbo | OpenAI | 16,000 | Chat GPT 3.5 Turbo |
| gpt-3.5-turbo-0125 | OpenAI | 16,000 | Chat GPT-3.5 Turbo 0125 |
| gpt-3.5-turbo-1106 | OpenAI | 16,000 | Chat GPT-3.5 Turbo 1106 |
| alibaba/qwen-image | Alibaba Cloud | - | Qwen Image |
| alibaba/qwen-image-edit | Alibaba Cloud | - | Qwen Image Edit |
| alibaba/z-image-turbo | Alibaba Cloud | - | Z-Image Turbo |
| alibaba/wan2.1-t2v-plus | Alibaba Cloud | - | Wan2.1 Plus |
| alibaba/wan2.1-t2v-turbo | Alibaba Cloud | - | Wan2.1 Turbo |
| alibaba/wan2.2-t2v-plus | Alibaba Cloud | - | Wan 2.2 T2V |
| aai/slam-1 | Assembly AI | - | Slam 1 |
| aai/universal | Assembly AI | - | Universal |
| #g1_nova-2-automotive | Deepgram | - | Deepgram Nova-2 |
| alibaba/qwen3-tts-flash | Alibaba Cloud | - | Qwen3-TTS-Flash |
| #g1_aura-angus-en | Deepgram | - | Aura |
| #g1_aura-arcas-en | Deepgram | - | Aura |
| elevenlabs/v3_alpha | ElevenLabs | - | Eleven v3 Alpha |
| minimax/speech-2.5-turbo-preview | MiniMax | - | MiniMax Speech 2.5 Turbo |
| minimax/speech-2.5-hd-preview | MiniMax | - | MiniMax Speech 2.5 HD |
| elevenlabs/eleven_music | ElevenLabs | - | Eleven Music |
| google/lyria2 | Google | - | Lyria 2 |
| stable-audio | Stability AI | - | Stable Audio |
| meta-llama/Llama-Guard-3-11B-Vision-Turbo | Meta | 128,000 | - |
| meta-llama/LlamaGuard-2-8b | Meta | 8,000 | LlamaGuard 2 (8b) |
| meta-llama/Meta-Llama-Guard-3-8B | Meta | 8,000 | Llama Guard 3 (8B) |
| (no Model ID) | Google | - | - |
| mistral/mistral-ocr-latest | Mistral AI | - | - |
| triposr | Tripo AI | - | Stable TripoSR 3D |
| text-embedding-3-small | OpenAI | 8,000 | - |
| text-embedding-3-large | OpenAI | 8,000 | Text-embedding-3-large |
| text-embedding-ada-002 | OpenAI | 8,000 | Text-embedding-ada-002 |
| mistralai/Mistral-7B-Instruct-v0.1 | Mistral AI | 8,000 | Mistral (7B) Instruct v0.1 |
| Qwen/Qwen2.5-Coder-32B-Instruct | Alibaba Cloud | 131,000 | Qwen 2.5 Coder |
| Qwen/QwQ-32B | Alibaba Cloud | 131,000 | QwQ-32B |
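To call any of the chat models above, pass the value from the Model ID column as the `model` parameter of a chat-completion request; the prompt and the completion together must fit within the model's context length. Below is a minimal sketch using the `requests` library. The `https://api.aimlapi.com/v1/chat/completions` route, the exact request shape, and the `AIML_API_KEY` environment variable are assumptions here; check the Quickstart for the canonical base URL and SDK options.

```python
import os
import requests

# Assumed OpenAI-compatible route; verify the base URL in the Quickstart.
API_URL = "https://api.aimlapi.com/v1/chat/completions"
API_KEY = os.environ["AIML_API_KEY"]  # your AIMLAPI key from the account dashboard

def chat(model_id: str, prompt: str) -> str:
    """Send a single-turn prompt to a model picked from the table above."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": model_id,  # the Model ID column, not the display name
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 256,
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

print(chat("gpt-3.5-turbo", "Summarize what a context window is in one sentence."))
```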
Model IDs and display names for the current catalog, listed roughly by modality (chat, image, video, speech, music, moderation, and embeddings). A dash marks an ID whose display name is not listed; aliases of the same model are separated by commas.

| Model ID | Model name |
| --- | --- |
| gpt-4o | Chat GPT-4o |
| gpt-4o-2024-08-06 | GPT-4o-2024-08-06 |
| gpt-4o-2024-05-13 | GPT-4o-2024-05-13 |
| gpt-4o-mini | Chat GPT 4o mini |
| gpt-4o-mini-2024-07-18 | - |
| chatgpt-4o-latest | - |
| gpt-4o-audio-preview | GPT-4o Audio Preview |
| gpt-4o-mini-audio-preview | GPT-4o mini Audio |
| gpt-4o-search-preview | GPT-4o Search Preview |
| gpt-4o-mini-search-preview | GPT-4o Mini Search Preview |
| gpt-4-turbo | Chat GPT 4 Turbo |
| gpt-4-turbo-2024-04-09 | - |
| gpt-4 | Chat GPT 4 |
| gpt-4-0125-preview | - |
| gpt-4-1106-preview | - |
| o1 | OpenAI o1 |
| openai/o3-2025-04-16, o3 | - |
| o3-mini | OpenAI o3 mini |
| openai/o3-pro, o3-pro | - |
| openai/gpt-4.1-2025-04-14 | GPT-4.1 |
| openai/gpt-4.1-mini-2025-04-14 | GPT-4.1 Mini |
| openai/gpt-4.1-nano-2025-04-14 | GPT-4.1 Nano |
| openai/o4-mini-2025-04-16 | GPT-o4-mini-2025-04-16 |
| openai/gpt-oss-20b | GPT OSS 20B |
| openai/gpt-oss-120b | GPT OSS 120B |
| openai/gpt-5-2025-08-07 | GPT-5 |
| openai/gpt-5-mini-2025-08-07 | GPT-5 Mini |
| openai/gpt-5-nano-2025-08-07 | GPT-5 Nano |
| openai/gpt-5-chat-latest | GPT-5 Chat |
| openai/gpt-5-1 | GPT-5.1 |
| openai/gpt-5-1-chat-latest | GPT-5.1 Chat Latest |
| openai/gpt-5-1-codex | GPT-5.1 Codex |
| openai/gpt-5-1-codex-mini | GPT-5.1 Codex Mini |
| claude-3-opus-20240229 | Claude 3 Opus |
| claude-3-haiku-20240307 | - |
| claude-3-5-haiku-20241022 | - |
| claude-3-7-sonnet-20250219 | Claude 3.7 Sonnet |
| anthropic/claude-opus-4 | Claude 4 Opus |
| anthropic/claude-opus-4.1, claude-opus-4-1, claude-opus-4-1-20250805 | Claude Opus 4.1 |
| anthropic/claude-sonnet-4 | Claude 4 Sonnet |
| claude-sonnet-4-5-20250929, anthropic/claude-sonnet-4.5, claude-sonnet-4-5 | Claude 4.5 Sonnet |
| anthropic/claude-haiku-4.5, claude-haiku-4-5, claude-haiku-4-5-20251001 | Claude 4.5 Haiku |
| anthropic/claude-opus-4-5, claude-opus-4-5, claude-opus-4-5-20251101 | - |
| Qwen/Qwen2.5-7B-Instruct-Turbo | Qwen 2.5 7B Instruct Turbo |
| qwen-max | Qwen Max |
| qwen-max-2025-01-25 | Qwen Max 2025-01-25 |
| qwen-plus | Qwen Plus |
| qwen-turbo | Qwen Turbo |
| Qwen/Qwen2.5-72B-Instruct-Turbo | Qwen 2.5 72B Instruct Turbo |
| Qwen/Qwen3-235B-A22B-fp8-tput | Qwen 3 235B A22B |
| alibaba/qwen3-32b | Qwen3-32B |
| alibaba/qwen3-coder-480b-a35b-instruct | Qwen3 Coder |
| alibaba/qwen3-235b-a22b-thinking-2507 | Qwen3 235B A22B Thinking |
| alibaba/qwen3-next-80b-a3b-instruct | Qwen3-Next-80B-A3B Instruct |
| alibaba/qwen3-next-80b-a3b-thinking | Qwen3-Next-80B-A3B Thinking |
| alibaba/qwen3-max-preview | Qwen3-Max Preview |
| alibaba/qwen3-max-instruct | Qwen3-Max Instruct |
| qwen3-omni-30b-a3b-captioner | qwen3-omni-30b-a3b-captioner |
| alibaba/qwen3-vl-32b-instruct | Qwen3 VL 32B Instruct |
| alibaba/qwen3-vl-32b-thinking | Qwen3 VL 32B Thinking |
| deepseek-chat, deepseek/deepseek-chat, deepseek/deepseek-chat-v3-0324 | DeepSeek V3 |
| deepseek/deepseek-r1, deepseek-reasoner | DeepSeek R1 |
| deepseek/deepseek-prover-v2 | DeepSeek Prover V2 |
| deepseek/deepseek-chat-v3.1 | DeepSeek V3.1 Chat |
| deepseek/deepseek-reasoner-v3.1 | DeepSeek V3.1 Reasoner |
| deepseek/deepseek-thinking-v3.2-exp | DeepSeek V3.2-Exp Thinking |
| deepseek/deepseek-non-thinking-v3.2-exp | DeepSeek V3.2-Exp Non-Thinking |
| deepseek/deepseek-reasoner-v3.1-terminus | DeepSeek V3.1 Terminus Reasoning |
| deepseek/deepseek-non-reasoner-v3.1-terminus | DeepSeek V3.1 Terminus Non-Reasoning |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | Mixtral-8x7B Instruct v0.1 |
| meta-llama/Llama-3.3-70B-Instruct-Turbo | Meta Llama 3.3 70B Instruct Turbo |
| meta-llama/Llama-3.2-3B-Instruct-Turbo | Llama 3.2 3B Instruct Turbo |
| meta-llama/Meta-Llama-3-8B-Instruct-Lite | Llama 3 8B Instruct Lite |
| meta-llama/Llama-3-70b-chat-hf | Llama 3 70B Instruct Reference |
| meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | Llama 3.1 (405B) Instruct Turbo |
| meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | Llama 3.1 8B Instruct Turbo |
| meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo | Llama 3.1 70B Instruct Turbo |
| meta-llama/llama-4-scout | Llama 4 Scout |
| meta-llama/llama-4-maverick | Llama 4 Maverick |
| meta-llama/llama-3.3-70b-versatile | Llama 3.3 70B Versatile |
| mistralai/Mistral-7B-Instruct-v0.2 | Mistral (7B) Instruct v0.2 |
| mistralai/Mistral-7B-Instruct-v0.3 | Mistral (7B) Instruct v0.3 |
| gemini-2.0-flash-exp | Gemini 2.0 Flash Experimental |
| gemini-2.0-flash | Gemini 2.0 Flash |
| google/gemini-2.5-flash-lite-preview | - |
| google/gemini-2.5-flash | Gemini 2.5 Flash |
| google/gemini-2.5-pro | Gemini 2.5 Pro |
| google/gemini-3-pro-preview | Gemini 3 Pro Preview |
| google/gemma-3-4b-it | Gemma 3 (4B) |
| google/gemma-3-12b-it | Gemma 3 (12B) |
| google/gemma-3-27b-it | Gemma 3 (27B) |
| google/gemma-3n-e4b-it | Gemma 3n 4B |
| mistralai/mistral-tiny | Mistral Tiny |
| mistralai/mistral-nemo | Mistral Nemo |
| anthracite-org/magnum-v4-72b | Magnum v4 72B |
| nvidia/llama-3.1-nemotron-70b-instruct | Llama 3.1 Nemotron 70B Instruct |
| nvidia/nemotron-nano-9b-v2 | Nemotron Nano 9B V2 |
| nvidia/nemotron-nano-12b-v2-vl | Nemotron Nano 12B V2 VL |
| cohere/command-a | Command A |
| mistralai/codestral-2501 | Mistral Codestral-2501 |
| MiniMax-Text-01 | MiniMax-Text-01 |
| minimax/m1 | MiniMax M1 |
| minimax/m2 | MiniMax M2 |
| moonshot/kimi-k2-preview | Kimi-K2 |
| moonshot/kimi-k2-0905-preview | Kimi-K2 |
| moonshot/kimi-k2-turbo-preview | Kimi K2 Turbo Preview |
| nousresearch/hermes-4-405b | - |
| perplexity/sonar | Sonar |
| perplexity/sonar-pro | Sonar Pro |
| x-ai/grok-3-beta | Grok 3 Beta |
| x-ai/grok-3-mini-beta | Grok 3 Beta Mini |
| x-ai/grok-4-07-09 | Grok 4 |
| x-ai/grok-code-fast-1 | Grok Code Fast 1 |
| x-ai/grok-4-fast-non-reasoning | Grok 4 Fast |
| x-ai/grok-4-fast-reasoning | Grok 4 Fast Reasoning |
| x-ai/grok-4-1-fast-non-reasoning | Grok 4.1 Fast Non-Reasoning |
| x-ai/grok-4-1-fast-reasoning | Grok 4.1 Fast Reasoning |
| zhipu/glm-4.5-air | GLM-4.5 Air |
| zhipu/glm-4.5 | GLM-4.5 |
| zhipu/glm-4.6 | GLM-4.6 |
| alibaba/z-image-turbo-lora | Z-Image Turbo LoRA |
| bytedance/seedream-3.0 | Seedream 3.0 |
| bytedance/seededit-3.0-i2i | Seedream 3.0 |
| bytedance/seedream-v4-text-to-image | Seedream 4 Text-to-Image |
| bytedance/seedream-v4-edit | Seedream 4 Edit |
| bytedance/uso | USO |
| bytedance/seedream-4-5 | Seedream 4.5 |
| flux-pro | FLUX.1 [pro] |
| flux-pro/v1.1 | FLUX 1.1 [pro] |
| flux-pro/v1.1-ultra | FLUX 1.1 [pro ultra] |
| flux-realism | FLUX Realism LoRA |
| flux/dev | FLUX.1 [dev] |
| flux/dev/image-to-image | - |
| flux/schnell | FLUX.1 [schnell] |
| flux/kontext-max/text-to-image | FLUX.1 Kontext [max] |
| flux/kontext-max/image-to-image | FLUX.1 Kontext [max] |
| flux/kontext-pro/text-to-image | FLUX.1 Kontext [pro] |
| flux/kontext-pro/image-to-image | FLUX.1 Kontext [pro] |
| flux/srpo | FLUX.1 SRPO Text-to-Image |
| flux/srpo/image-to-image | FLUX.1 SRPO Image-to-Image |
| blackforestlabs/flux-2 | FLUX.2 |
| blackforestlabs/flux-2-edit | FLUX.2 Edit |
| blackforestlabs/flux-2-lora | Flux 2 LoRA |
| blackforestlabs/flux-2-lora-edit | Flux 2 LoRA Edit |
| blackforestlabs/flux-2-pro | FLUX.2 [pro] |
| blackforestlabs/flux-2-pro-edit | FLUX.2 [pro] Edit |
| imagen-3.0-generate-002 | Imagen 3 |
| google/imagen4/preview | Imagen 4 Preview |
| imagen-4.0-ultra-generate-preview-06-06 | Imagen 4 Ultra |
| google/gemini-2.5-flash-image | Gemini 2.5 Flash Image |
| google/gemini-2.5-flash-image-edit | Gemini 2.5 Flash Image Edit |
| google/gemini-3-pro-image-preview | Gemini 3 Pro Image (Nano Banana Pro) |
| google/gemini-3-pro-image-preview-edit | Gemini 3 Pro Image Edit (Nano Banana Pro) |
| google/imagen-4.0-generate-001 | Imagen 4.0 Generate |
| google/imagen-4.0-fast-generate-001 | Imagen 4.0 Fast Generate |
| google/imagen-4.0-ultra-generate-001 | Imagen 4.0 Ultra Generate |
| dall-e-2 | OpenAI DALL·E 2 |
| dall-e-3 | OpenAI DALL·E 3 |
| openai/gpt-image-1 | gpt-image-1 |
| recraft-v3 | Recraft v3 |
| reve/create-image | Reve Create Image |
| reve/edit-image | Reve Edit Image |
| reve/remix-edit-image | Reve Remix Image |
| stable-diffusion-v3-medium | Stable Diffusion 3 |
| stable-diffusion-v35-large | Stable Diffusion 3.5 Large |
| hunyuan/hunyuan-image-v3-text-to-image | HunyuanImage 3.0 |
| topaz-labs/sharpen | Sharpen |
| topaz-labs/sharpen-gen | Sharpen Generative |
| x-ai/grok-2-image | Grok 2 Image |
| alibaba/wan2.5-t2v-preview | Wan 2.5 Text-to-Video |
| alibaba/wan2.5-i2v-preview | Wan 2.5 Image-to-Video |
| alibaba/wan2.2-14b-animate-replace | Wan 2.2 14b animate replace |
| alibaba/wan2.2-14b-animate-move | Wan 2.2 14b animate move |
| alibaba/wan2.2-vace-fun-a14b-reframe | Wan 2.2 vace fun 14b reframe |
| alibaba/wan2.2-vace-fun-a14b-outpainting | Wan 2.2 vace fun 14b outpainting |
| alibaba/wan2.2-vace-fun-a14b-inpainting | Wan 2.2 vace fun 14b inpainting |
| alibaba/wan2.2-vace-fun-a14b-pose | Wan 2.2 vace fun 14b pose |
| alibaba/wan2.2-vace-fun-14b-depth | Wan 2.2 vace fun 14b depth |
| bytedance/seedance-1-0-lite-t2v | Seedance 1.0 lite Text to Video |
| bytedance/seedance-1-0-lite-i2v | Seedance 1.0 lite Image to Video |
| bytedance/seedance-1-0-pro-t2v | Seedance 1.0 Pro |
| bytedance/seedance-1-0-pro-i2v | Seedance 1.0 Pro |
| bytedance/omnihuman | OmniHuman |
| bytedance/omnihuman/v1.5 | OmniHuman v1.5 |
| veo2 | Veo2 Text-to-Video |
| veo2/image-to-video | Veo2 Image-to-Video |
| google/veo3 | Veo 3 |
| google/veo-3.0-i2v | Veo 3 I2V |
| google/veo-3.0-fast | Veo 3 Fast |
| google/veo-3.0-i2v-fast | Veo 3 I2V Fast |
| google/veo-3.1-t2v | Veo 3.1 Text-to-Video |
| google/veo-3.1-t2v-fast | Veo 3.1 Fast Text-to-Video |
| google/veo-3.1-i2v | Veo 3.1 Image-to-Video |
| google/veo-3.1-i2v-fast | Veo 3.1 Fast Image-to-Video |
| google/veo-3.1-reference-to-video | Veo 3.1 Reference-to-Video |
| google/veo-3.1-first-last-image-to-video | Veo 3.1 First-Last Frame-to-Video |
| google/veo-3.1-first-last-image-to-video-fast | Veo 3.1 Fast First-Last Frame-to-Video |
| kling-video/v1/standard/image-to-video | Kling AI (image-to-video) |
| kling-video/v1/standard/text-to-video | Kling AI (text-to-video) |
| kling-video/v1/pro/image-to-video | Kling AI (image-to-video) |
| kling-video/v1/pro/text-to-video | Kling AI (text-to-video) |
| kling-video/v1.6/standard/text-to-video | Kling 1.6 Standard |
| kling-video/v1.6/standard/image-to-video | Kling 1.6 Standard |
| kling-video/v1.6/pro/image-to-video | Kling 1.6 Pro |
| kling-video/v1.6/pro/text-to-video | Kling 1.6 Pro |
| klingai/kling-video-v1.6-pro-effects | Kling 1.6 Pro Effects |
| klingai/kling-video-v1.6-standard-effects | Kling 1.6 Standard Effects |
| kling-video/v1.6/standard/multi-image-to-video | Kling V1.6 Multi-Image-to-Video |
| klingai/v2-master-image-to-video | Kling 2.0 Master |
| klingai/v2-master-text-to-video | Kling 2.0 Master |
| kling-video/v2.1/standard/image-to-video | Kling V2.1 Standard I2V |
| kling-video/v2.1/pro/image-to-video | Kling V2.1 Pro I2V |
| klingai/v2.1-master-image-to-video | Kling 2.1 Master |
| klingai/v2.1-master-text-to-video | Kling 2.1 Master |
| klingai/v2.5-turbo/pro/image-to-video | Kling Video v2.5 Turbo Pro Image-to-Video |
| klingai/v2.5-turbo/pro/text-to-video | Kling Video v2.5 Turbo Pro Text-to-Video |
| klingai/avatar-standard | Kling AI Avatar Standard |
| klingai/avatar-pro | Kling AI Avatar Pro |
| klingai/video-v2-6-pro-text-to-video | Kling 2.6 Pro Text-to-Video |
| klingai/video-v2-6-pro-image-to-video | Kling 2.6 Pro Image-to-Video |
| krea/krea-wan-14b/text-to-video | Krea WAN 14B Text-to-Video |
| krea/krea-wan-14b/video-to-video | Krea WAN 14B Video-to-Video |
| ltxv/ltxv-2 | - |
| ltxv/ltxv-2-fast | - |
| video-01 | MiniMax Video-01 |
| luma/ray-1.6 | Ray 1.6 |
| luma/ray-2 | Ray 2 |
| luma/ray-flash-2 | Ray Flash 2 |
| video-01-live2d | - |
| minimax/hailuo-02 | Hailuo 02 |
| sora-2-t2v | - |
| sora-2-i2v | - |
| sora-2-pro-t2v | - |
| sora-2-pro-i2v | - |
| pixverse/v5/text-to-video | Pixverse v5 Text-to-Video |
| pixverse/v5/image-to-video | Pixverse v5 Image-to-Video |
| pixverse/v5/transition | Pixverse v5 Transition |
| gen3a_turbo | Runway Gen-3 turbo |
| runway/gen4_turbo | Runway Gen-4 Turbo |
| runway/gen4_aleph | Aleph |
| runway/act_two | Runway Act Two |
| sber-ai/kandinsky5-t2v | Kandinsky 5 Standard |
| sber-ai/kandinsky5-distill-t2v | Kandinsky 5 Distill |
| veed/fabric-1.0 | fabric-1.0 |
| veed/fabric-1.0-fast | fabric-1.0-fast |
| #g1_nova-2-conversationalai | Deepgram Nova-2 |
| #g1_nova-2-drivethru | Deepgram Nova-2 |
| #g1_nova-2-finance | Deepgram Nova-2 |
| #g1_nova-2-general | Deepgram Nova-2 |
| #g1_nova-2-medical | Deepgram Nova-2 |
| #g1_nova-2-meeting | Deepgram Nova-2 |
| #g1_nova-2-phonecall | Deepgram Nova-2 |
| #g1_nova-2-video | Deepgram Nova-2 |
| #g1_nova-2-voicemail | Deepgram Nova-2 |
| #g1_whisper-tiny | - |
| #g1_whisper-small | - |
| #g1_whisper-base | - |
| #g1_whisper-medium | - |
| #g1_whisper-large | Whisper |
| #g1_aura-asteria-en | Aura |
| #g1_aura-athena-en | Aura |
| #g1_aura-helios-en | Aura |
| #g1_aura-hera-en | Aura |
| #g1_aura-luna-en | Aura |
| #g1_aura-orion-en | Aura |
| #g1_aura-orpheus-en | Aura |
| #g1_aura-perseus-en | Aura |
| #g1_aura-stella-en | Aura |
| #g1_aura-zeus-en | Aura |
| #g1_aura-2-amalthea-en | Aura 2 |
| #g1_aura-2-andromeda-en | Aura 2 |
| #g1_aura-2-apollo-en | Aura 2 |
| #g1_aura-2-arcas-en | Aura 2 |
| #g1_aura-2-aries-en | Aura 2 |
| #g1_aura-2-asteria-en | Aura 2 |
| #g1_aura-2-athena-en | Aura 2 |
| #g1_aura-2-atlas-en | Aura 2 |
| #g1_aura-2-aurora-en | Aura 2 |
| #g1_aura-2-callista-en | Aura 2 |
| #g1_aura-2-cora-en | Aura 2 |
| #g1_aura-2-cordelia-en | Aura 2 |
| #g1_aura-2-delia-en | Aura 2 |
| #g1_aura-2-draco-en | Aura 2 |
| #g1_aura-2-electra-en | Aura 2 |
| #g1_aura-2-harmonia-en | Aura 2 |
| #g1_aura-2-helena-en | Aura 2 |
| #g1_aura-2-hera-en | Aura 2 |
| #g1_aura-2-hermes-en | Aura 2 |
| #g1_aura-2-hyperion-en | Aura 2 |
| #g1_aura-2-iris-en | Aura 2 |
| #g1_aura-2-janus-en | Aura 2 |
| #g1_aura-2-juno-en | Aura 2 |
| #g1_aura-2-jupiter-en | Aura 2 |
| #g1_aura-2-luna-en | Aura 2 |
| #g1_aura-2-mars-en | Aura 2 |
| #g1_aura-2-minerva-en | Aura 2 |
| #g1_aura-2-neptune-en | Aura 2 |
| #g1_aura-2-odysseus-en | Aura 2 |
| #g1_aura-2-ophelia-en | Aura 2 |
| #g1_aura-2-orion-en | Aura 2 |
| #g1_aura-2-orpheus-en | Aura 2 |
| #g1_aura-2-pandora-en | Aura 2 |
| #g1_aura-2-phoebe-en | Aura 2 |
| #g1_aura-2-pluto-en | Aura 2 |
| #g1_aura-2-saturn-en | Aura 2 |
| #g1_aura-2-selene-en | Aura 2 |
| #g1_aura-2-thalia-en | Aura 2 |
| #g1_aura-2-theia-en | Aura 2 |
| #g1_aura-2-vesta-en | Aura 2 |
| #g1_aura-2-zeus-en | Aura 2 |
| #g1_aura-2-celeste-es | Aura 2 |
| #g1_aura-2-estrella-es | Aura 2 |
| #g1_aura-2-nestor-es | Aura 2 |
| elevenlabs/eleven_multilingual_v2 | ElevenLabs Multilingual v2 |
| elevenlabs/eleven_turbo_v2_5 | ElevenLabs Turbo v2.5 |
| inworld/tts-1 | Inworld TTS-1 |
| inworld/tts-1-max | Inworld TTS-1 MAX |
| microsoft/vibevoice-1.5b | VibeVoice 1.5B |
| microsoft/vibevoice-7b | VibeVoice 7B |
| openai/tts-1 | TTS-1 |
| openai/tts-1-hd | TTS-1 HD |
| openai/gpt-4o-mini-tts | GPT-4o-mini-TTS |
| minimax/speech-2.6-turbo | MiniMax Speech 2.6 Turbo |
| minimax/speech-2.6-hd | MiniMax Speech 2.6 HD |
| minimax-music, music-01 | MiniMax Music |
| minimax/music-1.5 | MiniMax Music 1.5 |
| togethercomputer/m2-bert-80M-32k-retrieval | M2-BERT-Retrieval-32k |
| BAAI/bge-base-en-v1.5 | BAAI-Bge-Base-1p5 |
| BAAI/bge-large-en-v1.5 | bge-large-en |
| voyage-large-2-instruct | Voyage Large 2 Instruct |
| voyage-finance-2 | - |
| voyage-multilingual-2 | - |
| voyage-law-2 | - |
| voyage-code-2 | - |
| voyage-large-2 | - |
| voyage-2 | - |
| textembedding-gecko@003 | Textembedding-gecko@003 |
| textembedding-gecko-multilingual@001 | Textembedding-gecko-multilingual@001 |
| text-multilingual-embedding-002 | - |
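The API identifies a model only by its Model ID; display names exist for documentation and dashboards. If your application surfaces the human-readable names, keep a small local mapping to translate them back into IDs. The sketch below is illustrative only: the dictionary entries are copied from a few rows above, and the mapping itself is not an API feature.

```python
# Illustrative local lookup from display name to API model ID,
# seeded with a few rows from the table above; extend as needed.
DISPLAY_TO_ID: dict[str, str] = {
    "Chat GPT-4o": "gpt-4o",
    "DeepSeek V3": "deepseek/deepseek-chat",
    "Gemini 2.5 Pro": "google/gemini-2.5-pro",
    "Kling 2.6 Pro Text-to-Video": "klingai/video-v2-6-pro-text-to-video",
}

def resolve_model_id(display_name: str) -> str:
    """Translate a display name into the ID the API expects."""
    try:
        return DISPLAY_TO_ID[display_name]
    except KeyError:
        raise ValueError(f"Unknown model name: {display_name!r}") from None

print(resolve_model_id("Gemini 2.5 Pro"))  # -> google/gemini-2.5-pro
```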
GET /models

Responses: 200 Success (application/json)

Request example:

GET /models HTTP/1.1
Host: api.aimlapi.com
Accept: */*

Response example:

    {
      "object": "text",
      "data": [
        {
          "id": "text",
          "type": "text",
          "info": {
            "name": "text",
            "developer": "text",
            "description": "text",
            "contextLength": 1,
            "url": "text"
          },
          "features": [
            "text"
          ]
        }
      ]
    }
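The schema above exposes everything needed for a client-side model inventory: `id`, `info` (with `name`, `developer`, `description`, `contextLength`, and `url`), and `features`. Below is a minimal sketch that fetches the list and groups model IDs by developer. The absolute URL is taken from the request example above, but the bearer-token header is an assumption borrowed from the other endpoints (the raw request shown does not include one), so adjust it to match your setup.

```python
import os
import requests

# URL from the request example above; the Authorization header is an assumption.
resp = requests.get(
    "https://api.aimlapi.com/models",
    headers={"Authorization": f"Bearer {os.environ['AIML_API_KEY']}"},
    timeout=30,
)
resp.raise_for_status()

# Group model IDs by developer, using only fields from the documented schema.
by_developer: dict[str, list[str]] = {}
for model in resp.json()["data"]:
    developer = model.get("info", {}).get("developer", "unknown")
    by_developer.setdefault(developer, []).append(model["id"])

for developer, ids in sorted(by_developer.items()):
    print(f"{developer}: {len(ids)} model(s)")
```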