Claude 4 Sonnet

This documentation is valid for the following list of our models:

  • anthropic/claude-sonnet-4

  • claude-sonnet-4

  • claude-sonnet-4-20250514

Model Overview

A major improvement over Claude 3.7 Sonnet, offering better coding abilities, stronger reasoning, and more accurate responses to your instructions.

How to Make a Call

Step-by-Step Instructions

1️ Setup You Can’t Skip

▪️ Create an Account: Visit the AI/ML API website and create an account (if you don’t have one yet). ▪️ Generate an API Key: After logging in, navigate to your account dashboard and generate your API key. Ensure that key is enabled on UI.

2️ Copy the code example

At the bottom of this page, you'll find a code example that shows how to structure the request. Choose the code snippet in your preferred programming language and copy it into your development environment.

3️ Modify the code example

▪️ Replace <YOUR_AIMLAPI_KEY> with your actual AI/ML API key from your account. ▪️ Insert your question or request into the content field—this is what the model will respond to.

4️ (Optional) Adjust other optional parameters if needed

Only model and messages are required parameters for this model (and we’ve already filled them in for you in the example), but you can include optional parameters if needed to adjust the model’s behavior. Below, you can find the corresponding API schema, which lists all available parameters along with notes on how to use them.

5️ Run your modified code

Run your modified code in your development environment. Response time depends on various factors, but for simple prompts it rarely exceeds a few seconds.

API Schema

post
Body
modelstring · enumRequiredPossible values:
stop_sequencesstring[]Optional

Custom text sequences that will cause the model to stop generating.

streambooleanOptional

If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

Default: false
systemstringOptional

A system prompt is a way of providing context and instructions to Claude, such as specifying a particular goal or role.

max_tokensnumberOptional

The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

Default: 32000
temperaturenumber · max: 1Optional

Amount of randomness injected into the response. Defaults to 1.0. Ranges from 0.0 to 1.0. Use temperature closer to 0.0 for analytical / multiple choice, and closer to 1.0 for creative and generative tasks. Note that even with temperature of 0.0, the results will not be fully deterministic.

top_pnumber · max: 1Optional

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

top_knumberOptional

Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Recommended for advanced use cases only. You usually only need to use temperature.

Responses
200Success
idstringRequired

A unique identifier for the chat completion.

Example: chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl
objectstring · enumRequired

The object type.

Example: chat.completionPossible values:
creatednumberRequired

The Unix timestamp (in seconds) of when the chat completion was created.

Example: 1762343744
modelstringRequired

The model used for the chat completion.

Example: anthropic/claude-sonnet-4
post
/v1/chat/completions
200Success

Code Example #1

Response

Code Example #2: Streaming Mode

As of February 13, 2026, the streaming response format for Anthropic models has changed. Specifically, the usage fields were renamed as follows:

  • the state structure is no longer used,

  • input_tokensprompt_tokens,

  • output_tokenscompletion_tokens,

  • a new total_tokens field has been added.

Response

Last updated

Was this helpful?