Claude 4.8 Opus

This documentation is valid for the following list of our models:

  • anthropic/claude-opus-4-8

  • claude-opus-4-8

Model Overview

As of 28 May 2026, the most capable generally available model, optimized for autonomous long-horizon agentic workflows, knowledge-intensive tasks, vision, and memory, with strong overall performance across domains. It supports up to a 1M-token context window, 128k output tokens, adaptive reasoning, and full compatibility with the Claude Opus 4.8 toolset and platform features.

How to make the first API call

1️⃣ Required setup (don’t skip this)Create an account: Sign up on the AI/ML API website (if you don’t have one yet). ▪ Generate an API key: In your account dashboard, create an API key and make sure it’s enabled in the UI.

2️ Copy the code example At the bottom of this page, pick the snippet for your preferred programming language (Python / Node.js) and copy it into your project.

3️ Update the snippet for your use caseInsert your API key: replace <YOUR_AIMLAPI_KEY> with your real AI/ML API key. ▪ Select a model: set the model field to the model you want to call. ▪ Provide input: fill in the request input field(s) shown in the example (for example, messages for chat/LLM models, or other inputs for image/video/audio models).

4️ (Optional) Tune the request Depending on the model type, you can add optional parameters to control the output (e.g., generation settings, quality, length, etc.). See the API schema below for the full list.

5️ Run your code Run the updated code in your development environment. Response time depends on the model and request size, but simple requests typically return quickly.

API Schema

post
Body
modelstring · enumRequiredPossible values:
messagesany ofRequired

A list of messages comprising the conversation so far. Depending on the model you use, different message types (modalities) are supported, like text, documents (txt, pdf), images, and audio.

or
stop_sequencesstring[]Optional

Custom text sequences that will cause the model to stop generating.

streambooleanOptional

If set to True, the model response data will be streamed to the client as it is generated using server-sent events.

Default: false
systemany ofOptional

A system prompt is a way of providing context and instructions to Claude, such as specifying a particular goal or role.

stringOptional
or
tool_choiceany ofOptional

Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.

or
or
or
or
string · enumOptional

none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.

Possible values:
or
or
or
or
toolsany ofOptional

Definitions of tools that the model may use. If you include tools in your API request, the model may return tool_use content blocks that represent the model's use of those tools. You can then run those tools using the tool input generated by the model and then optionally return results back to the model using tool_result content blocks. Each tool definition includes: name: Name of the tool. description: Optional, but strongly-recommended description of the tool. input_schema: JSON schema for the tool input shape that the model will produce in tool_use output content blocks.

or
thinkingone ofOptional

Configuration for enabling Claude's extended thinking. When enabled, responses include thinking content blocks showing Claude's thinking process before the final answer. Requires a minimum budget of 1,024 tokens and counts towards your max_tokens limit.

or
or
max_tokensnumberOptional

The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

Default: 128000
Responses
200Success
idstringRequired

A unique identifier for the chat completion.

Example: chatcmpl-CQ9FPg3osank0dx0k46Z53LTqtXMl
objectstring · enumRequired

The object type.

Example: chat.completionPossible values:
creatednumberRequired

The Unix timestamp (in seconds) of when the chat completion was created.

Example: 1762343744
modelstringRequired

The model used for the chat completion.

Example: anthropic/claude-opus-4-8
post
/v1/chat/completions
200Success

Code Example

Response

Last updated

Was this helpful?