# Thinking / Reasoning

## Overview

Some text models support advanced reasoning mode, enabling them to perform multi-step problem solving, draw inferences, and follow complex instructions. This makes them well-suited for tasks like code generation, data analysis, and answering questions that require understanding context or logic.

{% hint style="warning" %}
Complex tasks can take a reasoning model noticeably longer to answer. In such cases, consider using streaming mode so you receive the answer incrementally as it is generated, rather than waiting for the full response.
{% endhint %}
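
As a minimal sketch of the streaming option mentioned above (the endpoint URL in the comment is a placeholder, not a verified value), enabling streaming in an OpenAI-compatible chat-completion request is typically just a matter of setting `stream: true` in the payload:

```python
import json

def build_streaming_request(model: str, prompt: str) -> dict:
    """Build a chat-completion payload that asks for an incremental (streamed) reply."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # deliver tokens as they are generated
    }

payload = build_streaming_request(
    "anthropic/claude-sonnet-4.5",
    "Design a database schema for a library system.",
)
# Send with any HTTP client that supports chunked responses, e.g.:
#   requests.post("https://api.example.com/v1/chat/completions",
#                 json=payload, stream=True)
print(json.dumps(payload, indent=2))
```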

## Models That Support Thinking / Reasoning Mode

### Anthropic

Claude models expose a dedicated `thinking` parameter that provides transparency into the model's step-by-step reasoning before it produces its final answer.

Supported models:

* [anthropic/claude-opus-4](/api-references/text-models-llm/anthropic/claude-4-opus.md)
* [anthropic/claude-sonnet-4](/api-references/text-models-llm/anthropic/claude-4-sonnet.md)
* [anthropic/claude-opus-4.1](/api-references/text-models-llm/anthropic/claude-opus-4.1.md)
* [anthropic/claude-sonnet-4.5](/api-references/text-models-llm/anthropic/claude-4-5-sonnet.md)
* [anthropic/claude-opus-4-5](/api-references/text-models-llm/anthropic/claude-4.5-opus.md)
* [anthropic/claude-opus-4-6](/api-references/text-models-llm/anthropic/claude-4.6-opus.md)
* [anthropic/claude-sonnet-4.6](/api-references/text-models-llm/anthropic/claude-4.6-sonnet.md)
* [anthropic/claude-opus-4-7](/api-references/text-models-llm/anthropic/claude-4.7-opus.md)
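
As a sketch of how the `thinking` parameter is shaped in Anthropic's upstream Messages API (the exact pass-through behavior of this gateway is an assumption here, so treat the payload as illustrative): the parameter takes a `type` and a `budget_tokens` value, and `max_tokens` must exceed that budget, since the reasoning tokens are carved out of the completion allowance.

```python
def build_thinking_request(model: str, prompt: str, budget_tokens: int = 1024) -> dict:
    """Build a Claude chat request with extended thinking enabled.

    max_tokens must be larger than budget_tokens, because the thinking
    budget is spent from the same completion allowance.
    """
    return {
        "model": model,
        "max_tokens": budget_tokens + 1024,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_thinking_request("anthropic/claude-sonnet-4.5", "Plan a three-step refactor.")
```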

### Google

Google does not provide parameters for explicitly controlling a model's reasoning activity during invocation. The reasoning still happens internally, however, and you can see how many tokens it consumed by checking the `reasoning_tokens` field in the response's `usage` section.

<details>

<summary>Example of the "usage" section in a Gemini model response</summary>

```json
  "usage": {
    "prompt_tokens": 6,
    "completion_tokens": 3050,
    "completion_tokens_details": {
      "reasoning_tokens": 1097
    },
    "total_tokens": 3056
  }
```

</details>
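
To read the reasoning-token count out of a response like the sample above, access the `usage` block defensively, since not every model populates `completion_tokens_details`. A minimal sketch over a plain response dict:

```python
# Truncated example response mirroring the "usage" section shown above.
response = {
    "usage": {
        "prompt_tokens": 6,
        "completion_tokens": 3050,
        "completion_tokens_details": {"reasoning_tokens": 1097},
        "total_tokens": 3056,
    }
}

# Not every model fills completion_tokens_details, so fall back to 0.
details = response["usage"].get("completion_tokens_details") or {}
reasoning_tokens = details.get("reasoning_tokens", 0)
print(reasoning_tokens)  # 1097
```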

Supported models:

* [google/gemini-2.5-flash-lite-preview](/api-references/text-models-llm/google/gemini-2.5-flash-lite-preview.md)
* [google/gemini-2.5-flash](/api-references/text-models-llm/google/gemini-2.5-flash.md)
* [google/gemini-2.5-pro](/api-references/text-models-llm/google/gemini-2.5-pro.md)
* [google/gemini-3-1-pro-preview](/api-references/text-models-llm/google/gemini-3-1-pro-preview.md)
* [google/gemini-3-1-flash-lite-preview](/api-references/text-models-llm/google/gemini-3-1-flash-lite-preview.md)

### OpenAI and other vendors

The standard way to control reasoning behavior in OpenAI models—and in some models from other providers—is through the `reasoning_effort` parameter, which tells the model how much internal reasoning it should perform before responding to the prompt.

Accepted values are `low`, `medium`, and `high`. Lower levels prioritize speed and efficiency, while higher levels provide deeper reasoning at the cost of increased token usage and latency. The default is `medium`, offering a balance between performance and quality.
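The request shape can be sketched as follows (endpoint and authentication are omitted; the helper below is illustrative, not part of any SDK):

```python
VALID_EFFORTS = {"low", "medium", "high"}

def build_reasoning_request(model: str, prompt: str, effort: str = "medium") -> dict:
    """Build a chat-completion payload with an explicit reasoning effort level."""
    if effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {sorted(VALID_EFFORTS)}")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,
    }

req = build_reasoning_request("openai/o3-2025-04-16", "Why is the sky blue?", effort="high")
```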

Supported models:

* [o1](/api-references/text-models-llm/openai/o1.md)
* [o3-mini](/api-references/text-models-llm/openai/o3-mini.md)
* [openai/gpt-4.1-mini-2025-04-14](/api-references/text-models-llm/openai/gpt-4.1-mini.md)
* [openai/gpt-4.1-nano-2025-04-14](/api-references/text-models-llm/openai/gpt-4.1-nano.md)
* [openai/o3-2025-04-16](/api-references/text-models-llm/openai/o3.md)
* [openai/o4-mini-2025-04-16](/api-references/text-models-llm/openai/o4-mini.md)
* [openai/gpt-oss-20b](/api-references/text-models-llm/openai/gpt-oss-20b.md)
* [openai/gpt-oss-120b](/api-references/text-models-llm/openai/gpt-oss-120b.md)
* [openai/gpt-5-2025-08-07](/api-references/text-models-llm/openai/gpt-5.md)
* [openai/gpt-5-mini-2025-08-07](/api-references/text-models-llm/openai/gpt-5-mini.md)
* [openai/gpt-5-nano-2025-08-07](/api-references/text-models-llm/openai/gpt-5-nano.md)
* [openai/gpt-5-1](/api-references/text-models-llm/openai/gpt-5-1.md)
* [openai/gpt-5-2](/api-references/text-models-llm/openai/gpt-5.2.md)
* [openai/gpt-5-4](/api-references/text-models-llm/openai/gpt-5-4.md)
* [openai/gpt-5-4-pro](/api-references/text-models-llm/openai/gpt-5-4-pro.md)
* [openai/gpt-5-5](/api-references/text-models-llm/openai/gpt-5-5.md)
* [openai/gpt-5-5-pro](/api-references/text-models-llm/openai/gpt-5-5-pro.md)

***

* [alibaba/qwen3-32b](/api-references/text-models-llm/alibaba-cloud/qwen3-32b.md)
* [alibaba/qwen3-coder-480b-a35b-instruct](/api-references/text-models-llm/alibaba-cloud/qwen3-coder-480b-a35b-instruct.md)
* [alibaba/qwen3-235b-a22b-thinking-2507](/api-references/text-models-llm/alibaba-cloud/qwen3-235b-a22b-thinking-2507.md)
* [alibaba/qwen3-next-80b-a3b-thinking](/api-references/text-models-llm/alibaba-cloud/qwen3-next-80b-a3b-thinking.md)
* [alibaba/qwen3-vl-32b-thinking](/api-references/text-models-llm/alibaba-cloud/qwen3-vl-32b-thinking.md)
* [alibaba/qwen3.5-plus-20260218](/api-references/text-models-llm/alibaba-cloud/qwen3.5-plus.md)
* [alibaba/qwen3.5-omni-plus](/api-references/text-models-llm/alibaba-cloud/qwen3.5-omni-plus.md)
* [alibaba/qwen3.5-omni-flash](/api-references/text-models-llm/alibaba-cloud/qwen3.5-omni-flash.md)
* [alibaba/qwen3.5-flash](/api-references/text-models-llm/alibaba-cloud/qwen3.5-flash.md)
* [alibaba/qwen3.6-27b](/api-references/text-models-llm/alibaba-cloud/qwen3.6-27b.md)
* [alibaba/qwen3.6-35b-a3b](/api-references/text-models-llm/alibaba-cloud/qwen3.6-35b-a3b.md)

***

* [baidu/ernie-4.5-21b-a3b-thinking](/api-references/text-models-llm/baidu/ernie-4.5-21b-a3b-thinking.md)
* [baidu/ernie-5-0-thinking-preview](/api-references/text-models-llm/baidu/ernie-5.0-thinking-preview.md)
* [baidu/ernie-5-0-thinking-latest](/api-references/text-models-llm/baidu/ernie-5.0-thinking-latest.md)

***

* [bytedance/dola-seed-2-0-mini](/api-references/text-models-llm/bytedance/dola-seed-2.0-mini.md)
* [bytedance/dola-seed-2-0-lite](/api-references/text-models-llm/bytedance/dola-seed-2.0-lite.md)
* [bytedance/dola-seed-2-0-pro](/api-references/text-models-llm/bytedance/dola-seed-2.0-pro.md)
* [bytedance/dola-seed-2-0-code](/api-references/text-models-llm/bytedance/dola-seed-2.0-code.md)

***

* [deepseek/deepseek-v3.2-speciale](/api-references/text-models-llm/deepseek/deepseek-v3.2-speciale.md)
* [deepseek/deepseek-v4-pro](/api-references/text-models-llm/deepseek/deepseek-v4-pro.md)
* [deepseek/deepseek-v4-flash](/api-references/text-models-llm/deepseek/deepseek-v4-flash.md)

***

* [minimax/m2](/api-references/text-models-llm/minimax/m2.md)
* [minimax/m2-1](/api-references/text-models-llm/minimax/m2-1.md)
* [minimax/m2-1-highspeed](/api-references/text-models-llm/minimax/m2.1-highspeed.md)
* [minimax/m2-5-20260218](/api-references/text-models-llm/minimax/m2-5.md)
* [minimax/m2-5-highspeed-20260218](/api-references/text-models-llm/minimax/m2-5-highspeed.md)
* [minimax/m2-7-20260402](/api-references/text-models-llm/minimax/m2-7.md)
* [minimax/m2-7-highspeed](/api-references/text-models-llm/minimax/m2.7-highspeed.md)

***

* [moonshot/kimi-k2-5](/api-references/text-models-llm/moonshot/kimi-k2-5.md)
* [moonshot/kimi-k2-6](/api-references/text-models-llm/moonshot/kimi-k2-6.md)

***

* [nvidia/nemotron-nano-9b-v2](/api-references/text-models-llm/nvidia/nemotron-nano-9b-v2.md)
* [nvidia/nemotron-nano-12b-v2-vl](/api-references/text-models-llm/nvidia/nemotron-nano-12b-v2-vl.md)
* [nvidia/nemotron-3-nano-30b-a3b](/api-references/text-models-llm/nvidia/nemotron-3-nano-30b-a3b.md)
* [nvidia/nemotron-3-nano-omni-30b-a3b-reasoning](/api-references/text-models-llm/nvidia/nemotron-3-nano-omni-30b-a3b-reasoning.md)
* [nvidia/nemotron-3-super-120b-a12b](/api-references/text-models-llm/nvidia/nemotron-3-super-120b-a12b.md)

***

* [x-ai/grok-3-mini-beta](/api-references/text-models-llm/xai/grok-3-mini-beta.md)
* [x-ai/grok-4-07-09](/api-references/text-models-llm/xai/grok-4.md)
* [x-ai/grok-code-fast-1](/api-references/text-models-llm/xai/grok-code-fast-1.md)
* [x-ai/grok-4-fast-reasoning](/api-references/text-models-llm/xai/grok-4-fast-reasoning.md)
* [x-ai/grok-4-1-fast-reasoning](/api-references/text-models-llm/xai/grok-4-1-fast-reasoning.md)
* [x-ai/grok-4-20-0309-reasoning](/api-references/text-models-llm/xai/grok-4-20-reasoning.md)

***

* [xiaomi/mimo-v2.5](/api-references/text-models-llm/xiaomi/mimo-v2.5.md)
* [xiaomi/mimo-v2.5-pro](/api-references/text-models-llm/xiaomi/mimo-v2.5-pro.md)

***

* [zhipu/glm-4.5-air](/api-references/text-models-llm/zhipu/glm-4.5-air.md)
* [zhipu/glm-4.5](/api-references/text-models-llm/zhipu/glm-4.5.md)
* [zhipu/glm-4.7](/api-references/text-models-llm/zhipu/glm-4.7.md)
* [zhipu/glm-5](/api-references/text-models-llm/zhipu/glm-5.md)
* [zhipu/glm-5-1](/api-references/text-models-llm/zhipu/glm-5.1.md)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.aimlapi.com/capabilities/thinking-reasoning.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
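
Since the question is passed as a query-string value, it should be URL-encoded. A small sketch of building such a request URL (the question text here is just an example):

```python
from urllib.parse import urlencode

PAGE_URL = "https://docs.aimlapi.com/capabilities/thinking-reasoning.md"

def build_ask_url(question: str) -> str:
    """URL-encode a natural-language question into the `ask` query parameter."""
    return f"{PAGE_URL}?{urlencode({'ask': question})}"

url = build_ask_url("Which Claude models support the thinking parameter?")
print(url)
```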
