Run and Run Step API
Runs are processes that execute the assistant’s logic within a thread, allowing it to process messages, generate responses, and call external tools if needed. Runs go through different statuses, such as queued, in_progress, and completed, and trigger events based on their progress, including tool calls and message updates.
This page provides API schemas for the following methods:
After each schema, you'll find a short example demonstrating how to correctly call the described method in code using the OpenAI SDK.
Note that the method names in the API schema and in the SDK often differ. When calling these methods via the REST API, use the names from the API schema; when calling them through the OpenAI SDK, use the names from the examples.
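All SDK examples on this page assume an OpenAI client pointed at the AIML API base URL shown in the endpoints below. A minimal setup sketch (the API key is a placeholder you replace with your own value):

```python
from openai import OpenAI

# Point the official OpenAI SDK at the AIML API endpoints listed on this page.
client = OpenAI(
    base_url="https://api.aimlapi.com",
    api_key="<YOUR_AIMLAPI_KEY>",  # placeholder
)
```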
https://api.aimlapi.com/threads/{threadId}/runs
https://api.aimlapi.com/threads/runs
https://api.aimlapi.com/threads/{threadId}/runs
https://api.aimlapi.com/threads/{threadId}/runs/{runId}
https://api.aimlapi.com/threads/{threadId}/runs/{runId}
https://api.aimlapi.com/threads/{threadId}/runs/{runId}/submit_tool_outputs
https://api.aimlapi.com/threads/{threadId}/runs/{runId}/cancel
https://api.aimlapi.com/threads/{threadId}/runs/{runId}/steps
https://api.aimlapi.com/threads/{threadId}/runs/{runId}/steps/{stepId}
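As a quick orientation before the individual schemas, here is a hedged sketch of the most common flow with the OpenAI Python SDK: creating a run on an existing thread, polling it until it reaches a terminal status, and optionally cancelling it. The thread and assistant IDs are placeholders, and the exact set of statuses your run passes through may differ.

```python
import time
from openai import OpenAI

client = OpenAI(base_url="https://api.aimlapi.com", api_key="<YOUR_AIMLAPI_KEY>")

thread_id = "<THREAD_ID>"        # placeholder: an existing thread
assistant_id = "<ASSISTANT_ID>"  # placeholder: an existing assistant

# POST /threads/{threadId}/runs - start a run on the thread
run = client.beta.threads.runs.create(thread_id=thread_id, assistant_id=assistant_id)

# GET /threads/{threadId}/runs/{runId} - poll until the run reaches a terminal status
while run.status in ("queued", "in_progress", "cancelling"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run.id)

print(run.status)  # e.g. "completed", "requires_action", or "failed"

# POST /threads/{threadId}/runs/{runId}/cancel - abort a run that has not finished yet
# client.beta.threads.runs.cancel(thread_id=thread_id, run_id=run.id)
```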
A limit on the number of objects to be returned. Limit can range between 1 and 100, and the default is 20.
Sort order by the created_at timestamp of the objects. asc for ascending order and desc for descending order.
A cursor for use in pagination. before is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, starting with obj_foo, your subsequent call can include before=obj_foo in order to fetch the previous page of the list.
A cursor for use in pagination. after is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, ending with obj_foo, your subsequent call can include after=obj_foo in order to fetch the next page of the list.
A limit on the number of objects to be returned. Limit can range between 1 and 100, and the default is 20.
Sort order by the created_at timestamp of the objects. asc for ascending order and desc for descending order.
A cursor for use in pagination. before is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, starting with obj_foo, your subsequent call can include before=obj_foo in order to fetch the previous page of the list.
A cursor for use in pagination. after is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, ending with obj_foo, your subsequent call can include after=obj_foo in order to fetch the next page of the list.
Filter messages by the run ID that generated them
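The pagination parameters above (limit, order, after, before) work the same way across the listing methods on this page. A hedged sketch of listing runs and run steps with the SDK, with placeholder IDs:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.aimlapi.com", api_key="<YOUR_AIMLAPI_KEY>")
thread_id = "<THREAD_ID>"  # placeholder

# GET /threads/{threadId}/runs - newest runs first, up to 20 per page by default
runs_page = client.beta.threads.runs.list(thread_id=thread_id, limit=20, order="desc")
for run in runs_page.data:
    print(run.id, run.status)

# GET /threads/{threadId}/runs/{runId}/steps - steps of a single run, oldest first
run_id = "<RUN_ID>"  # placeholder
steps_page = client.beta.threads.runs.steps.list(
    thread_id=thread_id, run_id=run_id, limit=10, order="asc"
)

# GET /threads/{threadId}/runs/{runId}/steps/{stepId} - a single step by ID
if steps_page.data:
    step = client.beta.threads.runs.steps.retrieve(
        thread_id=thread_id, run_id=run_id, step_id=steps_page.data[0].id
    )
    print(step.type, step.status)

# Pass after=<last object ID> (or before=...) to page through longer lists.
```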
The ID of the assistant to use to execute this run
Appends additional instructions at the end of the instructions for the run. This is useful for modifying the behavior on a per-run basis without overriding other instructions.
Adds additional messages to the thread before creating the run.
Overrides the instructions of the assistant. This is useful for modifying the behavior on a per-run basis.
The maximum number of completion tokens that may be used over the course of the run. The run will make a best effort to use only the number of completion tokens specified, across multiple turns of the run. If the run exceeds the number of completion tokens specified, the run will end with status incomplete.
The maximum number of prompt tokens that may be used over the course of the run. The run will make a best effort to use only the number of prompt tokens specified, across multiple turns of the run. If the run exceeds the number of prompt tokens specified, the run will end with status incomplete.
Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard.
Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
The ID of the Model to be used to execute this run. If a value is provided here, it will override the model associated with the assistant. If not, the model associated with the assistant will be used.
Possible values: gpt-4o, gpt-4o-2024-08-06, gpt-4o-2024-05-13, gpt-4o-mini, gpt-4o-mini-2024-07-18, chatgpt-4o-latest, gpt-4-turbo, gpt-4-turbo-2024-04-09, gpt-4, gpt-4-0125-preview, gpt-4-1106-preview, gpt-3.5-turbo, gpt-3.5-turbo-0125, gpt-3.5-turbo-1106, o1-preview, o1-preview-2024-09-12, o1-mini, o1-mini-2024-09-12, o3-mini, gpt-4.5-preview
Whether to enable parallel function calling during tool use.
Constrains effort on reasoning for reasoning models
Possible values: low, medium, high
Specifies the format that the model must output
If true, returns a stream of events that happen during the Run as server-sent events
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
Controls which (if any) tool is called by the model. none means the model will not call any tools and instead generates a message. auto is the default value and means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools before responding to the user
Override the tools the assistant can use for this run. This is useful for modifying the behavior on a per-run basis.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
Controls for how a thread will be truncated prior to the run. Use this to control the initial context window of the run.
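Putting several of the per-run options above together, here is a hedged sketch of creating a run with overrides via the SDK. All values are illustrative placeholders, not recommendations:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.aimlapi.com", api_key="<YOUR_AIMLAPI_KEY>")

# POST /threads/{threadId}/runs with per-run overrides
run = client.beta.threads.runs.create(
    thread_id="<THREAD_ID>",
    assistant_id="<ASSISTANT_ID>",
    model="gpt-4o-mini",                             # overrides the assistant's model for this run
    instructions="Answer in one short paragraph.",   # overrides the assistant's instructions
    additional_instructions="Reply in English.",     # appended after the instructions
    temperature=0.2,
    max_completion_tokens=512,
    max_prompt_tokens=2048,
    tool_choice="auto",
    parallel_tool_calls=True,
    metadata={"source": "docs-example"},
    truncation_strategy={"type": "auto"},
)
print(run.id, run.status)
```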
The ID of the assistant to use to execute this run
Override the default system message of the assistant. This is useful for modifying the behavior on a per-run basis.
The maximum number of completion tokens that may be used over the course of the run. The run will make a best effort to use only the number of completion tokens specified, across multiple turns of the run. If the run exceeds the number of completion tokens specified, the run will end with status incomplete.
The maximum number of prompt tokens that may be used over the course of the run. The run will make a best effort to use only the number of prompt tokens specified, across multiple turns of the run. If the run exceeds the number of prompt tokens specified, the run will end with status incomplete.
Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard.
Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
The ID of the Model to be used to execute this run. If a value is provided here, it will override the model associated with the assistant. If not, the model associated with the assistant will be used.
Possible values: gpt-4o, gpt-4o-2024-08-06, gpt-4o-2024-05-13, gpt-4o-mini, gpt-4o-mini-2024-07-18, chatgpt-4o-latest, gpt-4-turbo, gpt-4-turbo-2024-04-09, gpt-4, gpt-4-0125-preview, gpt-4-1106-preview, gpt-3.5-turbo, gpt-3.5-turbo-0125, gpt-3.5-turbo-1106, o1-preview, o1-preview-2024-09-12, o1-mini, o1-mini-2024-09-12, o3-mini, gpt-4.5-preview
Whether to enable parallel function calling during tool use.
Specifies the format that the model must output
If true, returns a stream of events that happen during the Run as server-sent events
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
Options to create a new thread. If no thread is provided when running a request, an empty thread will be created
Controls which (if any) tool is called by the model. none means the model will not call any tools and instead generates a message. auto is the default value and means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools before responding to the user
A set of resources that are used by the assistant's tools. The resources are specific to the type of tool. For example, the code_interpreter tool requires a list of file IDs, while the file_search tool requires a list of vector store IDs.
Override the tools the assistant can use for this run. This is useful for modifying the behavior on a per-run basis.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
Controls for how a thread will be truncated prior to the run. Use this to control the initial context window of the run.
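For the create-thread-and-run variant (POST /threads/runs), a hedged sketch via the SDK; the inline thread messages and settings are illustrative:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.aimlapi.com", api_key="<YOUR_AIMLAPI_KEY>")

# POST /threads/runs - creates a new thread from `thread` and immediately starts a run on it
run = client.beta.threads.create_and_run(
    assistant_id="<ASSISTANT_ID>",
    thread={
        "messages": [
            {"role": "user", "content": "Summarize the attached notes in three bullet points."}
        ]
    },
    temperature=0.7,
    metadata={"source": "docs-example"},
)
print(run.thread_id, run.id, run.status)
```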
Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard.
Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
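Since metadata is the only field accepted here, modifying an existing run reduces to an update call like the following sketch (IDs and metadata values are placeholders):

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.aimlapi.com", api_key="<YOUR_AIMLAPI_KEY>")

# POST /threads/{threadId}/runs/{runId} - attach or update metadata on a run
run = client.beta.threads.runs.update(
    thread_id="<THREAD_ID>",
    run_id="<RUN_ID>",
    metadata={"reviewed": "true", "reviewer": "docs-example"},
)
print(run.metadata)
```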
A list of tools for which the outputs are being submitted.
If true, returns a stream of events that happen during the Run as server-sent events
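When a run pauses because the model requested tool calls (typically with status requires_action), the results go back through the submit-tool-outputs endpoint. A hedged SDK sketch, where the tool call ID and output are placeholders:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.aimlapi.com", api_key="<YOUR_AIMLAPI_KEY>")

# POST /threads/{threadId}/runs/{runId}/submit_tool_outputs
run = client.beta.threads.runs.submit_tool_outputs(
    thread_id="<THREAD_ID>",
    run_id="<RUN_ID>",
    tool_outputs=[
        {
            "tool_call_id": "<TOOL_CALL_ID>",  # typically taken from the run's required_action tool calls
            "output": '{"temperature": 21, "unit": "celsius"}',
        }
    ],
    stream=False,  # set to True to receive server-sent events instead
)
print(run.status)
```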