Run and Run Step API
Runs are processes that execute the assistant's logic within a thread, allowing it to process messages, generate responses, and call external tools if needed. Runs go through different statuses, such as queued, in_progress, and completed, and trigger events based on their progress, including tool calls and message updates.
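The status lifecycle described above can be sketched as a small polling loop. This is only an illustration: `wait_for_run` and `fetch_status` are hypothetical names, the status fetcher is a stub rather than a real API call, and the statuses beyond queued, in_progress, completed, and incomplete are assumptions based on the OpenAI Assistants API that this service mirrors.

```python
import time
from typing import Callable

# Statuses named on this page: queued, in_progress, completed, incomplete.
# The other values below are assumptions taken from the OpenAI Assistants API.
TERMINAL_STATUSES = {"completed", "incomplete", "failed", "cancelled", "expired"}


def wait_for_run(fetch_status: Callable[[], str],
                 poll_interval: float = 1.0,
                 max_polls: int = 60) -> str:
    """Call fetch_status() until the run stops being queued or in progress."""
    for _ in range(max_polls):
        status = fetch_status()
        # requires_action is not terminal, but the caller has to act on it
        # (submit tool outputs), so polling stops there as well.
        if status in TERMINAL_STATUSES or status == "requires_action":
            return status
        time.sleep(poll_interval)
    raise TimeoutError("run did not settle within the polling budget")


# Stand-in for a real "retrieve run" call: replays a fixed status sequence.
_fake_statuses = iter(["queued", "in_progress", "in_progress", "completed"])
final_status = wait_for_run(lambda: next(_fake_statuses), poll_interval=0.0)
print(final_status)
```

In real use, `fetch_status` would wrap whatever call retrieves the run and returns its status field.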
This page provides API schemas for the following methods:
After each schema, you'll find a short example demonstrating how to correctly call the described method in code using the OpenAI SDK.
Note that the method names in the API schema and the SDK often differ. Accordingly, when calling these methods via the REST API, you should use the names from the API schema, while for calls through the OpenAI SDK, use the names from the examples.
https://api.aimlapi.com/threads/{threadId}/runs
https://api.aimlapi.com/threads/runs
https://api.aimlapi.com/threads/{threadId}/runs
https://api.aimlapi.com/threads/{threadId}/runs/{runId}
https://api.aimlapi.com/threads/{threadId}/runs/{runId}
https://api.aimlapi.com/threads/{threadId}/runs/{runId}/submit_tool_outputs
https://api.aimlapi.com/threads/{threadId}/runs/{runId}/cancel
https://api.aimlapi.com/threads/{threadId}/runs/{runId}/steps
https://api.aimlapi.com/threads/{threadId}/runs/{runId}/steps/{stepId}
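As a rough sketch of how the path templates above might be filled in, the helper below substitutes IDs into the placeholders. The dictionary keys and the `build_url` function are hypothetical labels chosen for illustration; only the URL templates come from this page, and the HTTP verb for each endpoint is not shown here.

```python
BASE_URL = "https://api.aimlapi.com"

# Keys are hypothetical labels; the path templates are copied from the list above.
ENDPOINTS = {
    "runs": "/threads/{threadId}/runs",
    "thread_and_run": "/threads/runs",
    "run": "/threads/{threadId}/runs/{runId}",
    "submit_tool_outputs": "/threads/{threadId}/runs/{runId}/submit_tool_outputs",
    "cancel": "/threads/{threadId}/runs/{runId}/cancel",
    "steps": "/threads/{threadId}/runs/{runId}/steps",
    "step": "/threads/{threadId}/runs/{runId}/steps/{stepId}",
}


def build_url(name: str, **ids: str) -> str:
    """Fill the placeholders of one endpoint template with the given IDs."""
    # str.format raises KeyError if a required placeholder is missing.
    return BASE_URL + ENDPOINTS[name].format(**ids)


cancel_url = build_url("cancel", threadId="thread_abc", runId="run_123")
print(cancel_url)
```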
A limit on the number of objects to be returned. Limit can range between 1 and 100, and the default is 20.
Sort order by the created_at timestamp of the objects. asc for ascending order and desc for descending order.
A cursor for use in pagination. before is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, starting with obj_foo, your subsequent call can include before=obj_foo in order to fetch the previous page of the list.
A cursor for use in pagination. after is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, ending with obj_foo, your subsequent call can include after=obj_foo in order to fetch the next page of the list.
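The cursor parameters above can be combined into a simple forward-paging loop. The sketch below is an assumption-laden illustration: `iter_all` and `fake_list_page` are hypothetical names, the page fetcher is an in-memory stub rather than a real request, and the loop stops on a short page instead of relying on any response field not described here.

```python
from typing import Callable, Iterator, List, Optional


def iter_all(list_page: Callable[[int, Optional[str]], List[dict]],
             limit: int = 20) -> Iterator[dict]:
    """Yield every object by following the `after` cursor page by page."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    after: Optional[str] = None
    while True:
        page = list_page(limit, after)
        yield from page
        if len(page) < limit:      # a short (or empty) page means no more data
            return
        after = page[-1]["id"]     # the last ID on this page becomes the cursor


# In-memory stand-in for a list endpoint, already sorted in ascending order.
_OBJECTS = [{"id": f"obj_{i}"} for i in range(45)]


def fake_list_page(limit: int, after: Optional[str]) -> List[dict]:
    start = 0
    if after is not None:
        start = next(i for i, o in enumerate(_OBJECTS) if o["id"] == after) + 1
    return _OBJECTS[start:start + limit]


all_ids = [obj["id"] for obj in iter_all(fake_list_page, limit=20)]
print(len(all_ids))
```

Paging backwards with `before` would follow the same pattern, using the first ID of each page as the cursor.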
A limit on the number of objects to be returned. Limit can range between 1 and 100, and the default is 20.
Sort order by the created_at timestamp of the objects. asc for ascending order and desc for descending order.
A cursor for use in pagination. before is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, starting with obj_foo, your subsequent call can include before=obj_foo in order to fetch the previous page of the list.
A cursor for use in pagination. after is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, ending with obj_foo, your subsequent call can include after=obj_foo in order to fetch the next page of the list.
Filter Messages by the Run ID that generated them.
The ID of the Assistant to use to execute this Run.
Appends additional instructions at the end of the instructions for the Run. This is useful for modifying the behavior on a per-Run basis without overriding other instructions.
Overrides the instructions of the Assistant. This is useful for modifying the behavior on a per-Run basis.
The maximum number of completion tokens that may be used over the course of the Run. The Run will make a best effort to use only the number of completion tokens specified, across multiple turns of the Run. If the Run exceeds the number of completion tokens specified, the Run will end with status incomplete.
The maximum number of prompt tokens that may be used over the course of the Run. The Run will make a best effort to use only the number of prompt tokens specified, across multiple turns of the Run. If the Run exceeds the number of prompt tokens specified, the Run will end with status incomplete.
The ID of the model to be used to execute this Run. If a value is provided here, it will override the model associated with the Assistant. If not, the model associated with the Assistant will be used.
Whether to enable parallel function calling during tool use.
Constrains effort on reasoning for reasoning models.
Specifies the format that the model must output.
If True, returns a stream of events that happen during the Run as server-sent events.
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
Controls which (if any) tool is called by the model.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
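To make the parameter descriptions above more concrete, here is a minimal sketch that assembles a request body for creating a Run and checks the documented ranges. The field names (`assistant_id`, `additional_instructions`, `max_completion_tokens`, `temperature`, `top_p`, `stream`) are assumptions taken from the OpenAI Assistants API, since the names themselves do not appear in the descriptions; `build_run_body` is a hypothetical helper, and no request is actually sent.

```python
from typing import Any, Dict, Optional


def build_run_body(assistant_id: str,
                   *,
                   additional_instructions: Optional[str] = None,
                   max_completion_tokens: Optional[int] = None,
                   temperature: Optional[float] = None,
                   top_p: Optional[float] = None,
                   stream: bool = False) -> Dict[str, Any]:
    """Build a JSON-serializable body, leaving out options that were not set."""
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if top_p is not None and not 0 <= top_p <= 1:
        raise ValueError("top_p is a probability mass and must be between 0 and 1")

    body: Dict[str, Any] = {"assistant_id": assistant_id, "stream": stream}
    optional = {
        "additional_instructions": additional_instructions,
        "max_completion_tokens": max_completion_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }
    # Omitted options fall back to the Assistant's own settings.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


run_body = build_run_body(
    "asst_example",  # placeholder ID, not a real assistant
    additional_instructions="Answer in one short paragraph.",
    temperature=0.2,
)
print(run_body)
```

Setting either `temperature` or `top_p`, rather than both, keeps the effect of each sampling control easier to reason about.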
The ID of the Assistant to use to execute this Run.
Overrides the instructions of the Assistant. This is useful for modifying the behavior on a per-Run basis.
The maximum number of completion tokens that may be used over the course of the Run. The Run will make a best effort to use only the number of completion tokens specified, across multiple turns of the Run. If the Run exceeds the number of completion tokens specified, the Run will end with status incomplete.
The maximum number of prompt tokens that may be used over the course of the Run. The Run will make a best effort to use only the number of prompt tokens specified, across multiple turns of the Run. If the Run exceeds the number of prompt tokens specified, the Run will end with status incomplete.
The ID of the model to be used to execute this Run. If a value is provided here, it will override the model associated with the Assistant. If not, the model associated with the Assistant will be used.
Whether to enable parallel function calling during tool use.
Specifies the format that the model must output.
If True, returns a stream of events that happen during the Run as server-sent events.
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
Controls which (if any) tool is called by the model.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
If True, returns a stream of events that happen during the Run as server-sent events.
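When streaming is enabled, the response arrives as server-sent events, which are plain text lines grouped into events by blank lines. The parser below is a minimal sketch of that wire format and is not the SDK's implementation: `parse_sse` is a hypothetical helper, and the event names and payloads in the sample are assumptions modeled on the OpenAI Assistants API, not captured output.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def _field_value(line: str, prefix: str) -> str:
    """Return the text after 'prefix', dropping one optional leading space."""
    value = line[len(prefix):]
    return value[1:] if value.startswith(" ") else value


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs; a blank line ends each event."""
    event, data = "message", []  # type: str, List[str]
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("event:"):
            event = _field_value(line, "event:")
        elif line.startswith("data:"):
            data.append(_field_value(line, "data:"))


# Illustrative sample only: the event names and fields are assumptions.
sample_stream = [
    "event: thread.run.created",
    'data: {"id": "run_123", "status": "queued"}',
    "",
    ": keep-alive",
    "event: thread.run.completed",
    'data: {"id": "run_123", "status": "completed"}',
    "",
]

events = [(name, json.loads(data)["status"]) for name, data in parse_sse(sample_stream)]
print(events)
```

In practice the lines would come from an HTTP response read incrementally, so each event can be handled as soon as its closing blank line arrives.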