Run and Run Step API
Runs are processes that execute the assistant’s logic within a thread, allowing it to process messages, generate responses, and call external tools if needed. Runs go through different statuses, such as queued, in_progress, and completed, and trigger events based on their progress, including tool calls and message updates.
This page provides API schemas for the following methods:
POST https://api.aimlapi.com/threads/{threadId}/runs
POST https://api.aimlapi.com/threads/runs
GET https://api.aimlapi.com/threads/{threadId}/runs
GET https://api.aimlapi.com/threads/{threadId}/runs/{runId}
POST https://api.aimlapi.com/threads/{threadId}/runs/{runId}
POST https://api.aimlapi.com/threads/{threadId}/runs/{runId}/submit_tool_outputs
POST https://api.aimlapi.com/threads/{threadId}/runs/{runId}/cancel
GET https://api.aimlapi.com/threads/{threadId}/runs/{runId}/steps
GET https://api.aimlapi.com/threads/{threadId}/runs/{runId}/steps/{stepId}
After each schema, you'll find a short example demonstrating how to correctly call the described method in code using the OpenAI SDK.
Note that the method names in the API schema and the SDK often differ. Accordingly, when calling these methods via the REST API, you should use the names from the API schema, while for calls through the OpenAI SDK, use the names from the examples.
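The correspondence between the two naming schemes can be collected in one lookup table. The sketch below is built only from the endpoints and SDK examples on this page; `SDK_METHODS` and `sdk_method_for` are illustrative names and not part of any library.

```python
# (HTTP method, path template) -> OpenAI SDK call used in this page's examples.
SDK_METHODS = {
    ("POST", "/threads/{threadId}/runs"): "client.beta.threads.runs.create",
    ("POST", "/threads/runs"): "client.beta.threads.create_and_run",
    ("GET", "/threads/{threadId}/runs"): "client.beta.threads.runs.list",
    ("GET", "/threads/{threadId}/runs/{runId}"): "client.beta.threads.runs.retrieve",
    ("POST", "/threads/{threadId}/runs/{runId}"): "client.beta.threads.runs.update",
    ("POST", "/threads/{threadId}/runs/{runId}/submit_tool_outputs"):
        "client.beta.threads.runs.submit_tool_outputs",
    ("POST", "/threads/{threadId}/runs/{runId}/cancel"): "client.beta.threads.runs.cancel",
    ("GET", "/threads/{threadId}/runs/{runId}/steps"): "client.beta.threads.runs.steps.list",
    ("GET", "/threads/{threadId}/runs/{runId}/steps/{stepId}"):
        "client.beta.threads.runs.steps.retrieve",
}


def sdk_method_for(http_method: str, path_template: str) -> str:
    """Return the SDK call name for a REST endpoint listed on this page."""
    return SDK_METHODS[(http_method.upper(), path_template)]


print(sdk_method_for("post", "/threads/runs"))
```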
API Schemas
Create a Run
assistant_id: The ID of the Assistant to use to execute this Run.
additional_instructions: Appends additional instructions at the end of the instructions for the Run. This is useful for modifying the behavior on a per-Run basis without overriding other instructions.
instructions: Overrides the instructions of the Assistant. This is useful for modifying the behavior on a per-Run basis.
max_completion_tokens: The maximum number of completion tokens that may be used over the course of the Run. The Run will make a best effort to use only the number of completion tokens specified, across multiple turns of the Run. If the Run exceeds the number of completion tokens specified, the Run will end with status incomplete.
max_prompt_tokens: The maximum number of prompt tokens that may be used over the course of the Run. The Run will make a best effort to use only the number of prompt tokens specified, across multiple turns of the Run. If the Run exceeds the number of prompt tokens specified, the Run will end with status incomplete.
model: The ID of the model to be used to execute this Run. If a value is provided here, it will override the model associated with the Assistant. If not, the model associated with the Assistant will be used.
parallel_tool_calls: Whether to enable parallel function calling during tool use.
reasoning_effort: Constrains effort on reasoning for reasoning models.
response_format: Specifies the format that the model must output.
stream: If true, returns a stream of events that happen during the Run as server-sent events.
temperature: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
tool_choice: Controls which (if any) tool is called by the model.
- none means the model will not call any tools and instead generates a message.
- auto is the default value and means the model can pick between generating a message or calling one or more tools.
- required means the model must call one or more tools before responding to the user.
- Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool.
top_p: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
POST /threads/{threadId}/runs HTTP/1.1
Host: api.aimlapi.com
Authorization: Bearer <YOUR_AIMLAPI_KEY>
Content-Type: application/json
Accept: */*
Content-Length: 604
{
"assistant_id": "text",
"additional_instructions": "text",
"additional_messages": [
{
"role": "user",
"content": "text",
"attachments": [
{
"file_id": "text",
"tools": [
{
"type": "code_interpreter"
}
]
}
],
"metadata": {
"ANY_ADDITIONAL_PROPERTY": "text"
}
}
],
"instructions": "text",
"max_completion_tokens": 1,
"max_prompt_tokens": 1,
"metadata": {
"ANY_ADDITIONAL_PROPERTY": "text"
},
"model": "openai/gpt-4o",
"parallel_tool_calls": true,
"reasoning_effort": "low",
"response_format": "auto",
"stream": true,
"temperature": 1,
"tool_choice": "none",
"tools": [
{
"type": "code_interpreter"
}
],
"top_p": 1,
"truncation_strategy": {
"type": "auto",
"last_messages": 1
}
}
{
"assistant_id": "text",
"cancelled_at": 1,
"completed_at": 1,
"created_at": 1,
"expires_at": 1,
"failed_at": 1,
"id": "text",
"incomplete_details": {
"reason": "text"
},
"instructions": "text",
"last_error": {
"code": "server_error",
"message": "text"
},
"max_completion_tokens": 1,
"max_prompt_tokens": 1,
"metadata": null,
"model": "text",
"object": "thread.run",
"parallel_tool_calls": true,
"required_action": {
"submit_tool_outputs": {
"tool_calls": [
{
"function": {
"arguments": "text",
"name": "text"
},
"id": "text",
"type": "function"
}
]
},
"type": "submit_tool_outputs"
},
"response_format": "auto",
"started_at": 1,
"status": "queued",
"temperature": 1,
"thread_id": "text",
"tool_choice": "none",
"tools": [
{
"type": "code_interpreter"
}
],
"top_p": 1,
"truncation_strategy": {
"type": "text",
"last_messages": 1
},
"usage": {
"completion_tokens": 1,
"prompt_tokens": 1,
"total_tokens": 1
}
}
Python + OpenAI SDK Example:
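from openai import OpenAI

# OpenAI() reads OPENAI_API_KEY and OPENAI_BASE_URL from the environment;
# set them (or pass api_key= and base_url=) to target api.aimlapi.com.
client = OpenAI()

run = client.beta.threads.runs.create(
    thread_id="thread_abc123",
    assistant_id="asst_abc123",
)
print(run)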
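For callers who use the REST API directly, the same request can be assembled with the standard library. This is a minimal sketch that only builds the request and does not send it; `build_create_run_request` is a hypothetical helper, the host and header names are taken from the HTTP example above, and `my_function` is a placeholder tool name.

```python
import json
import urllib.request

BASE_URL = "https://api.aimlapi.com"  # host from the HTTP example above


def build_create_run_request(thread_id, assistant_id, api_key, **options):
    """Build (but do not send) a POST /threads/{threadId}/runs request."""
    temperature = options.get("temperature")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    body = {"assistant_id": assistant_id, **options}
    return urllib.request.Request(
        url=f"{BASE_URL}/threads/{thread_id}/runs",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_create_run_request(
    "thread_abc123",
    "asst_abc123",
    api_key="<YOUR_AIMLAPI_KEY>",
    temperature=0.2,
    # Force one specific tool instead of "none", "auto", or "required".
    tool_choice={"type": "function", "function": {"name": "my_function"}},
)
print(request.get_method(), request.full_url)
```

Sending is left to the caller, for example with `urllib.request.urlopen(request)`.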
Create a Thread and run it in one request
assistant_id: The ID of the Assistant to use to execute this Run.
instructions: Overrides the instructions of the Assistant. This is useful for modifying the behavior on a per-Run basis.
max_completion_tokens: The maximum number of completion tokens that may be used over the course of the Run. The Run will make a best effort to use only the number of completion tokens specified, across multiple turns of the Run. If the Run exceeds the number of completion tokens specified, the Run will end with status incomplete.
max_prompt_tokens: The maximum number of prompt tokens that may be used over the course of the Run. The Run will make a best effort to use only the number of prompt tokens specified, across multiple turns of the Run. If the Run exceeds the number of prompt tokens specified, the Run will end with status incomplete.
model: The ID of the model to be used to execute this Run. If a value is provided here, it will override the model associated with the Assistant. If not, the model associated with the Assistant will be used.
parallel_tool_calls: Whether to enable parallel function calling during tool use.
response_format: Specifies the format that the model must output.
stream: If true, returns a stream of events that happen during the Run as server-sent events.
temperature: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
tool_choice: Controls which (if any) tool is called by the model.
- none means the model will not call any tools and instead generates a message.
- auto is the default value and means the model can pick between generating a message or calling one or more tools.
- required means the model must call one or more tools before responding to the user.
- Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool.
top_p: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
POST /threads/runs HTTP/1.1
Host: api.aimlapi.com
Authorization: Bearer <YOUR_AIMLAPI_KEY>
Content-Type: application/json
Accept: */*
Content-Length: 1032
{
"assistant_id": "text",
"instructions": "text",
"max_completion_tokens": 1,
"max_prompt_tokens": 1,
"metadata": {
"ANY_ADDITIONAL_PROPERTY": "text"
},
"model": "openai/gpt-4o",
"parallel_tool_calls": true,
"response_format": "auto",
"stream": true,
"temperature": 1,
"thread": {
"messages": [
{
"role": "user",
"content": "text",
"attachments": [
{
"file_id": "text",
"tools": [
{
"type": "code_interpreter"
}
]
}
],
"metadata": {
"ANY_ADDITIONAL_PROPERTY": "text"
}
}
],
"metadata": {
"ANY_ADDITIONAL_PROPERTY": "text"
},
"tool_resources": {
"code_interpreter": {
"file_ids": []
},
"file_search": {
"vector_store_ids": [
"text"
],
"vector_stores": [
{
"chunking_strategy": {
"type": "auto"
},
"file_ids": [
"text"
],
"metadata": {
"ANY_ADDITIONAL_PROPERTY": "text"
}
}
]
}
}
},
"tool_choice": "none",
"tool_resources": {
"code_interpreter": {
"file_ids": []
},
"file_search": {
"vector_store_ids": [
"text"
],
"vector_stores": [
{
"chunking_strategy": {
"type": "auto"
},
"file_ids": [
"text"
],
"metadata": {
"ANY_ADDITIONAL_PROPERTY": "text"
}
}
]
}
},
"tools": [
{
"type": "code_interpreter"
}
],
"top_p": 1,
"truncation_strategy": {
"type": "auto",
"last_messages": 1
}
}
{
"assistant_id": "text",
"cancelled_at": 1,
"completed_at": 1,
"created_at": 1,
"expires_at": 1,
"failed_at": 1,
"id": "text",
"incomplete_details": {
"reason": "text"
},
"instructions": "text",
"last_error": {
"code": "server_error",
"message": "text"
},
"max_completion_tokens": 1,
"max_prompt_tokens": 1,
"metadata": null,
"model": "text",
"object": "thread.run",
"parallel_tool_calls": true,
"required_action": {
"submit_tool_outputs": {
"tool_calls": [
{
"function": {
"arguments": "text",
"name": "text"
},
"id": "text",
"type": "function"
}
]
},
"type": "submit_tool_outputs"
},
"response_format": "auto",
"started_at": 1,
"status": "queued",
"temperature": 1,
"thread_id": "text",
"tool_choice": "none",
"tools": [
{
"type": "code_interpreter"
}
],
"top_p": 1,
"truncation_strategy": {
"type": "text",
"last_messages": 1
},
"usage": {
"completion_tokens": 1,
"prompt_tokens": 1,
"total_tokens": 1
}
}
Python + OpenAI SDK Example:
from openai import OpenAI

client = OpenAI()

run = client.beta.threads.create_and_run(
    assistant_id="asst_abc123",
    thread={
        "messages": [
            {"role": "user", "content": "Explain deep learning to a 5 year old."}
        ]
    },
)
print(run)
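When the stream parameter is true, the response arrives as server-sent events instead of a single JSON object. The sketch below parses SSE text lines into (event, data) pairs. The event names in the sample are assumptions modeled on the OpenAI Assistants API and are not confirmed by this page; `parse_sse` is an illustrative helper.

```python
import json


def parse_sse(lines):
    """Yield (event_name, data) pairs from an iterable of SSE text lines."""
    event, data_parts = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                       # a blank line ends one event
            if data_parts:
                yield event, "\n".join(data_parts)
            event, data_parts = None, []
        elif line.startswith(":"):           # comment or keep-alive line
            continue
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_parts.append(line[len("data:"):].lstrip(" "))
    if data_parts:                           # stream ended without a blank line
        yield event, "\n".join(data_parts)


# Assumed sample stream; real event names and payloads may differ.
sample = [
    "event: thread.run.created\n",
    'data: {"id": "run_abc123", "status": "queued"}\n',
    "\n",
    "event: thread.run.completed\n",
    'data: {"id": "run_abc123", "status": "completed"}\n',
    "\n",
]
for name, data in parse_sse(sample):
    print(name, json.loads(data)["status"])
```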
Retrieve a list of Runs belonging to a specific Thread
A limit on the number of objects to be returned. Limit can range between 1 and 100, and the default is 20.
Sort order by the created_at timestamp of the objects. asc for ascending order and desc for descending order.
A cursor for use in pagination. before is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, starting with obj_foo, your subsequent call can include before=obj_foo in order to fetch the previous page of the list.
A cursor for use in pagination. after is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, ending with obj_foo, your subsequent call can include after=obj_foo in order to fetch the next page of the list.
GET /threads/{threadId}/runs HTTP/1.1
Host: api.aimlapi.com
Authorization: Bearer <YOUR_AIMLAPI_KEY>
Accept: */*
{
"object": "list",
"data": [
{
"assistant_id": "text",
"cancelled_at": 1,
"completed_at": 1,
"created_at": 1,
"expires_at": 1,
"failed_at": 1,
"id": "text",
"incomplete_details": {
"reason": "text"
},
"instructions": "text",
"last_error": {
"code": "server_error",
"message": "text"
},
"max_completion_tokens": 1,
"max_prompt_tokens": 1,
"metadata": null,
"model": "text",
"object": "thread.run",
"parallel_tool_calls": true,
"required_action": {
"submit_tool_outputs": {
"tool_calls": [
{
"function": {
"arguments": "text",
"name": "text"
},
"id": "text",
"type": "function"
}
]
},
"type": "submit_tool_outputs"
},
"response_format": "auto",
"started_at": 1,
"status": "queued",
"temperature": 1,
"thread_id": "text",
"tool_choice": "none",
"tools": [
{
"type": "code_interpreter"
}
],
"top_p": 1,
"truncation_strategy": {
"type": "text",
"last_messages": 1
},
"usage": {
"completion_tokens": 1,
"prompt_tokens": 1,
"total_tokens": 1
}
}
],
"first_id": "text",
"last_id": "text",
"has_more": true
}
Python + OpenAI SDK Example:
from openai import OpenAI

client = OpenAI()

runs = client.beta.threads.runs.list(
    "thread_abc123"
)
print(runs)
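The after cursor, last_id, and has_more fields described above can be combined to walk every page of a list. This is a minimal sketch that runs offline: `iter_all_runs` and `fake_fetch_page` are hypothetical names, and the fake fetcher only stands in for a real call that returns the list schema shown above.

```python
def iter_all_runs(fetch_page, limit=20):
    """Yield every run by following the `after` cursor until has_more is false.

    fetch_page(limit, after) is supplied by the caller and returns one list
    response shaped like the schema above.
    """
    after = None
    while True:
        page = fetch_page(limit=limit, after=after)
        yield from page["data"]
        if not page.get("has_more") or not page["data"]:
            break
        after = page["last_id"]


# Stand-in for the API so the sketch runs without network access.
_RUNS = [{"id": f"run_{i}"} for i in range(5)]


def fake_fetch_page(limit, after):
    ids = [run["id"] for run in _RUNS]
    start = 0 if after is None else ids.index(after) + 1
    chunk = _RUNS[start:start + limit]
    return {
        "object": "list",
        "data": chunk,
        "first_id": chunk[0]["id"] if chunk else None,
        "last_id": chunk[-1]["id"] if chunk else None,
        "has_more": start + limit < len(_RUNS),
    }


print([run["id"] for run in iter_all_runs(fake_fetch_page, limit=2)])
```

In real use, `fetch_page` could wrap the list call from the example above and pass `limit` and `after` through.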
Retrieve information about a specific Run by its ID
GET /threads/{threadId}/runs/{runId} HTTP/1.1
Host: api.aimlapi.com
Authorization: Bearer <YOUR_AIMLAPI_KEY>
Accept: */*
{
"assistant_id": "text",
"cancelled_at": 1,
"completed_at": 1,
"created_at": 1,
"expires_at": 1,
"failed_at": 1,
"id": "text",
"incomplete_details": {
"reason": "text"
},
"instructions": "text",
"last_error": {
"code": "server_error",
"message": "text"
},
"max_completion_tokens": 1,
"max_prompt_tokens": 1,
"metadata": null,
"model": "text",
"object": "thread.run",
"parallel_tool_calls": true,
"required_action": {
"submit_tool_outputs": {
"tool_calls": [
{
"function": {
"arguments": "text",
"name": "text"
},
"id": "text",
"type": "function"
}
]
},
"type": "submit_tool_outputs"
},
"response_format": "auto",
"started_at": 1,
"status": "queued",
"temperature": 1,
"thread_id": "text",
"tool_choice": "none",
"tools": [
{
"type": "code_interpreter"
}
],
"top_p": 1,
"truncation_strategy": {
"type": "text",
"last_messages": 1
},
"usage": {
"completion_tokens": 1,
"prompt_tokens": 1,
"total_tokens": 1
}
}
Python + OpenAI SDK Example:
from openai import OpenAI

client = OpenAI()

run = client.beta.threads.runs.retrieve(
    thread_id="thread_abc123",
    run_id="run_abc123",
)
print(run)
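Because a Run moves through statuses such as queued and in_progress before it finishes, a common pattern is to retrieve it repeatedly until it stops. The sketch below shows one way to do that under stated assumptions: the set of terminal statuses follows OpenAI Assistants API naming and is not listed on this page, and `wait_for_run` is a hypothetical helper that takes any zero-argument function returning a run dict.

```python
import time

# Assumed terminal statuses (OpenAI Assistants API naming).
TERMINAL_STATUSES = {"completed", "incomplete", "failed", "cancelled", "expired"}


def wait_for_run(retrieve, interval=1.0, timeout=60.0, sleep=time.sleep):
    """Call retrieve() until the run no longer progresses on its own.

    Returns the last run dict and raises TimeoutError once the deadline passes.
    A run in "requires_action" is returned as well, since it waits on the caller.
    """
    deadline = time.monotonic() + timeout
    while True:
        run = retrieve()
        if run["status"] in TERMINAL_STATUSES or run["status"] == "requires_action":
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {run['status']} after {timeout}s")
        sleep(interval)


# Offline stand-in: the run moves queued -> in_progress -> completed.
_states = iter(["queued", "in_progress", "completed"])
final = wait_for_run(
    lambda: {"id": "run_abc123", "status": next(_states)},
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(final["status"])
```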
Modify a specific Run by its ID
POST /threads/{threadId}/runs/{runId} HTTP/1.1
Host: api.aimlapi.com
Authorization: Bearer <YOUR_AIMLAPI_KEY>
Content-Type: application/json
Accept: */*
Content-Length: 47
{
"metadata": {
"ANY_ADDITIONAL_PROPERTY": "text"
}
}
{
"assistant_id": "text",
"cancelled_at": 1,
"completed_at": 1,
"created_at": 1,
"expires_at": 1,
"failed_at": 1,
"id": "text",
"incomplete_details": {
"reason": "text"
},
"instructions": "text",
"last_error": {
"code": "server_error",
"message": "text"
},
"max_completion_tokens": 1,
"max_prompt_tokens": 1,
"metadata": null,
"model": "text",
"object": "thread.run",
"parallel_tool_calls": true,
"required_action": {
"submit_tool_outputs": {
"tool_calls": [
{
"function": {
"arguments": "text",
"name": "text"
},
"id": "text",
"type": "function"
}
]
},
"type": "submit_tool_outputs"
},
"response_format": "auto",
"started_at": 1,
"status": "queued",
"temperature": 1,
"thread_id": "text",
"tool_choice": "none",
"tools": [
{
"type": "code_interpreter"
}
],
"top_p": 1,
"truncation_strategy": {
"type": "text",
"last_messages": 1
},
"usage": {
"completion_tokens": 1,
"prompt_tokens": 1,
"total_tokens": 1
}
}
Python + OpenAI SDK Example:
from openai import OpenAI

client = OpenAI()

run = client.beta.threads.runs.update(
    thread_id="thread_abc123",
    run_id="run_abc123",
    metadata={"user_id": "user_abc123"},
)
print(run)
Submit Tool outputs to a specific Run
stream: If true, returns a stream of events that happen during the Run as server-sent events.
POST /threads/{threadId}/runs/{runId}/submit_tool_outputs HTTP/1.1
Host: api.aimlapi.com
Authorization: Bearer <YOUR_AIMLAPI_KEY>
Content-Type: application/json
Accept: */*
Content-Length: 72
{
"tool_outputs": [
{
"output": "text",
"tool_call_id": "text"
}
],
"stream": true
}
{
"assistant_id": "text",
"cancelled_at": 1,
"completed_at": 1,
"created_at": 1,
"expires_at": 1,
"failed_at": 1,
"id": "text",
"incomplete_details": {
"reason": "text"
},
"instructions": "text",
"last_error": {
"code": "server_error",
"message": "text"
},
"max_completion_tokens": 1,
"max_prompt_tokens": 1,
"metadata": null,
"model": "text",
"object": "thread.run",
"parallel_tool_calls": true,
"required_action": {
"submit_tool_outputs": {
"tool_calls": [
{
"function": {
"arguments": "text",
"name": "text"
},
"id": "text",
"type": "function"
}
]
},
"type": "submit_tool_outputs"
},
"response_format": "auto",
"started_at": 1,
"status": "queued",
"temperature": 1,
"thread_id": "text",
"tool_choice": "none",
"tools": [
{
"type": "code_interpreter"
}
],
"top_p": 1,
"truncation_strategy": {
"type": "text",
"last_messages": 1
},
"usage": {
"completion_tokens": 1,
"prompt_tokens": 1,
"total_tokens": 1
}
}
Python + OpenAI SDK Example:
from openai import OpenAI

client = OpenAI()

run = client.beta.threads.runs.submit_tool_outputs(
    thread_id="thread_123",
    run_id="run_123",
    tool_outputs=[
        {
            "tool_call_id": "call_001",
            "output": "70 degrees and sunny.",
        }
    ],
)
print(run)
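The tool_outputs list is normally derived from the required_action field of the Run object shown in the response schema. The sketch below illustrates that step with plain dicts. It assumes that function.arguments is a JSON-encoded object string, as in the OpenAI Assistants API; `build_tool_outputs` and `get_weather` are hypothetical names.

```python
import json


def build_tool_outputs(run, handlers):
    """Turn a run's pending tool calls into a tool_outputs list.

    handlers maps a function name to a local Python callable.
    """
    action = run.get("required_action") or {}
    if action.get("type") != "submit_tool_outputs":
        return []
    outputs = []
    for call in action["submit_tool_outputs"]["tool_calls"]:
        function = call["function"]
        # Assumption: arguments is a JSON-encoded object string.
        kwargs = json.loads(function["arguments"] or "{}")
        result = handlers[function["name"]](**kwargs)
        outputs.append({"tool_call_id": call["id"], "output": str(result)})
    return outputs


def get_weather(city):  # hypothetical local tool
    return f"70 degrees and sunny in {city}."


run = {
    "status": "requires_action",
    "required_action": {
        "type": "submit_tool_outputs",
        "submit_tool_outputs": {
            "tool_calls": [
                {
                    "id": "call_001",
                    "type": "function",
                    "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
                }
            ]
        },
    },
}
print(build_tool_outputs(run, {"get_weather": get_weather}))
```

The resulting list has the same shape as the tool_outputs argument in the SDK example above.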
Cancel a specific Run by its ID
POST /threads/{threadId}/runs/{runId}/cancel HTTP/1.1
Host: api.aimlapi.com
Authorization: Bearer <YOUR_AIMLAPI_KEY>
Accept: */*
{
"assistant_id": "text",
"cancelled_at": 1,
"completed_at": 1,
"created_at": 1,
"expires_at": 1,
"failed_at": 1,
"id": "text",
"incomplete_details": {
"reason": "text"
},
"instructions": "text",
"last_error": {
"code": "server_error",
"message": "text"
},
"max_completion_tokens": 1,
"max_prompt_tokens": 1,
"metadata": null,
"model": "text",
"object": "thread.run",
"parallel_tool_calls": true,
"required_action": {
"submit_tool_outputs": {
"tool_calls": [
{
"function": {
"arguments": "text",
"name": "text"
},
"id": "text",
"type": "function"
}
]
},
"type": "submit_tool_outputs"
},
"response_format": "auto",
"started_at": 1,
"status": "queued",
"temperature": 1,
"thread_id": "text",
"tool_choice": "none",
"tools": [
{
"type": "code_interpreter"
}
],
"top_p": 1,
"truncation_strategy": {
"type": "text",
"last_messages": 1
},
"usage": {
"completion_tokens": 1,
"prompt_tokens": 1,
"total_tokens": 1
}
}
Python + OpenAI SDK Example:
from openai import OpenAI

client = OpenAI()

run = client.beta.threads.runs.cancel(
    thread_id="thread_abc123",
    run_id="run_abc123",
)
print(run)
Retrieve a list of Run Steps belonging to a specific Run
A limit on the number of objects to be returned. Limit can range between 1 and 100, and the default is 20.
Sort order by the created_at timestamp of the objects. asc for ascending order and desc for descending order.
A cursor for use in pagination. before is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, starting with obj_foo, your subsequent call can include before=obj_foo in order to fetch the previous page of the list.
A cursor for use in pagination. after is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, ending with obj_foo, your subsequent call can include after=obj_foo in order to fetch the next page of the list.
Filter Messages by the Run ID that generated them.
GET /threads/{threadId}/runs/{runId}/steps HTTP/1.1
Host: api.aimlapi.com
Authorization: Bearer <YOUR_AIMLAPI_KEY>
Accept: */*
{
"object": "list",
"data": [
{
"assistant_id": "text",
"cancelled_at": 1,
"completed_at": 1,
"created_at": 1,
"expired_at": 1,
"failed_at": 1,
"id": "text",
"last_error": {
"code": "server_error",
"message": "text"
},
"metadata": null,
"object": "thread.run.step",
"run_id": "text",
"status": "in_progress",
"step_details": {
"message_creation": {
"message_id": "text"
},
"type": "message_creation"
},
"thread_id": "text",
"type": "message_creation",
"usage": {
"completion_tokens": 1,
"prompt_tokens": 1,
"total_tokens": 1
}
}
],
"first_id": "text",
"last_id": "text",
"has_more": true
}
Python + OpenAI SDK Example:
from openai import OpenAI

client = OpenAI()

run_steps = client.beta.threads.runs.steps.list(
    thread_id="thread_abc123",
    run_id="run_abc123",
)
print(run_steps)
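Each Run Step in the list carries its own usage object, so per-run token totals can be computed by summing over the steps. This is a minimal sketch over plain dicts shaped like the response above. It assumes usage may be null for a step that has not finished, which this page does not state; `total_usage` is an illustrative helper.

```python
def total_usage(step_list):
    """Sum token usage over the steps in a list response."""
    totals = {"completion_tokens": 0, "prompt_tokens": 0, "total_tokens": 0}
    for step in step_list["data"]:
        usage = step.get("usage") or {}  # assumed null until the step finishes
        for key in totals:
            totals[key] += usage.get(key, 0)
    return totals


steps = {
    "object": "list",
    "data": [
        {
            "id": "step_1",
            "status": "completed",
            "usage": {"completion_tokens": 12, "prompt_tokens": 30, "total_tokens": 42},
        },
        {"id": "step_2", "status": "in_progress", "usage": None},
    ],
    "has_more": False,
}
print(total_usage(steps))
```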
Retrieve information about a specific Run Step by its ID
GET /threads/{threadId}/runs/{runId}/steps/{stepId} HTTP/1.1
Host: api.aimlapi.com
Authorization: Bearer <YOUR_AIMLAPI_KEY>
Accept: */*
{
"assistant_id": "text",
"cancelled_at": 1,
"completed_at": 1,
"created_at": 1,
"expired_at": 1,
"failed_at": 1,
"id": "text",
"last_error": {
"code": "server_error",
"message": "text"
},
"metadata": null,
"object": "thread.run.step",
"run_id": "text",
"status": "in_progress",
"step_details": {
"message_creation": {
"message_id": "text"
},
"type": "message_creation"
},
"thread_id": "text",
"type": "message_creation",
"usage": {
"completion_tokens": 1,
"prompt_tokens": 1,
"total_tokens": 1
}
}
Python + OpenAI SDK Example:
from openai import OpenAI

client = OpenAI()

run_step = client.beta.threads.runs.steps.retrieve(
    thread_id="thread_abc123",
    run_id="run_abc123",
    step_id="step_abc123",
)
print(run_step)
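A step whose type is message_creation points at the message it produced through step_details. The sketch below reads that ID from a dict shaped like the response above. The tool_calls step type in the second call is an assumption borrowed from the OpenAI Assistants API, and `message_id_from_step` is a hypothetical helper.

```python
def message_id_from_step(step):
    """Return the ID of the message a step created, or None for other step types."""
    details = step.get("step_details") or {}
    if details.get("type") != "message_creation":
        return None
    return details["message_creation"]["message_id"]


message_step = {
    "id": "step_abc123",
    "object": "thread.run.step",
    "type": "message_creation",
    "step_details": {
        "type": "message_creation",
        "message_creation": {"message_id": "msg_abc123"},
    },
}
# Assumed shape for a non-message step.
tool_step = {"id": "step_def456", "step_details": {"type": "tool_calls", "tool_calls": []}}

print(message_id_from_step(message_step))
print(message_id_from_step(tool_step))
```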