# OpenAI

## Overview

OpenAI is an AI research and product company focused on developing general-purpose artificial intelligence systems for both consumers and enterprises. The company is widely known for the GPT family of models, including lightweight [GPT-4.1 Nano](/api-references/text-models-llm/openai/gpt-4.1-nano.md) variants for low-latency workloads, balanced [GPT-4.1](/api-references/text-models-llm/openai/gpt-4.1.md) and [GPT-4o](/api-references/text-models-llm/openai/gpt-4o.md) models for production use, and advanced reasoning-oriented o-series models designed for complex coding, analytical, and agentic tasks. OpenAI’s broader ecosystem also includes multimodal capabilities such as image understanding, speech processing, audio generation, and real-time interaction APIs, along with products like ChatGPT, Codex, and developer-focused agentic tooling.

The chat models from this provider support multiple API paradigms and multimodal workflows. OpenAI models can be accessed through both the standard `/v1/chat/completions` endpoint and the newer `/v1/responses` endpoint, which provides unified handling for text, images, audio, tool usage, and structured outputs. Some models additionally support integrated web search and retrieval capabilities, allowing them to access external information sources directly during generation.

<details>

<summary>Chat Completions vs. Responses API</summary>

**Chat Completions**\
The *chat completions* API is the older, chat-oriented interface where you send a list of messages (`role: user`, `role: assistant`, etc.), and the model returns a single response. It was designed specifically for conversational workflows and follows a structured chat message format. It is now considered a legacy interface.

**Responses**\
The *Responses* API is the newer, unified interface used across OpenAI’s latest models. Instead of focusing only on chat, it supports multiple input types (text, images, audio, tools, etc.) and multiple output modalities (text, JSON, images, audio, video). It is more flexible, more consistent across models, and intended to replace chat completions entirely.

</details>

#### Supported capabilities

Supported capabilities vary depending on the specific model, with different models offering different combinations of the features listed below.

* Text completions: Build advanced conversational systems and text-processing pipelines.
* Function Calling: Utilize tools, APIs, and structured workflows.
* Stream mode: Receive partial responses incrementally as tokens are generated.
* Batch Processing: Execute multiple independent requests within a single API call.
* Vision Tasks: Process and analyze images and visual inputs.
* Audio Tasks: Transcribe, generate, and process speech and audio streams.
* Web Search: Access external web information directly from supported models.

***

Other model categories from this provider are available as well.

* [Image](/api-references/image-models/openai.md)&#x20;
* [Speech-To-Text](/api-references/speech-models/speech-to-text/openai.md)&#x20;
* [Embedding](/api-references/embedding-models/openai.md)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.aimlapi.com/api-references/text-models-llm/openai.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
