Vision Models
Welcome to the Vision Models API documentation! The AI/ML API lets you analyze and understand images using the vision-capable models listed below.
| Model | Provider |
| --- | --- |
| gpt-4o | open-ai |
| gpt-4o-2024-08-06 | open-ai |
| gpt-4o-2024-05-13 | open-ai |
| gpt-4o-mini | open-ai |
| gpt-4o-mini-2024-07-18 | open-ai |
| gpt-4-turbo | open-ai |
| gpt-4-turbo-2024-04-09 | open-ai |
| meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo | open-source |
| meta-llama/Llama-Vision-Free | open-source |
| meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo | open-source |
| gemini-1.5-flash | google |
| gemini-1.5-pro | google |
| claude-3-5-sonnet-latest | anthropic |
| claude-3-haiku-latest | anthropic |
| claude-3-opus-latest | anthropic |
| claude-3-sonnet-latest | anthropic |
| claude-3-5-haiku-latest | anthropic |
| qwen/qvq-72b-preview | openrouter |
Key Features
- Image Analysis: Understand and describe the content of images.
- Flexible Input Methods: Supports both image URLs and base64-encoded images.
- Multiple Image Inputs: Analyze multiple images in a single request.
Quick Start
Images can be provided to the model in two main ways: by passing an image URL, or by passing the base64-encoded image directly in the request.
Uploading Images by URL
The example below uses the GPT-4o model with a typical set of parameters. If you are using Anthropic models (claude-3-5-sonnet-latest, etc.), refer to the Anthropic section of this documentation.
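A minimal sketch using the OpenAI-compatible Python client. The base URL, placeholder API key, and image URL are assumptions; substitute the values from your account.

```python
from openai import OpenAI

# Base URL and API key are placeholders -- use the endpoint and key
# from your AI/ML API account.
client = OpenAI(
    base_url="https://api.aimlapi.com/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    # A publicly reachable image URL (placeholder)
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/image.jpg"},
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0].message.content)
```

The image is passed as an `image_url` content part alongside the text prompt, so one message can carry both the question and the image.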
Uploading Base64 Encoded Images
For local images, you can pass the base64-encoded image data directly to the model.
As in the previous example, GPT-4o is used here; for Anthropic models (claude-3-5-sonnet-latest, etc.), refer to the Anthropic section of this documentation.
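A sketch of the same request built from a local file. The file name photo.jpg is hypothetical, and the base URL is again an assumption; the image travels inside the request as a base64 data URL.

```python
import base64

from openai import OpenAI

client = OpenAI(
    base_url="https://api.aimlapi.com/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

# Encode a local file (placeholder name) as base64 so it can be
# embedded directly in the request as a data URL.
with open("photo.jpg", "rb") as f:
    b64_image = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    # The data URL embeds the encoded bytes; adjust the
                    # MIME type (image/jpeg, image/png, ...) to the file.
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{b64_image}"},
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0].message.content)
```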
Multiple Image Inputs
The API can process multiple images in a single request by including several image parts in one message.
Python Example
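A sketch of a request carrying two images in one user message; both URLs are placeholders, and the base URL is an assumption as before.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.aimlapi.com/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

# Both URLs are placeholders; any mix of URL and base64 data-URL
# image parts can appear in the same message.
image_urls = [
    "https://example.com/first.jpg",
    "https://example.com/second.jpg",
]

# Build one user message: a text part followed by one part per image.
content = [{"type": "text", "text": "Compare these two images."}]
content += [{"type": "image_url", "image_url": {"url": u}} for u in image_urls]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": content}],
    max_tokens=300,
)

print(response.choices[0].message.content)
```

Because all image parts arrive in the same message, the model can compare or combine them in a single answer.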