Vision Models
Welcome to the Vision Models API documentation! The AI/ML API allows you to leverage vision capabilities to analyze and understand images through our models.
| Model | Provider |
| --- | --- |
| gpt-4o | open-ai |
| gpt-4o-2024-08-06 | open-ai |
| gpt-4o-2024-05-13 | open-ai |
| gpt-4o-mini | open-ai |
| gpt-4o-mini-2024-07-18 | open-ai |
| gpt-4-turbo | open-ai |
| gpt-4-turbo-2024-04-09 | open-ai |
| meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo | open-source |
| meta-llama/Llama-Vision-Free | open-source |
| meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo | open-source |
| gemini-1.5-flash | google |
| gemini-1.5-pro | google |
| claude-3-5-sonnet-latest | anthropic |
| claude-3-haiku-latest | anthropic |
| claude-3-opus-latest | anthropic |
| claude-3-sonnet-latest | anthropic |
| claude-3-5-haiku-latest | anthropic |
Key Features
- Image Analysis: Understand and describe the content of images.
- Flexible Input Methods: Supports both image URLs and base64-encoded images.
- Multiple Image Inputs: Analyze multiple images in a single request.
Quick Start
Images can be provided to the model in two ways: by passing an image URL, or by passing the base64-encoded image directly in the request.
Example: What's in this image?
Python Example
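A minimal sketch of an image-URL request. It assumes the API accepts OpenAI-style chat-completions payloads (as the model list above suggests); the endpoint URL, `max_tokens` value, and example image URL are illustrative placeholders, so check them against your account's settings before use.

```python
import json
import urllib.request

API_URL = "https://api.aimlapi.com/v1/chat/completions"  # assumed endpoint
API_KEY = "YOUR_API_KEY"  # placeholder; substitute your real key


def build_vision_request(prompt, image_url, model="gpt-4o"):
    """Build an OpenAI-style chat payload with one text part and one image-URL part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": 300,
    }


def send(payload):
    """POST the payload with a bearer token and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_vision_request(
    "What's in this image?",
    "https://example.com/photo.jpg",  # hypothetical image URL
)
# response = send(payload)  # requires a valid API key
# print(response["choices"][0]["message"]["content"])
```

The prompt and the image travel together as parts of a single user message, which is what lets the model ground its answer in the picture.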
Uploading Base64 Encoded Images
For local images, you can pass the base64-encoded image data to the model instead of a URL.
Python Example
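A sketch of the base64 path, assuming the same OpenAI-style payload shape: the encoded bytes are wrapped in a `data:` URI and sent in the `image_url` part. The `encode_image` helper and media type are illustrative; the snippet below encodes in-memory placeholder bytes rather than reading a real file.

```python
import base64


def encode_image(path):
    """Read a local image file and return its base64-encoded contents as a string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


def build_base64_request(prompt, image_b64, media_type="image/jpeg", model="gpt-4o"):
    """Build a chat payload embedding the image as a data: URI."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:{media_type};base64,{image_b64}"
                        },
                    },
                ],
            }
        ],
    }


# Placeholder bytes stand in for a real image file on disk.
fake_bytes = b"\x89PNG fake image data"
b64 = base64.b64encode(fake_bytes).decode("utf-8")
payload = build_base64_request("Describe this image.", b64, media_type="image/png")
```

In real use you would call `encode_image("path/to/local.jpg")` and pass the result; the rest of the request is identical to the URL variant.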
Multiple Image Inputs
The API can process multiple images in a single request.
Python Example
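For multiple images, the sketch below simply appends one `image_url` part per image to the same user message; this again assumes the OpenAI-style content-parts format, and the example URLs are placeholders.

```python
def build_multi_image_request(prompt, image_urls, model="gpt-4o"):
    """Build a chat payload with one text part followed by one part per image."""
    content = [{"type": "text", "text": prompt}]
    content += [
        {"type": "image_url", "image_url": {"url": url}} for url in image_urls
    ]
    return {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }


payload = build_multi_image_request(
    "Compare these two images. What differs between them?",
    [
        "https://example.com/before.jpg",  # hypothetical image URLs
        "https://example.com/after.jpg",
    ],
)
```

Each image appears as its own content part, so the model can reference all of them when answering a single prompt.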