Veo2 (Text-to-Video)
This documentation applies to the following model:
veo2
Overview
Veo2 is Google’s AI model for generating realistic, cinematic video from a text prompt or from a combination of text and images. It is designed to produce videos with natural motion, realistic physics, and professional-grade visual fidelity.
Key Features:
Text-to-Video (T2V): Converts descriptive text into dynamic video content.
High Resolution Support: Generates videos up to 4K resolution for professional-grade outputs.
Multimodal Input Encoding: Integrates text and image inputs seamlessly for creative flexibility.
Set Up Your API Key
If you don’t have an API key for the AI/ML API yet, follow our Quickstart guide to get one.
How to Make a Call
Generating a video using this model involves sequentially calling two endpoints:
The first endpoint creates a video generation task on the server and returns a generation ID.
The second endpoint retrieves the generated video from the server, using the generation ID returned by the first endpoint.
Below, you can find the API schema and an example for each of the two endpoint calls.
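The two-step flow described above can be sketched as a small polling helper. This is a minimal illustration, not the official client: the `submit` and `fetch` callables stand in for the two endpoint calls, and the `status` field and the values in `PENDING_STATUSES` are assumptions rather than names confirmed by this page. Check the API schemas for the actual response fields.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical status values meaning "not finished yet" (an assumption,
# not taken from the API schema).
PENDING_STATUSES = {"queued", "generating"}


def generate_and_wait(
    submit: Callable[[], str],
    fetch: Callable[[str], Dict[str, Any]],
    poll_interval: float = 10.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Create a generation task, then poll until it leaves a pending state.

    submit: calls the first endpoint and returns the generation ID.
    fetch:  calls the second endpoint with that ID and returns the parsed response.
    """
    generation_id = submit()
    waited = 0.0
    while True:
        result = fetch(generation_id)
        # Any status outside the pending set is treated as final
        # (success or failure); the caller inspects the result.
        if result.get("status") not in PENDING_STATUSES:
            return result
        if waited >= timeout:
            raise TimeoutError(
                f"Generation {generation_id} still pending after {timeout} seconds"
            )
        sleep(poll_interval)
        waited += poll_interval
```

Passing the endpoint calls in as functions keeps the polling logic independent of any particular HTTP library, so it can be exercised without network access.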
API Schemas
Video Generation
This endpoint creates a video generation task. A basic request needs only a prompt, an aspect ratio, and a duration of 5, 6, 7, or 8 seconds.
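As a rough sketch of the basic setup, the helper below assembles a request body and rejects durations outside the supported range before anything is sent. The field names (`model`, `prompt`, `aspect_ratio`, `duration`) and the helper name `build_veo2_payload` are assumptions made for illustration; confirm the exact field names and the accepted aspect ratio values against the schema.

```python
from typing import Any, Dict

# Durations (in seconds) listed as supported in this documentation.
ALLOWED_DURATIONS = (5, 6, 7, 8)


def build_veo2_payload(prompt: str, aspect_ratio: str, duration: int) -> Dict[str, Any]:
    """Build a request body for a Veo2 generation task (hypothetical field names)."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if duration not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}, got {duration}")
    # aspect_ratio is passed through unchecked because the accepted
    # values are not listed on this page.
    return {
        "model": "veo2",
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "duration": duration,
    }
```

For example, `build_veo2_payload("A lighthouse at dawn", "16:9", 8)` returns a dictionary that could be sent as the JSON body of the generation request, while a duration of 10 raises a `ValueError` locally instead of producing a failed API call.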
Fetch the video
Examples
Video generation
Fetch the video