Veo 2 (Image-to-Video)
An advanced multimodal (image + text) AI model that transforms static images into high-quality, dynamic video content. It builds on Google's Veo 2 text-to-video model, offering fine-grained control and realism in video generation from still images, faithful preservation of the source image's content, and intuitive, physics-aware motion.
How to Make a Call
API Schemas
Create a video generation task and send it to the server
You can generate a video using this API.
To quickly test video models from different developers without changing endpoints, use our new universal short endpoint: https://api.aimlapi.com/v2/video/generations.
Bearer key
The text description of the scene, subject, or action to generate in the video.
A direct link to an online image or a Base64-encoded local image that will serve as the visual base or the first frame for the video.
A direct link to an online image or a Base64-encoded local image to be used as the last frame of the video.
The aspect ratio of the generated video.
The length of the output video in seconds.
The description of elements to avoid in the generated video.
Varying the seed integer is a way to get different results for the same other request parameters. Using the same value for an identical request will produce similar results. If unspecified, a random number is chosen.
Whether to enhance the video generation.
Successfully generated video
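The request described above can be sketched in Python using only the standard library. This is a minimal illustration, not the reference client: the endpoint URL and Bearer authentication come from this page, but the JSON field names (`model`, `prompt`, `image_url`, and any optional keys), the model identifier, and the data-URI form of the Base64 image are assumptions that should be checked against the API schema.

```python
import base64
import json
import mimetypes
import urllib.request

# Universal endpoint quoted in the text above.
API_URL = "https://api.aimlapi.com/v2/video/generations"


def image_to_data_uri(path):
    """Encode a local image as a Base64 data URI.

    Assumption: the API accepts Base64 images in data-URI form.
    """
    mime, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime or 'application/octet-stream'};base64,{encoded}"


def build_create_request(api_key, prompt, image, **optional):
    """Build (without sending) the POST request that creates a task.

    `image` is either a direct URL or a Base64-encoded image.
    All JSON field names below are hypothetical placeholders.
    """
    payload = {
        "model": "veo2/image-to-video",  # hypothetical model id
        "prompt": prompt,                # assumed field name
        "image_url": image,              # assumed field name
    }
    # Optional parameters (aspect ratio, duration, seed, ...) are passed
    # through under whatever names the caller supplies; None is skipped.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_task(api_key, prompt, image, **optional):
    """Send the request and return the decoded JSON response."""
    request = build_create_request(api_key, prompt, image, **optional)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `create_task("<YOUR_API_KEY>", "She puts on glasses", "https://example.com/image.png")` would submit the task; the response is expected to carry the task id used by the status endpoint, though the exact response shape is not shown here.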
Fetch the video
After you send a video generation request, the task is added to the queue. This endpoint lets you check the status of a video generation task using its id, obtained from the endpoint described above.
If the video generation task status is complete, the response will include the final result, with the generated video URL and additional metadata.
Bearer key
Successfully generated video
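Because the task sits in a queue, a client usually polls this endpoint until the status changes. The sketch below shows one way to do that with the standard library. The `generation_id` query parameter, the `status` response field, and the pending status names `queued` and `generating` are assumptions for illustration; substitute the values given in the API schema.

```python
import json
import time
import urllib.parse
import urllib.request

# Universal endpoint quoted earlier on this page.
API_URL = "https://api.aimlapi.com/v2/video/generations"


def fetch_task(api_key, generation_id):
    """GET the current state of a generation task.

    Assumption: the task id is passed as a `generation_id` query parameter.
    """
    query = urllib.parse.urlencode({"generation_id": generation_id})
    request = urllib.request.Request(
        f"{API_URL}?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_video(fetch, pending=("queued", "generating"),
                   interval=10.0, timeout=600.0, sleep=time.sleep):
    """Call `fetch()` repeatedly until the task leaves a pending state.

    `fetch` is any zero-argument callable returning the decoded response.
    The status names in `pending` are hypothetical. Raises TimeoutError
    if the task is still pending after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result.get("status") not in pending:
            return result  # finished: either success or an error status
        if time.monotonic() >= deadline:
            raise TimeoutError("video generation did not finish in time")
        sleep(interval)
```

A typical call would be `wait_for_video(lambda: fetch_task("<YOUR_API_KEY>", task_id))`. Passing the fetch step in as a callable keeps the polling loop independent of the HTTP details, so it can be exercised without network access.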
Full Example: Generating and Retrieving the Video From the Server
We have a classic reproduction of the famous da Vinci painting. Let's ask the model to generate a video where the Mona Lisa puts on glasses.
Original: 1280x720
Low-res GIF preview:
