v5/text-to-video

This documentation applies to the following model:

  • pixverse/v5/text-to-video

This model provides faster text-to-video rendering with consistently sharp, realistic, cinematic-quality results. It also generates videos with synchronized audio. For lip-sync, you can supply text to be spoken by a predefined voice.

Set Up Your API Key

If you don’t have an API key for the AI/ML API yet, feel free to use our Quickstart guide.

How to Make a Call

Step-by-Step Instructions

Generating a video using this model involves sequentially calling two endpoints:

  • The first one is for creating and sending a video generation task to the server (returns a generation ID).

  • The second one is for requesting the generated video from the server using the generation ID received from the first endpoint.

Below, you can find both corresponding API schemas.

API Schemas

Create a video generation task and send it to the server

This endpoint creates a video generation task, sends it to the server, and returns a generation ID. In the basic setup, you only need a prompt. For lip-sync, you can supply the text to speak (lip_sync_tts_content) together with a predefined voice (lip_sync_tts_speaker).

post
Body
model (string, enum, required)

prompt (string, required)

The text description of the scene, subject, or action to generate in the video.

aspect_ratio (string, enum, optional)

The aspect ratio of the generated video.

Default: 16:9

resolution (string, enum, optional)

The output resolution, named after the short side of the video frame.

Default: 720p

duration (integer, enum, optional)

The output video length in seconds. The 1080p quality option does not support 8-second videos.

Default: 5

negative_prompt (string, optional)

The description of elements to avoid in the generated video.

style (string, enum, optional)

The style of the generated video.

seed (integer, optional)

Changing the seed produces different results for otherwise identical request parameters. Reusing the same seed with an identical request produces similar results. If unspecified, a random seed is chosen.

lip_sync_tts_content (string, optional)

The text content to be lip-synced in the video.

lip_sync_tts_speaker (string, enum, optional)

A predefined system voice used for generating speech in the video.

Responses

200: Success (application/json)

post /v2/video/generations
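
As a rough illustration of this endpoint, the sketch below builds the POST request with Python's standard library. The host in BASE_URL, the placeholder API key, and the helper names build_create_request and create_task are assumptions made for illustration and do not come from this page. The Bearer header format mirrors the authorization shown for the retrieval endpoint. The only rule checked locally is the documented one that 1080p does not support 8-second videos.

```python
import json
import urllib.request

# Assumption: the API host is not stated in this section; replace as needed.
BASE_URL = "https://api.aimlapi.com"
API_KEY = "<YOUR_AIMLAPI_KEY>"  # placeholder, not a real key
MODEL = "pixverse/v5/text-to-video"


def build_create_request(prompt, api_key=API_KEY, **options):
    """Build (but do not send) the task-creation request.

    `options` may carry any optional body field from the schema above,
    e.g. aspect_ratio, resolution, duration, negative_prompt, style, seed.
    """
    # Documented constraint: 1080p does not support 8-second videos.
    if options.get("resolution") == "1080p" and options.get("duration") == 8:
        raise ValueError("1080p does not support 8-second videos")

    payload = {"model": MODEL, "prompt": prompt}
    # Skip options left as None so the server-side defaults apply.
    payload.update({k: v for k, v in options.items() if v is not None})

    return urllib.request.Request(
        url=f"{BASE_URL}/v2/video/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_task(prompt, **options):
    """Send the request and return the parsed JSON response."""
    request = build_create_request(prompt, **options)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect before any network call is made.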

Retrieve the generated video from the server

After you send a video generation request, the task is added to the queue. This endpoint lets you check the status of the task using the generation_id returned by the endpoint described above. Once the task status is completed, the response includes the final result: the generated video URL and additional metadata.

get
Authorizations
Authorization (string, required)

Bearer key

Query parameters
generation_id (string, required)

Example: <REPLACE_WITH_YOUR_GENERATION_ID>

Responses

200: Success (application/json)

get /v2/video/generations
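
A minimal sketch of the status request, again using only the standard library. The host in BASE_URL and the helper names build_status_request and fetch_status are assumptions. The generation_id query parameter and the Bearer authorization come from the schema above.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the API host is not stated in this section; replace as needed.
BASE_URL = "https://api.aimlapi.com"
API_KEY = "<YOUR_AIMLAPI_KEY>"  # placeholder, not a real key


def build_status_request(generation_id, api_key=API_KEY):
    """Build (but do not send) the GET request for one generation task."""
    # urlencode escapes characters that are not safe in a query string.
    query = urllib.parse.urlencode({"generation_id": generation_id})
    return urllib.request.Request(
        url=f"{BASE_URL}/v2/video/generations?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_status(generation_id, api_key=API_KEY):
    """Send the request and return the parsed JSON response."""
    request = build_status_request(generation_id, api_key)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```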

Full Example: Generating and Retrieving the Video From the Server

The code below creates a video generation task, then automatically polls the server every 10 seconds until it finally receives the video URL.

Generation takes about 30–40 seconds for a 5-second 720p video and around 1 minute 15 seconds for 1080p.
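
The flow described here (create a task, then poll every 10 seconds) could look roughly like the sketch below. Several details are assumptions rather than facts from this page: the host in BASE_URL, the helper names call_api and generate_video, the 600-second timeout, and the "id" key used to read the generation ID from the creation response. The status values match the statuses table on this page.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumption: the API host is not stated in this section; replace as needed.
BASE_URL = "https://api.aimlapi.com"
API_KEY = "<YOUR_AIMLAPI_KEY>"  # placeholder, not a real key
ENDPOINT = f"{BASE_URL}/v2/video/generations"


def call_api(method, url, body=None):
    """Send one authorized JSON request and return the decoded response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def generate_video(prompt, interval=10, timeout=600,
                   call=call_api, sleep=time.sleep, **options):
    """Create a task, then poll until it completes, fails, or times out."""
    payload = {"model": "pixverse/v5/text-to-video", "prompt": prompt, **options}
    task = call("POST", ENDPOINT, payload)
    # Assumption: the creation response exposes the generation ID as "id".
    generation_id = task["id"]

    query = urllib.parse.urlencode({"generation_id": generation_id})
    waited = 0
    while waited <= timeout:
        result = call("GET", f"{ENDPOINT}?{query}")
        status = result.get("status")
        if status == "completed":
            return result
        if status == "error":
            raise RuntimeError(f"Generation failed: {result.get('error')}")
        # Still "queued" or "generating": wait before asking again.
        sleep(interval)
        waited += interval
    raise TimeoutError(f"No result after {timeout} seconds")
```

Calling generate_video with a prompt and, for example, resolution="1080p" returns the final response object. Where the video URL sits inside that object is not specified in this section, so inspect the returned JSON. The call and sleep parameters exist only so the loop can be exercised without network access.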

Response

Processing time: ~1 min 14 sec.

Original: 1920x1080

Low-res GIF preview:

"A menacing evil dragon appears in a distance above the tallest mountain, then rushes toward the camera with its jaws open, revealing massive fangs. We see it's coming."

Full Example #2: Lip-Sync

Now let’s test the lip-sync parameters. We’ll generate a video of a character and give them a line of text to speak. The text goes into the lip_sync_tts_content parameter, and the lip_sync_tts_speaker parameter selects one of the predefined voices.

The code below, just like in the first example, creates a video generation task and then automatically polls the server every 15 seconds until it finally receives the video URL.
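
As a hedged sketch of the request body for this example, the snippet below assembles the two lip-sync fields alongside the prompt. The helper name build_lip_sync_payload, the sample prompt, and the sample speech text are hypothetical. The speaker value is left as a placeholder because the list of predefined voices is not reproduced in this section. Sending the body and polling (every 15 seconds here) work the same way as in the first example.

```python
import json

# Placeholder: substitute one of the predefined voices accepted by the API.
SPEAKER = "<ONE_OF_THE_PREDEFINED_VOICES>"
POLL_INTERVAL_SECONDS = 15  # polling interval used in this example


def build_lip_sync_payload(prompt, speech_text, speaker=SPEAKER):
    """Return the request body for a lip-synced text-to-video task."""
    return {
        "model": "pixverse/v5/text-to-video",
        "prompt": prompt,
        "lip_sync_tts_content": speech_text,  # text the character speaks
        "lip_sync_tts_speaker": speaker,      # predefined system voice
    }


# Hypothetical prompt and speech text, for illustration only.
payload = build_lip_sync_payload(
    prompt="A cheerful news anchor speaks to the camera in a bright studio.",
    speech_text="Hello and welcome to the show!",
)
print(json.dumps(payload, indent=2))
```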

Statuses

  • queued: Job is waiting in queue.

  • generating: Video is being generated.

  • completed: Generation successful, video available.

  • error: Generation failed; check the error field.

Response

Processing time: ~1 min 17 sec.

Generated video (1280x720, with sound):
