minimax-music [legacy]

This documentation is valid for the following list of our models:

  • minimax-music

An advanced AI model that generates diverse, high-quality audio compositions by analyzing and reproducing the musical patterns, rhythms, and vocal styles of a reference track. A text prompt containing the lyrics guides the generation.

How to Make a Call

Step-by-Step Instructions

Generating audio with this model involves calling two endpoints in sequence:

  • The first creates a music generation task and sends it to the server (it returns a generation ID).

  • The second requests the generated audio from the server using the generation ID returned by the first endpoint.

Below, you can find two corresponding API schemas and examples for both endpoint calls.


If you want to learn how to call AI models via API from the very basics, feel free to use our Quickstart guide.

API Schemas

Generate a music sample

This endpoint creates and sends a music generation task to the server — and returns a generation ID and the task status.

POST
Authorizations
Authorization · string · Required

Bearer key

Body
model · enum · Required
Possible values: minimax-music
prompt · string · Required

Lyrics with optional formatting. You can use a newline to separate each line of lyrics. You can use two newlines to add a pause between lines. You can use double hash marks (##) at the beginning and end of the lyrics to add accompaniment. Maximum 600 characters.

reference_audio_url · string · uri · Required

Reference song. It should contain both music and vocals, and it must be a .wav or .mp3 file longer than 15 seconds.
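The body fields above can be assembled and checked before the request is sent. The following is a minimal sketch, not the official client: build_generation_payload is a hypothetical helper name, and the checks only mirror the limits stated in this schema (a 600-character prompt and a .wav or .mp3 reference). It assumes the file extension appears in the URL path. The requirement that the audio be longer than 15 seconds cannot be verified from the URL alone, so it is not checked here.

```python
from urllib.parse import urlparse

MAX_PROMPT_CHARS = 600                 # limit stated in the schema above
ALLOWED_EXTENSIONS = (".wav", ".mp3")  # formats stated in the schema above


def build_generation_payload(prompt: str, reference_audio_url: str) -> dict:
    """Hypothetical helper: return the JSON body for the generation request."""
    if not prompt:
        raise ValueError("prompt is required")
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt exceeds {MAX_PROMPT_CHARS} characters")

    parsed = urlparse(reference_audio_url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError("reference_audio_url must be an http(s) URL")
    # Assumption: the extension is visible in the URL path.
    if not parsed.path.lower().endswith(ALLOWED_EXTENSIONS):
        raise ValueError("reference_audio_url must point to a .wav or .mp3 file")

    return {
        "model": "minimax-music",
        "prompt": prompt,
        "reference_audio_url": reference_audio_url,
    }


payload = build_generation_payload(
    "##Side by side, through thick and thin##",
    "https://example.com/sample.mp3",  # placeholder URL, not a real sample
)
print(payload)
```

Validating locally like this avoids spending a generation request on input the server would reject anyway.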

Responses
default

Retrieve the generated music sample from the server

After you send a music generation request, the task is added to a queue. Depending on the service load, generation typically completes in 50-60 seconds but may take longer.

GET
Authorizations
Authorization · string · Required

Bearer key

Query parameters
generation_id · string · Required
Example: <REPLACE_WITH_YOUR_GENERATION_ID>
Responses
200 · Success
application/json
id · string · Required

The ID of the generated audio.

Example: 60ac7c34-3224-4b14-8e7d-0aa0db708325
status · string · enum · Required

The current status of the generation task.

Example: completed
GET /v2/generate/audio
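The status request carries the generation ID as a query parameter. Below is a minimal sketch of preparing that request with the Python standard library. The base URL https://api.aimlapi.com is an assumption (only the path /v2/generate/audio appears in this section), and build_status_request is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.aimlapi.com"  # assumption: not stated in this section
STATUS_PATH = "/v2/generate/audio"    # path shown in the schema above


def build_status_request(generation_id: str, api_key: str) -> Request:
    """Hypothetical helper: prepare (but do not send) the GET status request."""
    query = urlencode({"generation_id": generation_id})
    return Request(
        url=f"{BASE_URL}{STATUS_PATH}?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


request = build_status_request(
    "60ac7c34-3224-4b14-8e7d-0aa0db708325",  # example ID from the schema above
    "<YOUR_AIMLAPI_KEY>",
)
print(request.full_url)
```

The prepared request can then be sent with urllib.request.urlopen(request) and the JSON body decoded with json.load. Keeping construction separate from sending makes the URL and headers easy to inspect.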

Quick Code Example

Here is an example of generating an audio file from a sample and a prompt using the minimax-music model.

Full example explanation

As an example, we will generate a song using the popular minimax-music model from the Chinese company MiniMax. As you can verify in its API Schemas above, this model accepts an audio sample as input—extracting information about its vocals and instruments for use in the generation process—along with a text prompt where we can provide lyrics for our song.

We used a publicly available sample from a royalty-free sample database and generated some lyrics with ChatGPT:

Side by side, through thick and thin,
With a laugh, we always win.
Storms may come, but we stay true,
Friends forever—me and you!

To turn this into a model-friendly prompt (as a single string), we added hash symbols and line breaks.

'''##Side by side, through thick and thin,\n\nWith a laugh, we always win.\n\nStorms may come, but we stay true,\n\nFriends forever—me and you!##'''
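The same transformation can be done programmatically. This is a sketch under stated assumptions: format_lyrics_prompt is a hypothetical helper, it joins lines with two newlines (a pause between lines, per the prompt field description) or one newline, and it wraps the result in double hash marks. Only the first two lyric lines are used to keep the example short.

```python
MAX_PROMPT_CHARS = 600  # limit stated in the prompt field description


def format_lyrics_prompt(lines: list[str], pause: bool = True) -> str:
    """Hypothetical helper: join lyric lines and wrap them in ## marks."""
    separator = "\n\n" if pause else "\n"
    # Drop empty lines and stray whitespace around each lyric line.
    cleaned = [line.strip() for line in lines if line.strip()]
    prompt = "##" + separator.join(cleaned) + "##"
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(
            f"prompt is {len(prompt)} characters; the limit is {MAX_PROMPT_CHARS}"
        )
    return prompt


lyrics = [
    "Side by side, through thick and thin,",
    "With a laugh, we always win.",
]
prompt = format_lyrics_prompt(lyrics)
print(repr(prompt))
```

Checking the length after wrapping matters because the hash marks and newlines count toward the 600-character limit.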

A notable feature of our audio and video models is that uploading the prompt or sample, generating the content, and retrieving the final file from the server are handled through separate API calls. (AIML API tokens are only consumed during the first step—i.e., the actual content generation.)

We’ve written a complete code example that sequentially calls both endpoints — you can view and copy it below. Don’t forget to replace <YOUR_AIMLAPI_KEY> with your actual AIML API Key from your account!

The structure of the code is simple: there are two separate functions for calling each endpoint, and a main function that orchestrates everything.

Execution starts automatically from main(). It first runs the function that creates and sends a music generation task to the server — this is where you pass your prompt describing the desired musical fragment. This function returns a generation ID and the initial task status:

This indicates that the file has been uploaded and our generation task has been queued on the server (which took 7 seconds in our case).

Next, main() launches the second function — the one that checks the task status and, once ready, retrieves the download URL from the server. This second function is called in a loop every 10 seconds.
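The polling step described above can be sketched as follows. This is not the original example code: wait_for_completion is a hypothetical helper, and the fetch_status callable stands in for the second endpoint call. Treating any status other than 'completed' as still in progress is a simplifying assumption, since the full list of status values is not given here. The timeout is also an assumption, added so the loop cannot run forever.

```python
import time
from typing import Callable


def wait_for_completion(
    fetch_status: Callable[[], dict],
    interval_s: float = 10.0,  # the walkthrough polls every 10 seconds
    timeout_s: float = 300.0,  # assumption: give up after 5 minutes
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status repeatedly until it reports status 'completed'."""
    deadline = time.monotonic() + timeout_s
    while True:
        response = fetch_status()
        if response.get("status") == "completed":
            return response
        # Stop if the next wait would run past the deadline.
        if time.monotonic() + interval_s > deadline:
            raise TimeoutError("generation did not complete in time")
        print("Still waiting... checking again in", interval_s, "seconds")
        sleep(interval_s)


# Demo with canned responses instead of real API calls.
# "queued" and "generating" are placeholder values, not documented statuses.
canned = iter([{"status": "queued"}, {"status": "generating"}, {"status": "completed"}])
result = wait_for_completion(lambda: next(canned), sleep=lambda _: None)
print(result)
```

Passing the status call and the sleep function as parameters lets the loop be exercised without network access or real delays. In real use, fetch_status would wrap the GET request to the retrieval endpoint.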

During execution, you’ll see messages in the output:

  • If the file is not yet ready:

  • Once the file is ready, a completion message appears with the download info. In our case, after five reruns of the second code block (waiting a total of about 50-60 seconds), we saw the following output:

As you can see, the 'status' is now 'completed', and further in the output line, we have a URL where the generated audio file can be downloaded.


Listen to the track we generated below the code and response blocks.

Response

Listen to the track we generated:
