qwen3-omni-30b-a3b-captioner
Model Overview
qwen3-omni-30b-a3b-captioner is an open-source model built on Qwen3-Omni. It automatically generates rich, detailed descriptions of complex audio (speech, music, ambient sounds, and effects) without requiring a text prompt. It detects emotions, musical styles, instruments, and sensitive information, which makes it well suited to audio analysis, security auditing, intent recognition, and editing.
How to Make a Call
API Schema
POST /v1/chat/completions
Body
model: string · enum · Required · Possible values:
max_tokens: number · min: 1 · Optional
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
stream: boolean · Optional · Default: false
If set to True, the model response data will be streamed to the client as it is generated using server-sent events.
Responses
200: Success
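The body fields above can be assembled and checked on the client before sending. The following is a minimal sketch, not the official client: the `MODEL` value is taken from the page title and may differ from the actual enum value, `build_request_body` is a hypothetical helper, and the `messages` field is an assumption based on the usual chat-completions convention, since the schema excerpt does not show how audio is passed.

```python
# Assumption: the model identifier matches the page title; the real enum
# value is not listed in the schema excerpt and may differ.
MODEL = "qwen3-omni-30b-a3b-captioner"


def build_request_body(messages, max_tokens=None, stream=False):
    """Build a request body for POST /v1/chat/completions.

    `messages` is assumed (it is not part of the schema excerpt) and is
    passed through unchanged. `max_tokens` and `stream` follow the
    documented constraints: min 1, and default false.
    """
    if max_tokens is not None and max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")

    body = {
        "model": MODEL,
        "messages": messages,
        "stream": bool(stream),
    }
    # max_tokens is optional, so it is only sent when the caller sets it.
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    return body
```

Capping `max_tokens` here is the cost control the schema describes: the helper rejects values below the documented minimum locally instead of waiting for the server to reject the request.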
Code Example
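A possible way to call the endpoint with only the Python standard library is sketched below. Several details are assumptions that the page does not state: `BASE_URL` is a placeholder host, bearer-token authentication is assumed, and the caller supplies a complete request body. `iter_sse_data`, `create_completion`, and `stream_completion` are hypothetical helper names. The stream parser handles only the `data` field of server-sent events, which is the part needed to read incremental output when `stream` is enabled.

```python
import json
import urllib.request

# Placeholder host: the real base URL is not given in the source.
BASE_URL = "https://api.example.com"


def iter_sse_data(lines):
    """Yield the data payload of each server-sent event from text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one leading space after the colon is dropped
        if field == "data":
            buffer.append(value)
    # An event not terminated by a blank line is discarded.


def _open(api_key, body):
    request = urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
    return urllib.request.urlopen(request)


def create_completion(api_key, body):
    """Send a non-streaming request and return the parsed JSON response."""
    with _open(api_key, dict(body, stream=False)) as response:
        return json.loads(response.read().decode("utf-8"))


def stream_completion(api_key, body):
    """Send a streaming request and yield each event's data string."""
    with _open(api_key, dict(body, stream=True)) as response:
        yield from iter_sse_data(raw.decode("utf-8") for raw in response)
```

Each string yielded by `stream_completion` is the raw `data` payload of one event; how those payloads are structured, and how the end of the stream is signalled, depends on the service and is not described on this page.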