nova-2

This documentation applies to the following models:

  • #g1_nova-2-automotive

  • #g1_nova-2-conversationalai

  • #g1_nova-2-drivethru

  • #g1_nova-2-finance

  • #g1_nova-2-general

  • #g1_nova-2-medical

  • #g1_nova-2-meeting

  • #g1_nova-2-phonecall

  • #g1_nova-2-video

  • #g1_nova-2-voicemail

Model Overview

"Nova-2 builds on the advancements of Nova-1 with speech-specific optimizations to its Transformer architecture, refined data curation techniques, and a multi-stage training approach. These improvements result in a lower word error rate (WER) and better entity recognition (including proper nouns and alphanumeric sequences), as well as enhanced punctuation and capitalization.

Nova-2 offers the following model options:

  • automotive: Optimized for audio with automotive-oriented vocabulary.

  • conversationalai: Optimized for use cases in which a human is talking to an automated bot, such as IVR, a voice assistant, or an automated kiosk.

  • drivethru: Optimized for audio sources from drive-thrus.

  • finance: Optimized for multiple speakers with varying audio quality, such as might be found on a typical earnings call. Vocabulary is heavily finance-oriented.

  • general: Optimized for everyday audio processing.

  • medical: Optimized for audio with medical-oriented vocabulary.

  • meeting: Optimized for conference room settings, which include multiple speakers with a single microphone.

  • phonecall: Optimized for low-bandwidth audio phone calls.

  • video: Optimized for audio sourced from videos.

  • voicemail: Optimized for low-bandwidth audio clips with a single speaker. Derived from the phonecall model."
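The option names above map one-to-one onto the model IDs listed at the top of this page. As a minimal sketch of that mapping, the snippet below builds and validates a model ID from an option name. The helper `nova2_model_id` and the constant names are hypothetical and exist only for illustration; only the option names and the `#g1_nova-2-` prefix come from this page.

```python
# Option names exactly as listed in this document.
NOVA2_VARIANTS = (
    "automotive", "conversationalai", "drivethru", "finance", "general",
    "medical", "meeting", "phonecall", "video", "voicemail",
)

# Prefix shared by every model ID listed at the top of this page.
MODEL_ID_PREFIX = "#g1_nova-2-"


def nova2_model_id(variant: str = "general") -> str:
    """Return the full model ID for a Nova-2 option (hypothetical helper)."""
    key = variant.strip().lower()
    if key not in NOVA2_VARIANTS:
        raise ValueError(
            f"unknown Nova-2 option {variant!r}; "
            f"expected one of: {', '.join(NOVA2_VARIANTS)}"
        )
    return MODEL_ID_PREFIX + key


if __name__ == "__main__":
    print(nova2_model_id("medical"))  # prints #g1_nova-2-medical
```

Validating the name locally catches a misspelled option before any request is sent, which may be easier to debug than an error returned by the service.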

Set up your API Key

If you don’t have an API key for the AI/ML API yet, feel free to use our Quickstart guide.
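Once you have a key, one common pattern is to keep it out of source code and read it from the environment. The sketch below assumes an environment variable named `AIML_API_KEY` (a hypothetical name chosen for this example) and assumes the API accepts the key as a Bearer token in the `Authorization` header; check the Quickstart guide for the authoritative scheme.

```python
import os


def build_auth_headers(env_var: str = "AIML_API_KEY") -> dict:
    """Build request headers from an API key stored in the environment.

    Assumptions: the variable name is hypothetical, and the Bearer
    scheme is assumed rather than confirmed by this page.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"environment variable {env_var} is not set")
    return {"Authorization": f"Bearer {api_key}"}
```

Failing early when the variable is missing avoids sending an unauthenticated request.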

Submit a request

API Schema
