# nemotron-3-ultra-550b-a55b

{% columns %}
{% column width="66.66666666666666%" %}
{% hint style="info" %}
This documentation is valid for the following list of our models:

* `nvidia/nemotron-3-ultra-550b-a55b`
  {% endhint %}
  {% endcolumn %}

{% column width="33.33333333333334%" %} <a href="https://aimlapi.com/app/nemotron-3-ultra-550b-a55b" class="button primary">Try in Playground</a>
{% endcolumn %}
{% endcolumns %}

## Model Overview

`nvidia/nemotron-3-ultra-550b-a55b` is a large-scale hybrid Transformer-Mamba Mixture-of-Experts reasoning model with 550B total parameters, of which 55B are active per forward pass. It is optimized for complex multi-step reasoning, long-context analysis, agentic orchestration, and tool use, and supports a context window of up to 1M tokens.

{% hint style="success" %}
[Create AI/ML API Key](https://aimlapi.com/app/keys)
{% endhint %}

<details>

<summary>How to make the first API call</summary>

{% stepper %}
{% step %}

## Required setup (don't skip this)

▪ **Create an account:** Sign up on the AI/ML API website (if you don't have one yet).\
▪ **Generate an API key:** In your account dashboard, create an API key and make sure it is **enabled** in the UI.
{% endstep %}

{% step %}

## Copy the code example

At the bottom of this page, pick the snippet for your preferred programming language (Python or JavaScript) and copy it into your project.
{% endstep %}

{% step %}

## Update the snippet for your use case

▪ **Insert your API key:** replace `<YOUR_AIMLAPI_KEY>` with your real AI/ML API key.\
▪ **Select a model:** set the `model` field to the model you want to call.\
▪ **Provide input:** fill in the request input field(s) shown in the example. For this chat model, that is the `messages` array.
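
Instead of pasting the key into source code, one option is to read it from an environment variable. The sketch below assumes a variable named `AIMLAPI_KEY` (a hypothetical name chosen for illustration, not one defined by this documentation) and builds the same two headers used in the code example at the bottom of this page.

```python
import os


def build_headers(env_var: str = "AIMLAPI_KEY") -> dict:
    """Build request headers from an API key stored in an environment variable.

    The default variable name is an assumption made for this sketch;
    use whatever name your own setup defines.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your AI/ML API key"
        )
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed as `headers=` to `requests.post`, which keeps the key out of version control.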
{% endstep %}

{% step %}

## (Optional) Tune the request

Depending on the model type, you can add optional parameters to control the output (e.g., generation settings, quality, length, etc.). See the API schema below for the full list.
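
As a rough illustration of tuning a request, the sketch below assembles a request body with a few optional parameters and checks them against the limits stated in the API schema on this page (`temperature` in 0 to 2, `top_p` in 0.01 to 1, `reasoning.effort` and `reasoning.max_tokens` not set together). The helper name `build_payload` is hypothetical, and the checks are a client-side convenience, not a description of how the server validates requests.

```python
MODEL = "nvidia/nemotron-3-ultra-550b-a55b"
EFFORT_LEVELS = ("low", "medium", "high")


def build_payload(prompt, *, max_tokens=None, temperature=None, top_p=None,
                  reasoning_effort=None, reasoning_max_tokens=None):
    """Assemble a chat completion request body (hypothetical helper)."""
    # The schema advises against combining temperature and top_p.
    if temperature is not None and top_p is not None:
        raise ValueError("set either temperature or top_p, not both")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if top_p is not None and not 0.01 <= top_p <= 1:
        raise ValueError("top_p must be between 0.01 and 1")
    if max_tokens is not None and max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    # reasoning.effort and reasoning.max_tokens are mutually exclusive.
    if reasoning_effort is not None and reasoning_max_tokens is not None:
        raise ValueError("set either reasoning effort or reasoning max_tokens")
    if reasoning_effort is not None and reasoning_effort not in EFFORT_LEVELS:
        raise ValueError(f"reasoning effort must be one of {EFFORT_LEVELS}")

    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    # Only include the optional fields the caller actually set.
    optional = {"max_tokens": max_tokens, "temperature": temperature, "top_p": top_p}
    payload.update({k: v for k, v in optional.items() if v is not None})

    reasoning = {"effort": reasoning_effort, "max_tokens": reasoning_max_tokens}
    reasoning = {k: v for k, v in reasoning.items() if v is not None}
    if reasoning:
        payload["reasoning"] = reasoning
    return payload
```

For example, `build_payload("Summarize this report.", max_tokens=512, temperature=0.2, reasoning_effort="low")` returns a dictionary that can be sent as the `json=` argument of `requests.post`.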
{% endstep %}

{% step %}

## Run your code

Run the updated code in your development environment. Response time depends on the model and request size, but simple requests typically return quickly.

{% hint style="success" %}
If you need a more detailed, step-by-step walkthrough of setting up your development environment and making a request, see our Quickstart guide.
{% endhint %}
{% endstep %}
{% endstepper %}

</details>

## API Schema

## POST /v1/chat/completions

```json
{"openapi":"3.0.0","info":{"title":"AIML API","version":"1.0.0"},"servers":[{"url":"https://api.aimlapi.com"}],"paths":{"/v1/chat/completions":{"post":{"operationId":"_v1_chat_completions","requestBody":{"required":true,"content":{"application/json":{"schema":{"type":"object","properties":{"model":{"type":"string","enum":["nvidia/nemotron-3-ultra-550b-a55b"]},"messages":{"type":"array","description":"A list of messages comprising the conversation so far.","items":{"type":"object","properties":{"role":{"type":"string","enum":["system","user","assistant","tool"],"description":"The role of the message author."},"content":{"type":"string","description":"The content of the message."}},"required":["role","content"]}},"max_tokens":{"type":"number","minimum":1,"description":"The maximum number of tokens to generate. Controls output length and cost."},"temperature":{"type":"number","minimum":0,"maximum":2,"description":"Sampling temperature. Higher values produce more random output; lower values make it more focused and deterministic. Do not use together with top_p."},"top_p":{"type":"number","minimum":0.01,"maximum":1,"description":"Nucleus sampling threshold. The model considers only tokens comprising the top top_p probability mass. Do not use together with temperature."},"top_k":{"type":"number","description":"Sample from the top K most likely tokens at each step. Reduces low-probability outputs. Recommended for advanced use cases only."},"stream":{"type":"boolean","default":false,"description":"If true, the response will be streamed as server-sent events (SSE) as it is generated."},"stop":{"anyOf":[{"type":"string"},{"type":"array","items":{"type":"string"}}],"description":"Up to 4 sequences where the API will stop generating further tokens."},"frequency_penalty":{"type":"number","minimum":-2,"maximum":2,"nullable":true,"description":"Penalizes tokens based on their frequency in the text so far, reducing repetition."},"presence_penalty":{"type":"number","minimum":-2,"maximum":2,"nullable":true,"description":"Penalizes tokens based on whether they have appeared in the text so far, encouraging the model to discuss new topics."},"seed":{"type":"integer","minimum":1,"description":"If specified, the system will attempt deterministic sampling — repeated requests with the same seed and parameters should return the same result."},"tools":{"type":"array","description":"A list of tools (functions) the model may call. Supports up to 128 functions.","items":{"type":"object","properties":{"type":{"type":"string","enum":["function"]},"function":{"type":"object","properties":{"name":{"type":"string","description":"The name of the function to call."},"description":{"type":"string","description":"A description of what the function does."},"parameters":{"type":"object","description":"The parameters the function accepts, described as a JSON Schema object."}},"required":["name"]}},"required":["type","function"]}},"tool_choice":{"anyOf":[{"type":"string","enum":["none","auto","required"],"description":"none — model will not call any tool. auto — model can pick between a message or tool call. required — model must call one or more tools."}],"description":"Controls which tool (if any) the model calls."},"response_format":{"type":"object","description":"Specifies the output format. Use {\"type\": \"json_object\"} to enable JSON mode, or {\"type\": \"json_schema\", \"json_schema\": {...}} for structured output.","properties":{"type":{"type":"string","enum":["text","json_object","json_schema"]}}},"reasoning":{"type":"object","description":"Configuration for model reasoning/thinking tokens.","properties":{"effort":{"type":"string","enum":["low","medium","high"],"description":"Reasoning effort level. Higher effort uses more tokens but produces better results for complex tasks."},"max_tokens":{"type":"integer","minimum":1,"description":"Maximum number of reasoning tokens. Cannot be used simultaneously with effort."},"exclude":{"type":"boolean","description":"If true, reasoning tokens will be excluded from the response."}}}},"required":["model","messages"],"title":"nvidia/nemotron-3-ultra-550b-a55b"}}}},"responses":{"200":{"content":{"application/json":{"schema":{"type":"object","properties":{"id":{"type":"string","description":"A unique identifier for the chat completion."},"object":{"type":"string","enum":["chat.completion"]},"created":{"type":"number","description":"Unix timestamp of when the completion was created."},"model":{"type":"string","description":"The model used for the completion."},"choices":{"type":"array","items":{"type":"object","properties":{"index":{"type":"number"},"message":{"type":"object","properties":{"role":{"type":"string"},"content":{"type":"string"}}},"finish_reason":{"type":"string","enum":["stop","length","tool_calls","content_filter"]}}}},"usage":{"type":"object","properties":{"prompt_tokens":{"type":"number"},"completion_tokens":{"type":"number"},"total_tokens":{"type":"number"}}},"meta":{"type":"object","nullable":true,"properties":{"usage":{"type":"object","nullable":true,"properties":{"credits_used":{"type":"number","description":"The number of credits consumed during generation."},"usd_spent":{"type":"number","description":"The total amount spent in USD."}}}}}},"required":["id","object","created","choices","model","usage"]}}}}}}}}}
```
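
The `tools` and `tool_choice` fields in the schema above can be assembled as in the following sketch. The helper names and the `get_weather` function are hypothetical and serve only to show the request shape; the limits checked (at most 128 functions, `tool_choice` one of `none`, `auto`, `required`) are taken from the schema.

```python
MODEL = "nvidia/nemotron-3-ultra-550b-a55b"
MAX_TOOLS = 128  # "Supports up to 128 functions" per the schema
TOOL_CHOICES = ("none", "auto", "required")


def function_tool(name, description, parameters):
    """Wrap a function description in the tool object shape from the schema."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,  # a JSON Schema object
        },
    }


def build_tool_request(messages, tools, tool_choice="auto"):
    """Build a request body that offers tools to the model (hypothetical helper)."""
    if tool_choice not in TOOL_CHOICES:
        raise ValueError(f"tool_choice must be one of {TOOL_CHOICES}")
    if len(tools) > MAX_TOOLS:
        raise ValueError(f"at most {MAX_TOOLS} tools are supported")
    return {
        "model": MODEL,
        "messages": messages,
        "tools": tools,
        "tool_choice": tool_choice,
    }


# Hypothetical tool, used only to illustrate the request shape.
weather_tool = function_tool(
    "get_weather",
    "Look up the current weather for a city.",
    {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)

request_body = build_tool_request(
    [{"role": "user", "content": "What is the weather in Paris?"}],
    [weather_tool],
)
```

`request_body` can then be sent as the `json=` argument of `requests.post`. A `finish_reason` of `tool_calls` in the response indicates that the model chose to call a tool.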

## Code Example

{% tabs %}
{% tab title="Python" %}
{% code overflow="wrap" %}

```python
import requests
import json  # for getting a structured output with indentation 

response = requests.post(
    "https://api.aimlapi.com/v1/chat/completions",
    headers={
        # Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
        "Authorization":"Bearer <YOUR_AIMLAPI_KEY>",
        "Content-Type":"application/json"
    },
    json={
        "model":"nvidia/nemotron-3-ultra-550b-a55b",
        "messages":[
            {
                "role":"user",
                "content":"Hi! What do you think about mankind?" # insert your prompt
            }
        ]
    }
)

data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))
```

{% endcode %}
{% endtab %}

{% tab title="JavaScript" %}
{% code overflow="wrap" %}

```javascript
async function main() {
  const response = await fetch('https://api.aimlapi.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      // Insert your AIML API Key instead of <YOUR_AIMLAPI_KEY>:
      'Authorization': 'Bearer <YOUR_AIMLAPI_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'nvidia/nemotron-3-ultra-550b-a55b',
      messages: [
        {
          role: 'user',
          content: 'Hi! What do you think about mankind?' // insert your prompt here
        }
      ],
    }),
  });

  const data = await response.json();
  console.log(JSON.stringify(data, null, 2));
}

main();
```

{% endcode %}
{% endtab %}
{% endtabs %}

<details>

<summary>Response</summary>

{% code overflow="wrap" %}

```json5
{
  "id": "gen-1749730923-aB3cD4eF5gH6iJ7kL8mN",
  "object": "chat.completion",
  "created": 1749730923,
  "model": "nvidia/nemotron-3-ultra-550b-a55b",
  "system_fingerprint": null,
  "choices": [
    {
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop",
      "native_finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "Mankind is a fascinating paradox — a species capable of breathtaking creativity and compassion, yet also of remarkable shortsightedness and destruction.\n\nOn one hand, humans have built civilizations, developed science and art, forged bonds across cultures, and created systems of meaning — philosophy, religion, literature — that speak to a deep yearning for understanding and connection. The capacity for empathy, for sacrificing personal gain for others, for imagining futures different from the present — these are genuinely remarkable traits.\n\nOn the other hand, the historical record is full of exploitation, tribalism, environmental destruction, and cycles of violence that suggest humanity often struggles to act in accordance with its own stated values. The gap between what we know to be right and what we collectively do is one of the central tensions of human existence.\n\nIf I had to offer one framing: humanity is a work in progress. The arc is not guaranteed to bend toward anything — progress is neither inevitable nor irreversible. But the self-awareness that allows humans to recognize their own failings is also what makes genuine improvement possible.\n\nWhat I find most compelling is the question your query implicitly raises: what *should* we think about mankind? And more importantly — what does that thinking commit us to doing?",
        "refusal": null,
        "reasoning": "The user is asking a broad philosophical question about humanity. This is a safe, open-ended prompt. I should give a thoughtful, balanced response that acknowledges both the strengths and weaknesses of humanity without being preachy or one-sided.\n",
        "reasoning_details": [
          {
            "type": "reasoning.text",
            "text": "The user is asking a broad philosophical question about humanity. This is a safe, open-ended prompt. I should give a thoughtful, balanced response that acknowledges both the strengths and weaknesses of humanity without being preachy or one-sided.\n",
            "format": "unknown",
            "index": 0
          }
        ]
      }
    }
  ],
  "usage": {
    "completion_tokens": 284,
    "prompt_tokens": 25,
    "total_tokens": 309,
    "completion_tokens_details": {
      "reasoning_tokens": 62,
      "image_tokens": 0,
      "audio_tokens": 0
    },
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "cache_write_tokens": 0,
      "audio_tokens": 0,
      "video_tokens": 0
    }
  },
  "meta": {
    "usage": {
      "credits_used": 2004,
      "usd_spent": 0.001302
    }
  }
}
```

{% endcode %}
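
A response like the one above can be reduced to the fields most applications need. The sketch below is a minimal example with a hypothetical helper name. It treats `reasoning`, `completion_tokens_details`, and `meta` as optional, since the first two appear in the sample response but not in the schema, and `meta` is marked nullable.

```python
def summarize_completion(data: dict) -> dict:
    """Extract the answer, reasoning, and usage figures from a response body."""
    choice = data["choices"][0]
    message = choice["message"]
    usage = data.get("usage") or {}
    details = usage.get("completion_tokens_details") or {}
    meta_usage = (data.get("meta") or {}).get("usage") or {}
    return {
        "answer": message.get("content"),
        "reasoning": message.get("reasoning"),  # may be absent
        "finish_reason": choice.get("finish_reason"),
        "total_tokens": usage.get("total_tokens"),
        "reasoning_tokens": details.get("reasoning_tokens"),
        "credits_used": meta_usage.get("credits_used"),
        "usd_spent": meta_usage.get("usd_spent"),
    }
```

Calling `summarize_completion(response.json())` on the sample above would give a `finish_reason` of `"stop"`, 309 total tokens, and 62 reasoning tokens.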

</details>
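
When `stream` is set to `true`, the schema states that the response arrives as server-sent events (SSE). The sketch below shows one way to collect text from such a stream. It assumes an OpenAI-compatible chunk layout (`choices[].delta.content`) and a final `data: [DONE]` line; neither is documented on this page, so verify both against real output before relying on them. The helper name is hypothetical.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from an iterable of SSE lines (str or bytes).

    Assumes OpenAI-style chunks and a "[DONE]" sentinel; both are
    assumptions, not behavior documented on this page.
    """
    for line in lines:
        if isinstance(line, bytes):
            line = line.decode("utf-8")
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

With `requests`, this could be driven by passing `stream=True` to `requests.post` (together with `"stream": True` in the request body) and feeding `response.iter_lines()` to `iter_stream_text`, printing each fragment as it arrives.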

