Parameters

Learn how to tweak the responses of AI/ML API models by tailoring your requests.

Overview

When you send a request to a text model, you can specify custom parameters that affect the model's response. These parameters help you control model usage, optimize your requests, and achieve more creative or more precise results.

Whether a parameter is required or optional can be checked in the API Reference for the specific model.

Also note that some of the parameters described below have similar names and logic. Make sure to check the API Reference to confirm which parameter is used by the model you've selected.
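For orientation, here is a minimal sketch of how such parameters are passed in a chat request. It assumes an OpenAI-compatible Python SDK; the base URL, API key, and model name are placeholders, so check the API Reference for the exact setup for your model.

# Minimal sketch, assuming an OpenAI-compatible Python SDK.
# The base_url, api_key, and model values are placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://api.aimlapi.com/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about autumn."}],
    max_tokens=50,           # cap the length of the reply
    temperature=0.5,         # balance randomness and determinism
    frequency_penalty=0.75,  # discourage frequently repeated tokens
)

print(response.choices[0].message.content)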

Possible Parameters

Frequency Penalty

Penalizes tokens based on how frequently they have already appeared in the generated text. ChatGPT series models use numbers between -2.0 and 2.0: a positive value decreases the likelihood of repeating frequently used tokens, a negative value increases it, and a value of 0 means no penalty is applied for token frequency.

"frequency_penalty": 0.75  # Applies a penalty for frequently used tokens. 

Log Probs

When enabled, the model will return the log probabilities (logarithms of the probabilities) of the top predicted tokens for each step in the completion. For example, if you set logprobs=4, the model will include the log probabilities for the 4 most likely tokens at every step:

"logprobs": 4  # Instruction to return log probabilities for the 4 most likely tokens

Logit bias

Modifies the likelihood of specified tokens appearing in the completion. By assigning a bias value to particular tokens, you can increase or decrease their probability in the generated text.

Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.

Since the parameter takes in tokens, not text, you’ll want to use some external tokenizer tool to convert text to token IDs.

Example

Let's go through an example. If we call the Completions endpoint with the prompt “Once upon a,” the completion is very likely going to start with “ time”.

For example, the word “time” tokenizes to the ID 2435 and the word “ time” (which has a space at the start) tokenizes to the ID 640. We can pass these through logit_bias with -100 to ban them from appearing in the completion, like so:

messages=[{"role": "system", "content": "You finish user's sentences."},
             "role": "user", "content": "Once upon a"} ] 
logit_bias={2435:-100, 640:-100}

Now, the prompt “Once upon a” generates the completion “midnight dreary, while I pondered, weak and weary.” Notice that the word “time” is nowhere to be found, because we’ve effectively banned that token using logit_bias.
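To find token IDs like 2435 and 640 yourself, you can use a tokenizer library. Below is a minimal sketch using the tiktoken package (an assumption; the IDs in this example come from an r50k_base-style vocabulary, and other models use different tokenizers and different IDs):

# Minimal sketch using the tiktoken library (assumed to be installed).
# The example IDs 2435 and 640 come from the r50k_base vocabulary;
# check which tokenizer your chosen model actually uses.
import tiktoken

enc = tiktoken.get_encoding("r50k_base")
print(enc.encode("time"))    # token IDs for "time" without a leading space
print(enc.encode(" time"))   # token IDs for " time" with a leading space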

Maximum Tokens

This parameter specifies the maximum number of tokens (words or pieces of words) that the model will generate in response to the prompt. A lower number of tokens typically results in faster response times.

max_tokens = 50  # Limit the model to generate up to 50 tokens.

Messages

A list of messages comprising the conversation so far. Depending on the model you use, different message types (modalities) are supported, like text, images, and audio.

messages=[{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
]

Presence Penalty

Adjusts the likelihood of the model including tokens that have already appeared in the generated text. Unlike the frequency penalty, which scales with how often a token has been used, the presence penalty applies once a token has appeared at all. ChatGPT series models use numbers between -2.0 and 2.0: a positive value reduces the chance of reusing tokens, a negative value increases it, and a value of 0 means no penalty is applied, so the model can freely repeat tokens.

"presence_penalty": 0.8  # Applies a penalty to discourage token reusing. 

Repetition Penalty

This parameter discourages the model from repeating the same line or phrase, promoting more diverse and engaging content. The penalty scales with how many times a token has been used — the more frequent the token, the greater the penalty.

"repetition_penalty" = 1.2  # Applies a penalty to discourage repetition.

Stop Sequences

Stop sequences are strings that, when detected in the model's output, signal the model to stop generating further tokens. This is particularly useful when the desired output is a concise response or a single word.

stop = ["\n"]  # Instructs the model to stop generating when it produces a newline character.

Stream

The stream parameter in chat models controls how responses are delivered.

  • If true, the model returns output token by token in real time. This allows users to see the response as it is being generated, improving responsiveness in chat applications.

  • If false, the model waits until the entire response is generated before returning it in full.

Streaming is particularly useful for interactive conversations but requires handling streamed data properly in code. By default, stream is set to false.

stream = True  # Enables streaming: tokens are returned as they are generated.
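Here is a minimal sketch of consuming a streamed response, assuming an OpenAI-compatible Python SDK (the base URL, API key, and model name below are placeholders):

# Minimal streaming sketch, assuming an OpenAI-compatible Python SDK.
# base_url, api_key, and model are placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://api.aimlapi.com/v1", api_key="YOUR_API_KEY")

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True,  # deliver tokens as they are generated
)

for chunk in stream:
    delta = chunk.choices[0].delta.content  # may be None for some chunks
    if delta:
        print(delta, end="", flush=True)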

Temperature

The temperature controls the randomness of the model's output. Setting it to 0 makes the output nearly deterministic, whereas higher values introduce more variation and creativity in responses.

temperature = 0.5  # Sets a balance between randomness and determinism.

Tool choice

You can use tool_choice to specify how external tools, such as user-defined functions or APIs, are used. The most commonly used values are:

  • "auto": default mode. Model decides if it uses the tool or not.

  • "any": forces tool use.

  • "none": prevents tool use.

For a specific set of acceptable values, see the API Reference for your model.

tool_choice = "any"  # Instructs the model to use the tool mandatorily

Tools

A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for.

Example
tools=[
  {
    "type": "function",
    "function": {
      "name": "get_current_weather",
      "description": "Get the current weather in a given location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA"
          },
          "unit": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"]
          }
        }
      }
    }
  }
]

The number of tools that can be attached is limited. For OpenAI models, for example, the maximum is 128.
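Putting it together, here is a minimal sketch of passing tools in a request and reading back the model's tool call. It assumes an OpenAI-compatible Python SDK; the base URL, API key, and model name are placeholders:

# Minimal tools sketch, assuming an OpenAI-compatible Python SDK.
# base_url, api_key, and model are placeholders.
import json
from openai import OpenAI

client = OpenAI(base_url="https://api.aimlapi.com/v1", api_key="YOUR_API_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Boston?"}],
    tools=tools,
    tool_choice="auto",  # let the model decide whether to call the function
)

# If the model decided to call the function, its name and arguments are returned here.
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    call = tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))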

Top P (Nucleus Sampling)

The top P parameter, also known as nucleus sampling, filters the model's token choices such that the cumulative probability of the tokens considered at each step is at least P. This method allows for more dynamic and contextually relevant responses.

top_p = 0.9  # Only tokens that contribute to the top 90% cumulative probability are considered.

Top K (Top-K Sampling)

Top K limits the model's choices to the K most likely next tokens. Lower values can speed up generation and may improve coherency by focusing on the most probable tokens.

top_k = 40  # The model will only consider the top 40 most probable next tokens.
