Parameters
Learn how you can tweak results of your AI/ML API responses with tailored requests.
Overview
When you send a request to the text model, you can specify custom parameters that affect the model's response. These parameters can control your model usage, optimize your requests, and achieve more creative or exact results.
Whether a parameter is required or optional can be checked in the API Reference for the specific model.
Note also that some of the parameters described below have similar names and behavior. Check the API Reference to confirm which parameter is used by the model you've selected.
Possible Parameters
Frequency Penalty
Penalizes tokens based on how frequently they have already appeared in the generated text. ChatGPT series models use numbers between -2.0 and 2.0: a positive value decreases the likelihood of repeating frequently used tokens, a negative value increases it, and a value of 0 means no penalty is applied for token frequency.
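As an illustration, a request body using a frequency penalty might look like the following sketch. The model name and message content are placeholders, not values prescribed by this page.

```python
# Sketch of a chat completion request body using frequency_penalty.
# The model name and message content are illustrative placeholders.
payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "user", "content": "Write a short product description."}
    ],
    # Positive values (up to 2.0) discourage tokens that already appear
    # often in the output; negative values (down to -2.0) encourage them.
    "frequency_penalty": 0.5,
}
```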
Log Probs
When enabled, the model will return the log probabilities (logarithms of the probabilities) of the top predicted tokens for each step in the completion. For example, if you set logprobs=5, the model will include the log probabilities for the top 5 most likely tokens at every step.
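Depending on the model, the log-probability option may be an integer count or a boolean paired with a separate count field (as in the OpenAI-style chat API); check the API Reference for the exact shape. A sketch of the latter form:

```python
# Sketch of a request asking for log probabilities of top candidates.
# Field names follow the OpenAI-style chat API and may differ for
# other models; the model name is a placeholder.
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Say hello."}],
    "logprobs": True,   # include log probabilities in the response
    "top_logprobs": 5,  # number of top alternatives returned per token
}
```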
Logit bias
Modifies the likelihood of specified tokens appearing in the completion. By assigning a bias value to particular tokens, you can increase or decrease their probability in the generated text.
Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.
Since the parameter takes in tokens, not text, you’ll want to use some external tokenizer tool to convert text to token IDs.
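A minimal sketch of the mapping, using made-up token IDs (real IDs come from the tokenizer of the model you are using):

```python
# Sketch: banning one token and nudging another upward via logit_bias.
# The token IDs below are invented examples, not real tokenizer output.
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Pick a color."}],
    "logit_bias": {
        "1234": -100,  # -100 effectively bans token 1234
        "5678": 5,     # a positive bias makes token 5678 more likely
    },
}
```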
Maximum Tokens
This parameter specifies the maximum number of tokens (words or pieces of words) that the model will generate in response to the prompt. A lower number of tokens typically results in faster response times.
Messages
A list of messages comprising the conversation so far. Depending on the model you use, different message types (modalities) are supported, like text, images, and audio.
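For text-only conversations, a messages list in the common chat-completion format might look like this sketch (roles and content are illustrative):

```python
# Sketch of a multi-turn conversation passed via the messages list.
# Each entry carries a role ("system", "user", or "assistant") and content.
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is nucleus sampling?"},
    {"role": "assistant", "content": "Sampling limited to the smallest "
     "set of tokens whose cumulative probability reaches P."},
    {"role": "user", "content": "Give a typical value of P."},
]
```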
Presence Penalty
Adjusts the likelihood of the model including tokens that have already appeared in the generated text. ChatGPT series models use numbers between -2.0 and 2.0: a positive value reduces the chance of reusing tokens, a negative value increases it, and a value of 0 means no penalty is applied, and the model can freely repeat tokens.
Repetition Penalty
This parameter discourages the model from repeating the same line or phrase, promoting more diverse and engaging content. The penalty scales with how many times a token has been used — the more frequent the token, the greater the penalty.
Stop Sequences
Stop sequences are strings that, when detected in the model's output, signal the model to stop generating further tokens. This is particularly useful when the desired output is a concise response or a single word.
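For example, a request that halts generation at a newline or at a custom marker could be sketched as follows (the stop strings are illustrative):

```python
# Sketch: generation stops as soon as either stop string appears
# in the output. Model name and stop strings are placeholders.
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Name one fruit."}],
    "stop": ["\n", "END"],
}
```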
Stream
The stream parameter in chat models controls how responses are delivered. If true, the model returns output token by token in real time, allowing users to see the response as it is being generated and improving responsiveness in chat applications. If false, the model waits until the entire response is generated before returning it in full.
Streaming is particularly useful for interactive conversations but requires handling streamed data properly in code. By default, stream is set to false.
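When streaming is on, the client receives chunks carrying partial deltas that must be stitched together. A minimal sketch of that assembly step, using hypothetical chunk dictionaries shaped like the common chat streaming format:

```python
def assemble_stream(chunks):
    """Concatenate the content deltas from streamed chat chunks."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        # The first chunk often carries only the role, with no content.
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)

# Mock chunks standing in for server-sent events from the API:
mock_chunks = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"content": "Hel"}}]},
    {"choices": [{"delta": {"content": "lo!"}}]},
]
```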
Temperature
The temperature controls the randomness of the model's output. Setting it to 0 results in deterministic output, whereas higher values up to 1 introduce more variation and creativity in responses.
Tool choice
Users can use tool_choice to specify how external tools, such as user-defined functions or APIs, are used. The most commonly used values are:
"auto": the default mode. The model decides whether to use a tool.
"any": forces tool use.
"none": prevents tool use.
For a specific set of acceptable values, see the API Reference for your model.
Tools
A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for.
The maximum number of tools that can be connected is limited. For example, in the case of OpenAI models, the maximum number is 128.
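Putting tools and tool_choice together, a request declaring a single function tool might be sketched as follows. The function name and schema are invented for illustration.

```python
# Sketch: one function tool plus tool_choice. The get_weather function
# and its schema are hypothetical examples, not a real API.
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}
```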
Top P (Nucleus Sampling)
The top P parameter, also known as nucleus sampling, filters the model's token choices such that the cumulative probability of the tokens considered at each step is at least P. This method allows for more dynamic and contextually relevant responses.
Top K (Top-K Sampling)
Top K limits the model's choices to the K most likely next tokens. Lower values can speed up generation and may improve coherency by focusing on the most probable tokens.
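The sampling parameters above can be combined in one request, as in this sketch. Whether a given model accepts top_k alongside temperature and top_p varies; check the API Reference for your model.

```python
# Sketch combining the sampling parameters described above.
# Model name and values are illustrative.
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Invent a team name."}],
    "temperature": 0.9,  # more varied output than the deterministic 0
    "top_p": 0.95,       # nucleus sampling cutoff
    "top_k": 40,         # consider only the 40 most likely next tokens
}
```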