Can I call the API asynchronously?
Yes, asynchronous calls work with any of our available models. Let's see how this works with an example in Python.
Example in Python
Below, we will see how two requests are handled when the second one is shorter and lighter than the first. We will compare synchronous processing (first example) and asynchronous processing (second example). After each example, the Response section shows the model's output for both queries. Pay attention to the order in which the answers are returned in each response!
Synchronous call:
from openai import OpenAI
def complete_chat(question):
    api_key = '<YOUR_AIMLAPI_KEY>'
    client = OpenAI(
        base_url='https://api.aimlapi.com',
        api_key=api_key,
    )    
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}],
    )
    print(f"Response for: {question}\n{response}\n")
def main():
    long_question = "List the 5 most famous hockey players of the 20th century."
    short_question = "What is 2+2?"
    # Execute both requests sequentially
    complete_chat(long_question)
    complete_chat(short_question)
if __name__ == "__main__":
    main()
Asynchronous call:
import asyncio
from openai import AsyncOpenAI
async def complete_chat(question):
    api_key = '<YOUR_AIMLAPI_KEY>'
    client = AsyncOpenAI(
        base_url='https://api.aimlapi.com',
        api_key=api_key,
    )    
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}],
    )
    print(f"Response for: {question}\n{response}\n")
async def main():
    long_question = "List the 5 most famous hockey players of the 20th century."
    short_question = "What is 2+2?"
    # Run both requests concurrently
    await asyncio.gather(
        complete_chat(long_question),
        complete_chat(short_question),
    )
if __name__ == "__main__":
    # In a regular Python script, asyncio.run() starts the event loop.
    # In Jupyter, where an event loop is already running, run `await main()` in a cell instead;
    # loop.run_until_complete() also raises RuntimeError on a running loop.
    asyncio.run(main())
As we can see, with asynchronous execution the response to a shorter or lighter query can arrive before the response to a longer or more complex one, even though the lighter query was submitted second.
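The ordering effect can be reproduced without network access or an API key. The sketch below is only an illustration: `fake_request` is a hypothetical stand-in for the awaited `client.chat.completions.create(...)` call, and the delays of 0.2 s and 0.05 s are assumed values, not measured latencies. It also shows that `asyncio.gather` returns its results in the order the coroutines were passed in, while `asyncio.as_completed` yields them in the order they finish.

```python
import asyncio


async def fake_request(label: str, delay: float) -> str:
    # Hypothetical stand-in for `await client.chat.completions.create(...)`.
    # `delay` is an assumed latency in seconds, not a measured one.
    await asyncio.sleep(delay)
    return label


async def completion_order() -> list[str]:
    # Start both requests; the "long" one is submitted first.
    tasks = [
        asyncio.create_task(fake_request("long", 0.2)),
        asyncio.create_task(fake_request("short", 0.05)),
    ]
    # as_completed yields awaitables in the order in which they finish.
    return [await finished for finished in asyncio.as_completed(tasks)]


async def submission_order() -> list[str]:
    # gather returns results in argument order, whichever finished first.
    return await asyncio.gather(
        fake_request("long", 0.2),
        fake_request("short", 0.05),
    )


if __name__ == "__main__":
    print(asyncio.run(completion_order()))  # ['short', 'long']
    print(asyncio.run(submission_order()))  # ['long', 'short']
```

In the asynchronous example above, each coroutine prints as soon as its own response arrives, so the printed output follows completion order. If you instead collect the return values of `asyncio.gather`, they come back in submission order.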
