# LiteLLM

## About

[LiteLLM](https://www.litellm.ai/) is an open-source Python library that provides a unified API for interacting with multiple large language model providers. It allows developers to switch between different models with minimal code changes, optimizing cost and performance. LiteLLM simplifies integration by offering a single interface for various LLM endpoints, enabling seamless experimentation and deployment across different AI providers.

You can also call AI/ML API models through this library. Below are the most common use cases:

* [Chat completion](#chat-completion)
* [Streaming](#streaming)
* [Chat completion (asynchronous)](#async-completion)
* [Streaming (asynchronous)](#async-streaming)
* [Embedding (asynchronous)](#async-embedding)
* [Image Generation (asynchronous)](#async-image-generation)

## Installation <a href="#usage" id="usage"></a>

Install the library with pip:

```sh
pip install litellm
```
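Instead of passing the key and base URL into every call, you can typically let LiteLLM's OpenAI-compatible provider pick them up from environment variables. The variable names below follow LiteLLM's convention for the `openai/` provider; verify them against the LiteLLM documentation for your version:

```python
import os

# Assumed convention: LiteLLM's "openai/" provider reads these variables
# when api_key / api_base are not passed explicitly.
os.environ["OPENAI_API_KEY"] = "<YOUR_AIMLAPI_KEY>"
os.environ["OPENAI_API_BASE"] = "https://api.aimlapi.com/v2"
```

With these set, the `api_key` and `api_base` arguments in the examples below become optional.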

## Making API Calls <a href="#making-api-calls" id="making-api-calls"></a>

You can choose from Llama, Qwen, Flux, and 200+ other models on the [AI/ML API official website](https://aimlapi.com/models).
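Every example below prepends `openai/` to the AI/ML API model id. A tiny helper (our own convenience function, not part of LiteLLM) keeps that convention in one place:

```python
AIMLAPI_BASE = "https://api.aimlapi.com/v2"


def aimlapi_model(model_id: str) -> str:
    """Prefix an AI/ML API model id with "openai/", as LiteLLM expects."""
    return f"openai/{model_id}"


print(aimlapi_model("meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo"))
# openai/meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
```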

### Chat completion

{% code overflow="wrap" %}

```python
import litellm

response = litellm.completion(
    # The model name must combine the "openai/" prefix with the model id from AI/ML API:
    model="openai/meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo",
    # Your AI/ML API key:
    api_key="<YOUR_AIMLAPI_KEY>",
    api_base="https://api.aimlapi.com/v2",
    messages=[
        {
            "role": "user",
            "content": "Hey, how's it going?",
        }
    ],
)
```

{% endcode %}

### Streaming <a href="#streaming" id="streaming"></a>

{% code overflow="wrap" %}

```python
import litellm

response = litellm.completion(
    # The model name must combine the "openai/" prefix with the model id from AI/ML API:
    model="openai/Qwen/Qwen2-72B-Instruct",
    # Your AI/ML API key:
    api_key="<YOUR_AIMLAPI_KEY>",
    api_base="https://api.aimlapi.com/v2",
    messages=[
        {
            "role": "user",
            "content": "Hey, how's it going?",
        }
    ],
    stream=True,
)
for chunk in response:
    print(chunk)
```

{% endcode %}
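Each streamed chunk carries only a delta of the reply; in LiteLLM's OpenAI-style chunks the text lives at `chunk.choices[0].delta.content` and may be `None`. A small accumulator sketch, shown here over plain delta strings so it runs without an API call:

```python
from typing import Iterable, Optional


def collect_stream_text(deltas: Iterable[Optional[str]]) -> str:
    """Join streamed content deltas, skipping None/empty ones."""
    return "".join(d for d in deltas if d)


# In a real streaming loop you would feed it chunk.choices[0].delta.content:
print(collect_stream_text(["Hey", ", ", "how's it going?", None]))
# Hey, how's it going?
```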

### Async Completion <a href="#async-completion" id="async-completion"></a>

{% code overflow="wrap" %}

```python
import asyncio

import litellm


async def main():
    response = await litellm.acompletion(
        # The model name must combine the "openai/" prefix with the model id from AI/ML API:
        model="openai/anthropic/claude-3-5-haiku",
        # Your AI/ML API key:
        api_key="<YOUR_AIMLAPI_KEY>",
        api_base="https://api.aimlapi.com/v2",
        messages=[
            {
                "role": "user",
                "content": "Hey, how's it going?",
            }
        ],
    )
    print(response)


if __name__ == "__main__":
    asyncio.run(main())
```

{% endcode %}
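The main benefit of `acompletion` is issuing several requests concurrently with `asyncio.gather`. The sketch below uses a stub coroutine in place of the real `litellm.acompletion` call so it runs offline; swap the stub body for the call shown above:

```python
import asyncio


async def ask(prompt: str) -> str:
    # Stub standing in for: await litellm.acompletion(model=..., messages=[...])
    await asyncio.sleep(0)
    return f"reply to {prompt!r}"


async def main() -> list[str]:
    prompts = ["Hello", "Ping", "Pong"]
    # All three requests are in flight at once instead of one after another.
    return await asyncio.gather(*(ask(p) for p in prompts))


if __name__ == "__main__":
    print(asyncio.run(main()))
```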

### Async Streaming <a href="#async-streaming" id="async-streaming"></a>

{% code overflow="wrap" %}

```python
import asyncio
import traceback

import litellm


async def main():
    try:
        print("test acompletion + streaming")
        response = await litellm.acompletion(
            # The model name must combine the "openai/" prefix with the model id from AI/ML API:
            model="openai/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF",
            # Your AI/ML API key:
            api_key="<YOUR_AIMLAPI_KEY>",
            api_base="https://api.aimlapi.com/v2",
            messages=[{"content": "Hey, how's it going?", "role": "user"}],
            stream=True,
        )
        print(f"response: {response}")
        async for chunk in response:
            print(chunk)
    except Exception:
        print(f"error occurred: {traceback.format_exc()}")


if __name__ == "__main__":
    asyncio.run(main())
```

{% endcode %}

### Async Embedding <a href="#async-embedding" id="async-embedding"></a>

{% code overflow="wrap" %}

```python
import asyncio

import litellm


async def main():
    response = await litellm.aembedding(
        # The model name must combine the "openai/" prefix with the model id from AI/ML API:
        model="openai/text-embedding-3-small",
        # Your AI/ML API key:
        api_key="<YOUR_AIMLAPI_KEY>",
        api_base="https://api.aimlapi.com/v1",  # 👈 embeddings use the v1 endpoint, not v2
        input="Your text string",
    )
    print(response)


if __name__ == "__main__":
    asyncio.run(main())
```

{% endcode %}
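Embedding responses carry numeric vectors (in the OpenAI-style payload, under `response.data[0]["embedding"]`); a common next step is comparing two of them with cosine similarity. A stdlib-only sketch:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (identical direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```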

### Async Image Generation <a href="#async-image-generation" id="async-image-generation"></a>

{% code overflow="wrap" %}

```python
import asyncio

import litellm


async def main():
    response = await litellm.aimage_generation(
        # The model name must combine the "openai/" prefix with the model id from AI/ML API:
        model="openai/dall-e-3",
        # Your AI/ML API key:
        api_key="<YOUR_AIMLAPI_KEY>",
        api_base="https://api.aimlapi.com/v1",  # 👈 image generation uses the v1 endpoint, not v2
        prompt="A cute baby sea otter",
    )
    print(response)


if __name__ == "__main__":
    asyncio.run(main())
```

{% endcode %}

