fastinference-llm

Name: fastinference-llm
Version: 0.0.5
Home page: https://github.com/blefo/FastInference
Summary: Seamlessly integrate with top LLM APIs for speedy, robust, and scalable querying. Ideal for developers needing quick, reliable AI-powered responses.
Upload time: 2024-05-16 12:49:31
Author: Baptiste Lefort
License: MIT
Keywords: api, fast, inference, distributed, llm
<h1 align="center">
        ⚡FastInference - The Ultra-Fast LLM Querying Manager (OpenAI, HuggingFace, Ollama, ...)
</h1>
<p align="center">
            Query any LLM API and get responses very fast with a <b>highly robust and distributed</b> library. <br>
            All LLM providers can be used with FastInference (OpenAI, HuggingFace, VertexAI, TogetherAI, Azure, etc.).
</p>

## Features

- **High Performance**: Get high inference speed thanks to intelligent asynchronous and distributed querying.
- **Robust Error Handling**: Advanced mechanisms to handle exceptions, ensuring robust querying.
- **Ease of Use**: Simplified API designed for working with all the LLM providers: easy and fast.
- **Scalability**: Optimized for large datasets and high concurrency.

## The workflow
![Diagram of the workflow](https://github.com/blefo/FastInference/blob/main/detailed_workflow.png)

## Usage

```bash
pip install fastinference-llm
```

```python
from fastinference import FastInference

prompt = """
            You will be provided with a tweet, and your task is to classify its sentiment as positive, neutral, or negative.
            
            Tweet: {tweet_content}
        """

api_key = "your-api-key"
model_name = "modelprovider/model_name"

results = FastInference(file_path="your-dataset-file-path", 
                        main_column="your-main-feature", 
                        prompt=prompt, 
                        api_key=api_key,
                        model_name=model_name, 
                        only_response=True).run()
print(results)
```

### The Parameters
Here are the required parameters for initializing the FastInference object (a minimal end-to-end sketch follows the list).
* **file_path** (string): path to your dataset (csv, xlsx, json, parquet)
* **main_column** (string): name of the main column (explained below in detail)
* **prompt** (string): the prompt with the variable in it (explained below in detail)
* **api_key** (string): your API key
* **model_name** (string): has the format provider/model_name (for example "huggingface/meta-llama/Meta-Llama-3-70B")
* **only_response** (bool): if True, you get a list containing only the LLM responses; otherwise you get the full objects normalized to the OpenAI API format
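
For completeness, here is a sketch that passes every required parameter explicitly and keeps `only_response=False`. The dataset path, column name, environment-variable name, and model identifier are placeholders, not values prescribed by the library.

```python
import os

from fastinference import FastInference

# Placeholder prompt: {review_text} must match a column name in the dataset.
prompt = """
    Classify the following product review as positive, neutral, or negative.

    Review: {review_text}
"""

# Assumption: the API key is stored in an environment variable of your choosing.
api_key = os.environ["LLM_API_KEY"]

results = FastInference(
    file_path="reviews.csv",             # placeholder dataset path (csv, xlsx, json, parquet)
    main_column="review_text",           # placeholder column name
    prompt=prompt,
    api_key=api_key,
    model_name="openai/gpt-3.5-turbo",   # provider/model_name format (example identifier)
    only_response=False,                 # keep the full Datablock objects
).run()
```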



### The Prompt
One of the parameters of the FastInference library is the prompt, which must be a string.
It contains, within curly brackets, the names of the columns from your dataset whose values you want inserted into the prompt.

#### Example Usage
To understand how to use the `prompt` parameter in the FastInference library, we'll provide an example based on a tweet sentiment classification task. Consider a dataset with the following structure:

| tweet_content                                           | related_entities |
|---------------------------------------------------------|------------------|
| "Just had the best day ever at the NeurIPS Conference!" | "NeurIPS"        |
| "Traffic was terrible this morning in Paris."           | "Paris"          |
| "Looking forward to the new Star Wars movie!"           | "Star Wars"      |


Here's how you could set up your prompt for classifying the sentiment of tweets based on their content and related entities:

```python
prompt = """
          You will be provided with a tweet, and your task is to classify its sentiment as positive, neutral, or negative.
          You must consider the related identified entities in order to make a good decision.
          
          Tweet: {tweet_content}
          Related Entities: {related_entities}
          """
```
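
Under the hood, the placeholders behave like standard Python format fields: for each dataset row, every `{column_name}` is replaced by that row's value. The snippet below only illustrates that mapping with a hand-written row; it is not the library's internal code.

```python
# Illustration only: how {column_name} placeholders map onto a dataset row.
prompt = """
    You will be provided with a tweet, and your task is to classify its sentiment.

    Tweet: {tweet_content}
    Related Entities: {related_entities}
"""

row = {
    "tweet_content": "Just had the best day ever at the NeurIPS Conference!",
    "related_entities": "NeurIPS",
}

# Standard Python string formatting produces the final prompt sent to the LLM.
filled_prompt = prompt.format(**row)
print(filled_prompt)
```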

### The main_column Parameter
The `main_column` parameter is a string containing the name of the column that holds the most important information in your data.
It does not influence the LLM during inference, since the prompt does not create hierarchical relationships between columns.

**The main column has no influence on LLM inference.**

### Output format

If `only_response` is True, the library returns a list of strings, one response per row.

Here is the structure of the return data if *only_response=True*:
```python
["response 1", "response 2", ..., "response n"]
```

If `only_response` is False, the library returns a list of `Datablock` items. Each `Datablock` contains: `content` (str), `metadata` (dict), `content_with_prompt` (a `PromptTemplate` object), and `response` (a `ModelResponse` normalized to the OpenAI API format). You can retrieve the text generated by the language model from the response's `choices` attribute.

Here is the structure of the return data if *only_response=False*:
```python
[
        Datablock(content: str, content_with_prompt: PromptTemplate, metadata: dict, response: ModelResponse),
        ...
        Datablock(content: str, content_with_prompt: PromptTemplate, metadata: dict, response: ModelResponse)
]
```

`only_response=False` is the default and the advised setting. The `Datablock` item keeps track of the data correctly through the distribution steps, ensuring it stays reliable and consistent throughout the process.
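
As an illustration of the `choices` attribute mentioned above, the list comprehension below pulls the generated text out of each `Datablock` in `results` (from the usage example), assuming the response follows the OpenAI chat-completion shape; the exact attribute path is an assumption, not a documented API.

```python
# Sketch: extract the generated text from each Datablock returned with only_response=False.
# The attribute path below assumes the OpenAI chat-completion response shape.
generated_texts = [
    block.response.choices[0].message.content
    for block in results
]
print(generated_texts)
```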


### Supported Providers ([Docs](https://docs.litellm.ai/docs/providers))

FastInference is built on the open-source [LiteLLM](https://github.com/BerriAI/litellm/blob/main/README.md) library. Every LLM supported by LiteLLM is also supported by FastInference.
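
As a reference for the `model_name` parameter, LiteLLM-style identifiers are prefixed with the provider. The strings below are illustrative examples only; check the LiteLLM provider docs linked above for the exact identifiers currently available.

```python
# Illustrative provider/model_name strings (verify against the LiteLLM provider docs):
model_names = [
    "openai/gpt-3.5-turbo",
    "azure/<your-deployment-name>",              # placeholder Azure deployment
    "huggingface/meta-llama/Meta-Llama-3-70B",
    "ollama/llama2",
    "together_ai/mistralai/Mixtral-8x7B-Instruct-v0.1",
]
```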

| Provider                                                                            | [Completion](https://docs.litellm.ai/docs/#basic-usage) |
| ----------------------------------------------------------------------------------- | ------------------------------------------------------- |
| [openai](https://docs.litellm.ai/docs/providers/openai)                             | ✅                                                      |
| [azure](https://docs.litellm.ai/docs/providers/azure)                               | ✅                                                      |
| [aws - sagemaker](https://docs.litellm.ai/docs/providers/aws_sagemaker)             | ✅                                                      |
| [aws - bedrock](https://docs.litellm.ai/docs/providers/bedrock)                     | ✅                                                      |
| [google - vertex_ai [Gemini]](https://docs.litellm.ai/docs/providers/vertex)        | ✅                                                      |
| [google - palm](https://docs.litellm.ai/docs/providers/palm)                        | ✅                                                      |
| [google AI Studio - gemini](https://docs.litellm.ai/docs/providers/gemini)          | ✅                                                      |
| [mistral ai api](https://docs.litellm.ai/docs/providers/mistral)                    | ✅                                                      |
| [cloudflare AI Workers](https://docs.litellm.ai/docs/providers/cloudflare_workers)  | ✅                                                      |
| [cohere](https://docs.litellm.ai/docs/providers/cohere)                             | ✅                                                      |
| [anthropic](https://docs.litellm.ai/docs/providers/anthropic)                       | ✅                                                      |
| [huggingface](https://docs.litellm.ai/docs/providers/huggingface)                   | ✅                                                      |
| [replicate](https://docs.litellm.ai/docs/providers/replicate)                       | ✅                                                      |
| [together_ai](https://docs.litellm.ai/docs/providers/togetherai)                    | ✅                                                      |
| [openrouter](https://docs.litellm.ai/docs/providers/openrouter)                     | ✅                                                      |
| [ai21](https://docs.litellm.ai/docs/providers/ai21)                                 | ✅                                                      |
| [baseten](https://docs.litellm.ai/docs/providers/baseten)                           | ✅                                                      |
| [vllm](https://docs.litellm.ai/docs/providers/vllm)                                 | ✅                                                      |
| [nlp_cloud](https://docs.litellm.ai/docs/providers/nlp_cloud)                       | ✅                                                      |
| [aleph alpha](https://docs.litellm.ai/docs/providers/aleph_alpha)                   | ✅                                                      |
| [petals](https://docs.litellm.ai/docs/providers/petals)                             | ✅                                                      |
| [ollama](https://docs.litellm.ai/docs/providers/ollama)                             | ✅                                                      |
| [deepinfra](https://docs.litellm.ai/docs/providers/deepinfra)                       | ✅                                                      |
| [perplexity-ai](https://docs.litellm.ai/docs/providers/perplexity)                  | ✅                                                      |
| [Groq AI](https://docs.litellm.ai/docs/providers/groq)                              | ✅                                                      |
| [anyscale](https://docs.litellm.ai/docs/providers/anyscale)                         | ✅                                                      |
| [IBM - watsonx.ai](https://docs.litellm.ai/docs/providers/watsonx)                  | ✅                                                      |
| [voyage ai](https://docs.litellm.ai/docs/providers/voyage)                          |                                                         |
| [xinference [Xorbits Inference]](https://docs.litellm.ai/docs/providers/xinference) |                                                         |

## Contributing

To contribute: fork the repo, make a change, and submit a PR with the change.

To work on the repo locally, start by cloning it:

```bash
git clone https://github.com/blefo/FastInference.git
```

Make your changes, push them to your fork on GitHub, and submit a PR from there! 🚀 A minimal sketch of that workflow is shown below.
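
The commands below assume a typical fork-based workflow; the username and branch name are placeholders.

```bash
# Fork the repository on GitHub first, then:
git clone https://github.com/<your-username>/FastInference.git
cd FastInference
git checkout -b my-change        # placeholder branch name
# ... edit files and commit ...
git push origin my-change
# Finally, open a pull request against blefo/FastInference from your fork.
```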

Some ideas for contributions:

- Add new methods for data loading
- Load the API key and model information directly from environment variables
- Optimize the DataBlock structure
- Leverage LiteLLM's API and key rotation feature to avoid exceptions

            
