# ragtime

- **Name**: ragtime
- **Version**: 0.0.43
- **Summary**: Ragtime 🎹 is an LLMOps framework to automatically evaluate Retrieval Augmented Generation (RAG) systems and compare different RAGs / LLMs
- **Upload time**: 2024-06-10 15:20:30
- **Requires Python**: >=3.9
- **Keywords**: evaluation, llm, rag
- **License**: MIT License. Copyright (c) 2024 reciTAL. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
            <img src="img/Ragtime_logo.png" alt="Ragtime 🎹 LLM Ops for all">

# Presentation
**Ragtime** 🎹 is an LLMOps framework which allows you to automatically:
1. evaluate a Retrieval Augmented Generation (RAG) system
2. compare different RAGs / LLMs
3. generate Facts to allow automatic evaluation

Ragtime 🎹 allows you to evaluate **long answers**, not only multiple-choice questions or the number of common words between an answer and a baseline. This is needed, for instance, to evaluate summarizers.

In Ragtime 🎹, a *RAG* is made of, optionally, a *Retriever*, and always, one or several *Large Language Models* (*LLMs*).
- A *Retriever* takes a *question* as input and returns one or several *chunks* or *paragraphs* retrieved from a document knowledge base
- An *LLM* is a text-to-text generator that takes a *prompt* as input, made of a question and optional chunks, and returns an *LLMAnswer*

You can specify how *prompts* are generated and how the *LLMAnswer* has to be post-processed to return an *answer*.

# Contributing
Glad you wish to contribute! More details [here](CONTRIBUTING.md).

# How does it work?
The main idea in Ragtime 🎹 is to evaluate answers returned by a RAG based on **Facts** that you define. Evaluating RAGs and/or LLMs is difficult because you cannot define a single "good" answer: an LLM can return many equivalent answers expressed in different ways, so a simple string comparison cannot determine whether an answer is right or wrong. Many proxies have been created, but counting the number of common words, as ROUGE does for instance, is not very precise (see [HuggingFace's `lighteval`](https://github.com/huggingface/lighteval?tab=readme-ov-file#metrics-for-generative-tasks)).

In Ragtime 🎹, answers returned by a RAG or an LLM are evaluated against a set of facts. If the answer validates all the facts, it is deemed correct. Conversely, if some facts are not validated, the answer is considered wrong. The ratio of validated facts to the total number of facts to validate defines a score.

You can either define facts manually or have an LLM define them for you. **The evaluation of facts against answers is done automatically with another LLM**.
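
To make the scoring rule concrete, here is a minimal sketch in plain Python (not the ragtime API) that computes the score of one answer from the facts it validates:

```python
# Minimal sketch of the scoring rule described above (not the ragtime API):
# the score is the fraction of facts the answer validates.
def fact_score(validated: list[bool]) -> float:
    """validated[i] is True if the answer validates fact i."""
    if not validated:
        return 0.0
    return sum(validated) / len(validated)

# Example: 3 of 4 facts validated -> score = 0.75
print(fact_score([True, True, True, False]))
```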

# Main objects
The main objects used in Ragtime 🎹 are:
- `AnswerGenerator`: generates `Answer`s with one or several `LLM`s. Each `LLM` uses a `Prompter` to build the prompt it is fed and to post-process the `LLMAnswer` it returns
- `FactGenerator`: generates `Facts` from the answers whose human validation equals 1. `FactGenerator` also uses an `LLM` to generate the facts
- `EvalGenerator`: generates `Eval`s based on `Answer`s and `Facts`. Also uses an `LLM` to perform the evaluations.
- `LLM`: generates text and returns `LLMAnswer` objects
- `LLMAnswer`: the answer returned by an LLM. Contains a `text` field returned by the LLM, plus a `cost`, a `duration`, a `timestamp` and a `prompt` field, the latter being the prompt used to generate the answer
- `Prompter`: used to generate a prompt for an LLM and to post-process the text returned by the LLM
- `Expe`: an experiment object, containing a list of `QA` objects
- `QA`: an element of an `Expe`. Contains a `Question` and, optionally, `Facts`, `Chunks` and `Answers`.
- `Question`: contains a `text` field for the question's text. Can also contain a `meta` dictionary
- `Facts`: a list of `Fact`, each with a `text` field holding the fact itself and an `LLMAnswer` object if the fact has been generated by an LLM
- `Chunks`: a list of `Chunk`, each containing the `text` of the chunk and optionally a `meta` dictionary with extra data associated with the retriever
- `Answers`: the answer to the question is in the `text` field, plus an `LLMAnswer` containing all the data related to the answer generation and an `Eval` object related to the evaluation of the answer
- `Eval`: contains a `human` field to store the human evaluation of the answer, as well as an `auto` field when the evaluation is done automatically. In this case, it also contains an `LLMAnswer` object related to the automatic evaluation

Almost every object in Ragtime 🎹 has a `meta` field, a dictionary where you can store all the extra data you need for your specific use case.
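
To make this structure concrete, here is a minimal sketch of the data model using plain dataclasses. These are illustrative stand-ins, not the actual ragtime classes; the class names and fields follow the list above, but the exact attributes and constructors are assumptions.

```python
# Illustrative data-model sketch using plain dataclasses.
# These are NOT the actual ragtime classes; names follow the list above,
# but the exact attributes and constructors are assumptions.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Question:
    text: str
    meta: dict = field(default_factory=dict)

@dataclass
class Fact:
    text: str

@dataclass
class Chunk:
    text: str
    meta: dict = field(default_factory=dict)

@dataclass
class Eval:
    human: Optional[float] = None  # human evaluation of the answer
    auto: Optional[float] = None   # automatic (LLM-based) evaluation

@dataclass
class Answer:
    text: str
    eval: Optional[Eval] = None

@dataclass
class QA:
    question: Question
    facts: list[Fact] = field(default_factory=list)
    chunks: list[Chunk] = field(default_factory=list)
    answers: list[Answer] = field(default_factory=list)

@dataclass
class Expe:
    qas: list[QA] = field(default_factory=list)

# An experiment with one question and two manually defined facts.
expe = Expe(qas=[QA(
    question=Question(text="What is the capital of France?"),
    facts=[Fact(text="The answer mentions Paris"),
           Fact(text="The answer states Paris is the capital")],
)])
```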

# Basic sequence
When calling a generator, the following sequence unfolds (below is an example with an `AnsGenerator`, an `AnsPrompterBase`, a `MyRetriever` and two LLMs instantiated as `LiteLLM`s from their names, but it would work similarly with any other `TextGenerator`, `Prompter` and `LLM` child):
```python
main.py: ans_gen = AnsGenerator(prompter=AnsPrompterBase(), retriever=MyRetriever(), llms=["gpt4", "mistral-large"])
main.py: AnsGenerator.generate(expe)
-> TextGenerator.generate: async call _generate_for_qa(qa) for each qa in expe
--> TextGenerator._generate_for_qa: AnsGenerator.gen_for_qa(qa)
---> AnsGenerator.gen_for_qa: llm.generate for each llm in AnsGenerator
----> llm.generate: prompter.get_prompt
----> llm.generate: llm.complete
----> llm.generate: prompter.post_process
```
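
To illustrate the two prompter steps in the trace above (`prompter.get_prompt` and `prompter.post_process`), here is a hypothetical, standalone sketch. It does not subclass the actual ragtime `Prompter`, and the method signatures are assumptions, not the library's API.

```python
# Hypothetical, standalone prompter sketch (not the ragtime `Prompter` API).
# It only illustrates the two responsibilities named in the trace above.
from typing import Optional


class SimpleAnswerPrompter:
    def get_prompt(self, question: str, chunks: Optional[list[str]] = None) -> str:
        """Build the prompt fed to the LLM from the question and optional retrieved chunks."""
        context = "\n\n".join(chunks or [])
        return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

    def post_process(self, llm_text: str) -> str:
        """Turn the raw text returned by the LLM into the final answer."""
        return llm_text.strip()


prompter = SimpleAnswerPrompter()
prompt = prompter.get_prompt(
    "What is RAG?",
    chunks=["Retrieval Augmented Generation (RAG) grounds an LLM on retrieved documents."],
)
answer = prompter.post_process("  RAG grounds an LLM on retrieved documents.  ")
```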

# Examples
You can now go to [ragtime-projects](https://github.com/recitalAI/ragtime-projects) to see examples of Ragtime 🎹 in action!

# Troubleshooting
## Setting the API keys on Windows
API keys are stored locally on your computer in environment variables. If you are using Windows, first set the API key values in the shell:
```shell
setx OPENAI_API_KEY sk-....
```
The list of environment variable names to set, depending on the APIs you need to access, is given in the [LiteLLM documentation](https://litellm.vercel.app/docs/providers).

Once the keys are set, just call `ragtime.config.init_win_env` with the list of environment variables to make accessible to Python, for instance `init_win_env(['OPENAI_API_KEY'])`.
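
A minimal usage sketch (the module path and the list-of-names argument follow the description above):

```python
# Minimal sketch: make the keys set with `setx` visible to this Python process.
from ragtime.config import init_win_env

init_win_env(["OPENAI_API_KEY"])
```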

## Using Google LLMs
Follow the instructions in the [LiteLLM documentation](https://litellm.vercel.app/docs/providers/vertex#gemini-pro).
Also make sure the `Vertex AI` API is enabled in your Google Cloud project.
            

# Project links
- Homepage: https://github.com/recitalAI/ragtime-package
- Issues: https://github.com/recitalAI/ragtime-package/issues
- Author email: Gilles Moyse <gilles@recital.ai>
- Maintainer email: reciTAL <ragtime@recital.ai>

# Release files (0.0.43)
- `ragtime-0.0.43-py3-none-any.whl` (bdist_wheel, py3, 241263 bytes, uploaded 2024-06-10T15:20:28Z)
  - sha256: `218066df329a9443f122f5a4e4ae45e2ca98ef1459dbc45372dc80c3fcbd5a23`
  - URL: https://files.pythonhosted.org/packages/ed/e0/3d3d966701e6c918f6a2216c7c2c0f0f9a49521a5fb1e0ed7f3b609b36d8/ragtime-0.0.43-py3-none-any.whl
- `ragtime-0.0.43.tar.gz` (sdist, 242177 bytes, uploaded 2024-06-10T15:20:30Z)
  - sha256: `1fd91ccb1de450b05dcbcbfba8a773a397fd28fb78aad666a45649ffc266f495`
  - URL: https://files.pythonhosted.org/packages/30/1c/3b676e9979c35c790b6eb99114fd2912e781afc8aabe0c8f60cb1aebbf48/ragtime-0.0.43.tar.gz