# ovos-flashrank-reranker-plugin

- **Version**: 0.0.0
- **Summary**: A question solver plugin for OVOS
- **Homepage**: https://github.com/TigreGotico/ovos-flashrank-reranker-plugin
- **Author**: jarbasai
- **License**: MIT
- **Keywords**: ovos, openvoiceos, plugin, utterance, fallback, query
- **Requirements**: ovos-plugin-manager, flashrank (>=0.2.8)
- **Uploaded**: 2024-10-25 22:16:22

# FlashRankMultipleChoiceSolver for OVOS

The `FlashRankMultipleChoiceSolver` plugin is designed for the Open Voice OS (OVOS) platform to help select the best
answer to a question from a list of options. This plugin utilizes the FlashRank library to evaluate and rank
multiple-choice answers based on their relevance to the given query.

## Features

- **Rerank Options**: Reranks a list of options based on their relevance to the query.
- **Customizable Model**: Allows the use of different ranking models.
- **Seamless Integration**: Designed to work with OVOS plugin manager.

ReRanking is a technique used to refine a list of potential answers by evaluating their relevance to a given query.
This process is crucial in scenarios where multiple options or responses need to be assessed to determine the most
appropriate one.

In retrieval chatbots, ReRanking helps in selecting the best answer from a set of retrieved documents or options,
enhancing the accuracy of the response provided to the user.
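
Under the hood, this plugin uses the FlashRank library for that scoring step. As a rough, self-contained sketch of what a reranking call looks like with FlashRank directly (the `Ranker` / `RerankRequest` usage follows FlashRank's public API; the snippet is illustrative and not taken from this plugin's code):

```python
from flashrank import Ranker, RerankRequest

# Load a small cross-encoder model (downloaded and cached on first use).
ranker = Ranker(model_name="ms-marco-TinyBERT-L-2-v2")

query = "what is the speed of light"
passages = [
    {"id": 1, "text": "very fast"},
    {"id": 2, "text": "10m/s"},
    {"id": 3, "text": "the speed of light is C"},
]

# Results come back sorted by relevance, each carrying a "score" field.
results = ranker.rerank(RerankRequest(query=query, passages=passages))
for result in results:
    print(result["score"], result["text"])
```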

## Configuration

`MultipleChoiceSolver` plugins are integrated into the OVOS Common Query framework, where they are used to select the
most relevant answer from a set of skill responses.

```json
"common_query": {
  "reranker": "ovos-flashrank-reranker-plugin",
  "ignore_skill_scores": true,
  "ovos-flashrank-reranker-plugin": {"model": "ms-marco-TinyBERT-L-2-v2"}
}
```

> NOTE: enabling this plugin on a Raspberry Pi can add up to 1 second of extra latency to the common query pipeline.

### Available Models

Below is the list of currently supported models. By default, `ms-marco-MultiBERT-L-12` is used because it is multilingual:

| Model Name                                      | Description |
|-------------------------------------------------|-------------|
| `ms-marco-TinyBERT-L-2-v2`                      | [Model card](https://huggingface.co/cross-encoder/ms-marco-TinyBERT-L-2). Trained on the MS MARCO Passage Ranking task. Encodes queries and ranks passages retrieved from large-scale datasets like MS MARCO, focusing on machine reading comprehension and passage ranking. |
| `ms-marco-MiniLM-L-12-v2`                       | [Model card](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-12-v2). Trained on MS MARCO Passage Ranking; performs well for information retrieval, encoding queries and sorting passages. Offers high ranking quality at a lower documents-per-second rate than the other versions. |
| `ms-marco-MultiBERT-L-12` (default)             | Multilingual, [supports 100+ languages](https://github.com/google-research/bert/blob/master/multilingual.md#list-of-languages). |
| `ce-esci-MiniLM-L12-v2`                         | [Fine-tuned on the Amazon ESCI dataset](https://github.com/amazon-science/esci-data), which includes queries in English, Japanese, and Spanish. Designed for semantic search and ranking; maps sentences and paragraphs to a 384-dimensional vector space, useful for tasks like clustering and product search in a multilingual context. |
| `rank-T5-flan`                                  | [Model card](https://huggingface.co/bergum/rank-T5-flan). Best non-cross-encoder reranker. |
| `rank_zephyr_7b_v1_full` (4-bit quantised GGUF) | A 7B-parameter GPT-like model fine-tuned on task-specific listwise reranking data; the state-of-the-art open-source reranking model on several datasets. |

## Standalone Usage

#### FlashRankMultipleChoiceSolver

FlashRankMultipleChoiceSolver is designed to select the best answer to a question from a list of options.

In the context of retrieval chatbots, FlashRankMultipleChoiceSolver is useful for scenarios where a user query results
in a list of predefined answers or options.
The solver ranks these options based on their relevance to the query and selects the most suitable one.

```python
from ovos_flashrank_solver import FlashRankMultipleChoiceSolver

solver = FlashRankMultipleChoiceSolver()
a = solver.rerank("what is the speed of light", [
    "very fast", "10m/s", "the speed of light is C"
])
print(a)
# 2024-07-22 15:03:10.295 - OVOS - __main__:load_corpus:61 - DEBUG - indexed 3 documents
# 2024-07-22 15:03:10.297 - OVOS - __main__:retrieve_from_corpus:70 - DEBUG - Rank 1 (score: 0.7198746800422668): the speed of light is C
# 2024-07-22 15:03:10.297 - OVOS - __main__:retrieve_from_corpus:70 - DEBUG - Rank 2 (score: 0.0): 10m/s
# 2024-07-22 15:03:10.297 - OVOS - __main__:retrieve_from_corpus:70 - DEBUG - Rank 3 (score: 0.0): very fast
# [(0.7198747, 'the speed of light is C'), (0.0, '10m/s'), (0.0, 'very fast')]

# NOTE: select_answer is part of the MultipleChoiceSolver base class and uses rerank internally
a = solver.select_answer("what is the speed of light", [
    "very fast", "10m/s", "the speed of light is C"
])
print(a)  # the speed of light is C
```
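
To use one of the other models from the table above, the model name can presumably be passed in the solver's config dict, mirroring the `model` key from the `common_query` snippet; the config-dict constructor shown here follows the `FlashRankEvidenceSolverPlugin(config)` example below, so treat it as an assumption for this class:

```python
from ovos_flashrank_solver import FlashRankMultipleChoiceSolver

# Assumption: the standalone solver accepts the same "model" key used in the
# common_query configuration example above.
solver = FlashRankMultipleChoiceSolver({"model": "ms-marco-MiniLM-L-12-v2"})
print(solver.select_answer("what is the speed of light", [
    "very fast", "10m/s", "the speed of light is C"
]))  # expected: the speed of light is C
```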

#### FlashRankEvidenceSolverPlugin

FlashRankEvidenceSolverPlugin is designed to extract the most relevant sentence from a text passage that answers a given
question. This plugin uses the FlashRank algorithm to evaluate and rank sentences based on their relevance to the query.

In text extraction and machine comprehension tasks, FlashRankEvidenceSolverPlugin enables the identification of specific
sentences within a larger body of text that directly address a user's query.

For example, in a scenario where a user queries about the number of rovers exploring Mars, FlashRankEvidenceSolverPlugin
scans the provided text passage, ranks sentences based on their relevance, and extracts the most informative sentence.

```python
from ovos_flashrank_solver import FlashRankEvidenceSolverPlugin

config = {
    "lang": "en-us",
    "min_conf": 0.4,
    "n_answer": 1
}
solver = FlashRankEvidenceSolverPlugin(config)

text = """Mars is the fourth planet from the Sun. It is a dusty, cold, desert world with a very thin atmosphere. 
Mars is also a dynamic planet with seasons, polar ice caps, canyons, extinct volcanoes, and evidence that it was even more active in the past.
Mars is one of the most explored bodies in our solar system, and it's the only planet where we've sent rovers to roam the alien landscape. 
NASA currently has two rovers (Curiosity and Perseverance), one lander (InSight), and one helicopter (Ingenuity) exploring the surface of Mars.
"""
query = "how many rovers are currently exploring Mars"
answer = solver.get_best_passage(evidence=text, question=query)
print("Query:", query)
print("Answer:", answer)
# 2024-07-22 15:05:14.209 - OVOS - __main__:load_corpus:61 - DEBUG - indexed 5 documents
# 2024-07-22 15:05:14.209 - OVOS - __main__:retrieve_from_corpus:70 - DEBUG - Rank 1 (score: 1.39238703250885): NASA currently has two rovers (Curiosity and Perseverance), one lander (InSight), and one helicopter (Ingenuity) exploring the surface of Mars.
# 2024-07-22 15:05:14.210 - OVOS - __main__:retrieve_from_corpus:70 - DEBUG - Rank 2 (score: 0.38667747378349304): Mars is one of the most explored bodies in our solar system, and it's the only planet where we've sent rovers to roam the alien landscape.
# 2024-07-22 15:05:14.210 - OVOS - __main__:retrieve_from_corpus:70 - DEBUG - Rank 3 (score: 0.15732118487358093): Mars is the fourth planet from the Sun.
# 2024-07-22 15:05:14.210 - OVOS - __main__:retrieve_from_corpus:70 - DEBUG - Rank 4 (score: 0.10177625715732574): Mars is also a dynamic planet with seasons, polar ice caps, canyons, extinct volcanoes, and evidence that it was even more active in the past.
# 2024-07-22 15:05:14.210 - OVOS - __main__:retrieve_from_corpus:70 - DEBUG - Rank 5 (score: 0.0): It is a dusty, cold, desert world with a very thin atmosphere.
# Query: how many rovers are currently exploring Mars
# Answer: NASA currently has two rovers (Curiosity and Perseverance), one lander (InSight), and one helicopter (Ingenuity) exploring the surface of Mars.
```

In this example, `FlashRankEvidenceSolverPlugin` effectively identifies and retrieves the most relevant sentence from
the provided text that answers the query about the number of rovers exploring Mars.
This capability is essential for applications requiring information extraction from extensive textual content, such as
automated research assistants or content summarizers.
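
The two solvers also compose naturally. As a hedged sketch (using only the methods demonstrated above, with made-up documents standing in for a real retrieval step), one could first pull the best sentence out of each retrieved document with the evidence solver, then let the multiple-choice solver pick the overall winner:

```python
from ovos_flashrank_solver import (FlashRankEvidenceSolverPlugin,
                                   FlashRankMultipleChoiceSolver)

# Hypothetical documents; in practice these would come from a retrieval step.
documents = [
    "Mars is the fourth planet from the Sun. NASA currently has two rovers "
    "(Curiosity and Perseverance) exploring the surface of Mars.",
    "The Moon is Earth's only natural satellite. No rovers are currently "
    "active on its surface.",
]
query = "how many rovers are currently exploring Mars"

evidence_solver = FlashRankEvidenceSolverPlugin()
choice_solver = FlashRankMultipleChoiceSolver()

# Extract the most relevant sentence from each document...
candidates = [evidence_solver.get_best_passage(evidence=doc, question=query)
              for doc in documents]
# Drop documents where no sentence cleared the solver's confidence threshold.
candidates = [c for c in candidates if c]
# ...then rerank the candidates and keep the best one overall.
print(choice_solver.select_answer(query, candidates))
```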



            
