# primeqa

- Name: primeqa
- Version: 0.15.2
- Home page: https://github.com/primeqa/primeqa
- Summary: State-of-the-art Question Answering
- Author: PrimeQA Team
- Upload time: 2023-06-27 19:37:49
- Requires Python: >=3.7.0, <3.11.0
- License: Apache
- Keywords: Question Answering (QA), Machine Reading Comprehension (MRC), Information Retrieval (IR)

            <!---
Copyright 2022 IBM Corp.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

<h3 align="center">
    <img width="350" alt="primeqa" src="docs/_static/img/PrimeQA.png">
    <p>The Prime Repository for State-of-the-Art Multilingual Question Answering Research and Development.</p>
</h3>

![Build Status](https://github.com/primeqa/primeqa/actions/workflows/primeqa-ci.yml/badge.svg)
[![LICENSE|Apache2.0](https://img.shields.io/github/license/saltstack/salt?color=blue)](https://www.apache.org/licenses/LICENSE-2.0.txt)
[![sphinx-doc-build](https://github.com/primeqa/primeqa/actions/workflows/sphinx-doc-build.yml/badge.svg)](https://github.com/primeqa/primeqa/actions/workflows/sphinx-doc-build.yml)   

PrimeQA is a public open-source repository that enables researchers and developers to train state-of-the-art models for question answering (QA). With PrimeQA, a researcher can replicate experiments from papers published at recent NLP conferences, and can also download pre-trained models (from an online repository) and run them on their own custom data. PrimeQA is built on top of the [Transformers](https://github.com/huggingface/transformers) toolkit and uses [datasets](https://huggingface.co/datasets/viewer/) and [models](https://huggingface.co/PrimeQA) that are directly downloadable.


The models within PrimeQA support end-to-end question answering. PrimeQA answers questions via:
- [Information Retrieval](https://github.com/primeqa/primeqa/tree/main/primeqa/ir): Retrieving documents and passages using both traditional (e.g. BM25) and neural (e.g. ColBERT) models
- [Multilingual Machine Reading Comprehension](https://huggingface.co/ibm/tydiqa-primary-task-xlm-roberta-large): Extract and/or generate answers given the source document or passage.
- [Multilingual Question Generation](https://huggingface.co/PrimeQA/mt5-base-tydi-question-generator): Supports generation of questions for effective domain adaptation over [tables](https://huggingface.co/PrimeQA/t5-base-table-question-generator) and [multilingual text](https://huggingface.co/PrimeQA/mt5-base-tydi-question-generator).
- [Retrieval Augmented Generation](https://github.com/primeqa/primeqa/blob/main/notebooks/retriever-reader-pipelines/prompt_reader_with_GPT.ipynb): Generate answers using the GPT-3/ChatGPT pretrained models, conditioned on retrieved passages. 
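To make the traditional retrieval step above concrete, here is a minimal, self-contained Okapi BM25 scorer in plain Python. This is a toy sketch for illustration only (PrimeQA delegates BM25 to Pyserini, and the example documents below are made up), but the scoring formula is the standard one:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    N = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / N
    # Document frequency: in how many documents each term appears
    df = Counter()
    for d in tokenized:
        df.update(set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for t in query.lower().split():
            if t not in tf:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "PrimeQA answers questions over multilingual text",
    "BM25 is a classic sparse retrieval model",
    "Neural retrievers such as ColBERT use dense representations",
]
print(bm25_scores("sparse retrieval bm25", docs))
```

Running the snippet ranks the second document highest, since it is the only one containing the query terms. Production retrievers additionally apply stemming, stopword handling, and inverted indexes, which Pyserini provides.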

Some examples of supported models (applicable to benchmark datasets) are:
- [Traditional IR with BM25](https://github.com/primeqa/primeqa/tree/main/primeqa/ir/) via Pyserini
- [Neural IR with ColBERT, DPR](https://github.com/primeqa/primeqa/tree/main/primeqa/ir) (a collaboration with the [Stanford NLP](https://nlp.stanford.edu/) IR group led by [Chris Potts](https://web.stanford.edu/~cgpotts/) & [Matei Zaharia](https://cs.stanford.edu/~matei/)), including replication of the experiments with which [Dr. Decr](https://huggingface.co/ibm/DrDecr_XOR-TyDi_whitebox) (Li et al., 2022) reached the top of the XOR-TyDi leaderboard.
- [Machine Reading Comprehension with XLM-R](https://github.com/primeqa/primeqa/tree/main/primeqa/mrc): replicate the experiments that reached the top of the TyDi leaderboard, matching the performance of the IBM GAAMA system. Coming soon: code to replicate GAAMA's performance on Natural Questions.

## ๐Ÿ… Top of the Leaderboard

PrimeQA is at the top of several leaderboards: XOR-TyDi, TyDiQA-main, OTT-QA and HybridQA.

### [XOR-TyDi](https://nlp.cs.washington.edu/xorqa/)
<img src="docs/_static/img/xor-tydi.png" width="50%">

### [TyDiQA-main](https://ai.google.com/research/tydiqa)
<img src="docs/_static/img/tydi-main.png" width="50%">

### [OTT-QA](https://codalab.lisn.upsaclay.fr/competitions/7967)
<img src="docs/_static/img/ott-qa.png" width="50%">

### [HybridQA](https://codalab.lisn.upsaclay.fr/competitions/7979)
<img src="docs/_static/img/hybridqa.png" width="50%">

## โœ”๏ธ Getting Started

### Installation
[Installation doc](https://primeqa.github.io/primeqa/installation.html)       

```shell
# cd to project root

# If you want to run on GPU make sure to install torch appropriately

# E.g. for torch 1.11 + CUDA 11.3:
pip install 'torch~=1.11.0' --extra-index-url https://download.pytorch.org/whl/cu113

# Install as editable (-e) or non-editable using pip, with extras (e.g. tests) as desired
# Example installation commands:

# Minimal install (non-editable)
pip install .

# GPU support
pip install .[gpu]

# Full install (editable)
pip install -e .[all]
```

Please note that dependencies (specified in [setup.py](./setup.py)) are pinned to provide a stable experience.
When installing from source, these can be modified; however, this is not officially supported.
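Since exact versions matter here, it can be useful to check which versions of the pinned dependencies were actually resolved in your environment. A small sketch using only the Python standard library (the distribution names queried below are illustrative):

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(dist_name):
    """Return the installed version string for a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

# Report a few distributions of interest; prints None for anything not installed
for name in ("primeqa", "torch", "transformers"):
    print(f"{name}: {installed_version(name)}")
```

This requires Python 3.8+, which falls within PrimeQA's supported range.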

**Note:** in many environments, the conda-forge faiss libraries perform substantially better than the default ones installed with pip. To install faiss from conda-forge, use the following steps:

- Create and activate a conda environment
- Install the faiss libraries:

```shell
conda install -c conda-forge faiss=1.7.0 faiss-gpu=1.7.0
```

- In `setup.py`, remove the faiss-related lines:

```python
"faiss-cpu~=1.7.2": ["install", "gpu"],
"faiss-gpu~=1.7.2": ["gpu"],
```

- Continue with the `pip install` commands as described above.


### Java requirements
Java 11 is required for BM25 retrieval. Install Java as follows:

```shell
conda install -c conda-forge openjdk=11
```
## :speech_balloon: Blog Posts
There are several blog posts by members of the open-source community on how they have been using PrimeQA. Read some of them:
1. [PrimeQA and GPT 3](https://www.marktechpost.com/2023/03/03/with-just-20-lines-of-python-code-you-can-do-retrieval-augmented-gpt-based-qa-using-this-open-source-repository-called-primeqa/)
2. [Enterprise search with PrimeQA](https://heidloff.net/article/introduction-neural-information-retrieval/)
3. [A search engine for Trivia geeks](https://www.deleeuw.me.uk/posts/Using-PrimeQA-For-NLP-Question-Answering/)


## 🧪 Unit Tests
[Testing doc](https://primeqa.github.io/primeqa/testing.html)       

To run the unit tests you first need to [install PrimeQA](#Installation).
Make sure to install with the `[tests]` or `[all]` extras from pip.

From there you can run the tests via pytest, for example:
```shell
pytest --cov PrimeQA --cov-config .coveragerc tests/
```

For more information, see:
- Our [tox.ini](./tox.ini)
- The [pytest](https://docs.pytest.org) and [tox](https://tox.wiki/en/latest/) documentation    

## 🔭 Learn more

| Section | Description |
|-|-|
| 📒 [Documentation](https://primeqa.github.io/primeqa) | Full API documentation and tutorials |
| 🏁 [Quick tour: Entry Points for PrimeQA](https://github.com/primeqa/primeqa/tree/main/primeqa) | Different entry points for PrimeQA: Information Retrieval, Reading Comprehension, TableQA and Question Generation |
| 📓 [Tutorials: Jupyter Notebooks](https://github.com/primeqa/primeqa/tree/main/notebooks) | Notebooks to get started on QA tasks |
| 📓 [GPT-3/ChatGPT Reader Notebooks](https://github.com/primeqa/primeqa/tree/main/notebooks/mrc/LLM_reader_predict_mode.ipynb) | Notebooks to get started with the GPT-3/ChatGPT reader components |
| 💻 [Examples: Applying PrimeQA on various QA tasks](https://github.com/primeqa/primeqa/tree/main/examples) | Example scripts for fine-tuning PrimeQA models on a range of QA tasks |
| 🤗 [Model sharing and uploading](https://huggingface.co/docs/transformers/model_sharing) | Upload and share your fine-tuned models with the community |
| ✅ [Pull Request](https://primeqa.github.io/primeqa/pull_request_template.html) | PrimeQA pull request template |
| 📄 [Generate Documentation](https://primeqa.github.io/primeqa/README.html) | How documentation works |
| 🛠 [Orchestrator Service REST Microservice](https://primeqa.github.io/primeqa/orchestrator.html) | Proof-of-concept code for the PrimeQA Orchestrator microservice |
| 📖 [Tooling UI](https://primeqa.github.io/primeqa/tooling_ui.html) | Demo UI |

## โค๏ธ PrimeQA collaborators include       

| | | | |
|:-------------------------:|:-------------------------:|:-------------------------:|:-------------------------:|
|<img width="75" alt="stanford" src="docs/_static/img/collab-stanford-circle.png">| Stanford NLP |<img width="75" alt="i" src="docs/_static/img/collab-i-circle.png">| University of Illinois |
|<img width="75" alt="stuttgart" src="docs/_static/img/collab-stuttgart-circle.png">| University of Stuttgart | <img width="75" alt="notredame" src="docs/_static/img/collab-notredame-circle.png">| University of Notre Dame |
|<img width="75" alt="ohio" src="docs/_static/img/collab-ohio-circle.png">| Ohio State University |<img width="75" alt="carnegie" src="docs/_static/img/collab-carnegie-circle.png">| Carnegie Mellon University |
|<img width="75" alt="massachusetts" src="docs/_static/img/collab-massachusetts-circle.png">| University of Massachusetts |<img width="75" height="75" alt="ibm" src="docs/_static/img/collab-ibm-circle.png">| IBM Research |
| | | | |


<br>
<br>
<br>
<br>
<div align="center">
    <img width="30" alt="primeqa" src="docs/_static/primeqa_logo.png">
</div>

            
