<div align="center">
<h1>
Question Answering kit
</h1>
</div>
## Requirements
- Python >= 3.8 (< 4.0)
```bash
# create and activate a Python virtual environment
which python3
python3 -m venv nanoEnv
source ./nanoEnv/bin/activate
# Apple M1 (arm64): if installing pyserini fails while building nmslib, try
# CFLAGS="-mavx -DWARN(a)=(a)" pip install nmslib
# upgrade pip, then install the project dependencies
pip3 install --upgrade pip
pip3 install -r requirements.txt
# the Tesseract OCR engine must also be installed at the system level; the steps depend on your OS
# Linux (Debian/Ubuntu)
sudo apt install tesseract-ocr -y
sudo apt install tesseract-ocr-heb -y
sudo apt install tesseract-ocr-all -y
```
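A minimal sketch for checking the prerequisites above after activating `nanoEnv`. `version_ok`, `has_lang`, and `check_env` are hypothetical helpers, not part of nanoQA. The sketch assumes `tesseract --list-langs` prints one language code per line after a header (the format used by Tesseract 4/5) and only checks the Hebrew pack that is installed explicitly above:

```bash
# Hypothetical prerequisite checks (not part of nanoQA).

# version_ok MAJOR.MINOR -> 0 if the version satisfies >= 3.8 and < 4.0, else 1
version_ok() {
  local major="${1%%.*}"
  local rest="${1#*.}"
  local minor="${rest%%.*}"
  [ "$major" -eq 3 ] && [ "$minor" -ge 8 ]
}

# has_lang CODE -> 0 if CODE appears as a whole line on stdin
# (assumes the one-code-per-line output of `tesseract --list-langs`)
has_lang() {
  grep -qx -- "$1"
}

# check_env -> prints a status line per check, returns 1 on the first failure
check_env() {
  local pyver
  pyver="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')" || return 1
  if version_ok "$pyver"; then
    echo "python $pyver: ok"
  else
    echo "python $pyver: need >= 3.8 and < 4.0" >&2
    return 1
  fi
  if ! command -v tesseract >/dev/null 2>&1; then
    echo "tesseract: not found on PATH" >&2
    return 1
  fi
  # some Tesseract versions print the language list on stderr, so merge both streams
  if tesseract --list-langs 2>&1 | has_lang heb; then
    echo "tesseract heb: ok"
  else
    echo "tesseract heb: missing (install tesseract-ocr-heb)" >&2
    return 1
  fi
}
```

Source the file and run `check_env`; a non-zero exit status points at the first missing prerequisite.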
- Elasticsearch 7.9.2 (run via Docker)
```bash
# run Elasticsearch 7.9.2 as a single-node cluster in a Docker container
docker run -d -p 127.0.0.1:9200:9200 -p 127.0.0.1:9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.9.2
# `docker run` already starts the container; to restart it later (e.g. after a reboot),
# look up its ID with `docker ps -a` and run:
docker start <container_id>
```
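`docker run -d` returns as soon as the container starts, before Elasticsearch is ready to accept requests, so a script that indexes documents right away can fail. A minimal readiness-check sketch, assuming `curl` is installed and the port mapping above; `wait_for_es` and `ES_URL` are hypothetical names, not part of nanoQA:

```bash
# Hypothetical readiness check for the container started above (not part of nanoQA).
# Assumes curl is installed and port 9200 is published on 127.0.0.1.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# wait_for_es URL [MAX_RETRIES]
# Returns 0 as soon as URL answers with an HTTP status below 400,
# or 1 after MAX_RETRIES failed retries spaced one second apart.
wait_for_es() {
  local url="$1" max_retries="${2:-60}" retries=0
  # --max-time keeps a single attempt from hanging on an unresponsive host
  until curl -fsS --max-time 5 -o /dev/null "$url"; do
    if [ "$retries" -ge "$max_retries" ]; then
      echo "Elasticsearch at $url is not reachable after $retries retries" >&2
      return 1
    fi
    retries=$((retries + 1))
    sleep 1
  done
  echo "Elasticsearch at $url is up"
}
```

Usage: `wait_for_es "$ES_URL" 90 && curl -s "$ES_URL/_cluster/health?pretty"` waits for the node and then prints the cluster health reported by Elasticsearch.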
## Todos
- [X] Migrate/re-implement the full QA functionality
- [X] PDF conversion
- [X] File extraction
- [X] Retriever via Elasticsearch
- [X] Adapter-based fine-tuning of the reader
- [X] Tests
## Download WikiDump
```bash
python download_wikidump.py --lang en --latest --delete-dump
```
## ChangeLog
- 2023-03-02: Replaced xpdf with PyMuPDF.
## Raw data

```json
{
  "_id": null,
  "home_page": "https://github.com/gabinguo/nanoQA",
  "name": "nanoqa",
  "maintainer": "",
  "docs_url": null,
  "requires_python": ">=3.8,<4.0",
  "maintainer_email": "",
  "keywords": "Question Answering,Huggingface Transformers",
  "author": "Kunpeng GUO",
  "author_email": "gabin.guo@gmail.com",
  "download_url": "https://files.pythonhosted.org/packages/d9/8d/c256ec4296308bc0275b744c39b6b02d8eadb679489ed4ef0a691b966026/nanoqa-0.0.37.tar.gz",
  "platform": null,
  "bugtrack_url": null,
  "license": "Apache",
  "summary": "",
  "version": "0.0.37",
  "project_urls": {
    "Homepage": "https://github.com/gabinguo/nanoQA",
    "Repository": "https://github.com/gabinguo/nanoQA"
  },
  "split_keywords": [
    "question answering",
    "huggingface transformers"
  ],
  "urls": [
    {
      "comment_text": "",
      "digests": {
        "blake2b_256": "3fd559a41ebeeec296018626a97fb2da399b6ded10b4f263379ee0bbae8e3c6f",
        "md5": "78b78767175c1fd6ef597b3a1a059733",
        "sha256": "edf2f57341fa7a3452a4385a8745b15a763cf51845027d39ccfede86ac6ae645"
      },
      "downloads": -1,
      "filename": "nanoqa-0.0.37-py3-none-any.whl",
      "has_sig": false,
      "md5_digest": "78b78767175c1fd6ef597b3a1a059733",
      "packagetype": "bdist_wheel",
      "python_version": "py3",
      "requires_python": ">=3.8,<4.0",
      "size": 128218,
      "upload_time": "2023-05-14T13:14:01",
      "upload_time_iso_8601": "2023-05-14T13:14:01.772240Z",
      "url": "https://files.pythonhosted.org/packages/3f/d5/59a41ebeeec296018626a97fb2da399b6ded10b4f263379ee0bbae8e3c6f/nanoqa-0.0.37-py3-none-any.whl",
      "yanked": false,
      "yanked_reason": null
    },
    {
      "comment_text": "",
      "digests": {
        "blake2b_256": "d98dc256ec4296308bc0275b744c39b6b02d8eadb679489ed4ef0a691b966026",
        "md5": "20b29c3dd2c5ba1ec9f5dfb44117c113",
        "sha256": "19ce7bdc2e4f85e8ba02a7518547082dd73118e6ffb81ff6a37d9a6f614ac30e"
      },
      "downloads": -1,
      "filename": "nanoqa-0.0.37.tar.gz",
      "has_sig": false,
      "md5_digest": "20b29c3dd2c5ba1ec9f5dfb44117c113",
      "packagetype": "sdist",
      "python_version": "source",
      "requires_python": ">=3.8,<4.0",
      "size": 115354,
      "upload_time": "2023-05-14T13:14:03",
      "upload_time_iso_8601": "2023-05-14T13:14:03.871138Z",
      "url": "https://files.pythonhosted.org/packages/d9/8d/c256ec4296308bc0275b744c39b6b02d8eadb679489ed4ef0a691b966026/nanoqa-0.0.37.tar.gz",
      "yanked": false,
      "yanked_reason": null
    }
  ],
  "upload_time": "2023-05-14 13:14:03",
  "github": true,
  "gitlab": false,
  "bitbucket": false,
  "codeberg": false,
  "github_user": "gabinguo",
  "github_project": "nanoQA",
  "github_not_found": true,
  "lcname": "nanoqa"
}
```