nanoqa


- Name: nanoqa
- Version: 0.0.37
- Home page: https://github.com/gabinguo/nanoQA
- Upload time: 2023-05-14 13:14:03
- Author: Kunpeng GUO
- Requires Python: >=3.8,<4.0
- License: Apache
- Keywords: question answering, huggingface transformers

<div align="center">
    <h1>
        Question Answering kit
    </h1>
</div>

## Requirements

- Python >= 3.8

```bash
# Create and activate a Python virtual environment
which python3
python3 -m venv nanoEnv
source ./nanoEnv/bin/activate

# Apple M1 chip: troubleshooting the pyserini installation
# CFLAGS="-mavx -DWARN(a)=(a)" pip install nmslib

# Upgrade pip, then install the dependencies
pip3 install --upgrade pip
pip3 install -r requirements.txt

# The Tesseract OCR library is also required; look up the install steps for your OS
## For Linux (Debian/Ubuntu)
sudo apt install tesseract-ocr -y
sudo apt install tesseract-ocr-heb -y
sudo apt install tesseract-ocr-all -y
```
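
Before installing the dependencies, it can help to confirm that the interpreter and the Tesseract binary match what the steps above expect. The following is a minimal sketch and not part of nanoQA: the helper names `python_version_ok` and `tesseract_on_path` are hypothetical, and the version bounds are taken from the package metadata (`>=3.8,<4.0`).

```python
import shutil
import sys
from typing import Optional, Tuple


def python_version_ok(version: Tuple[int, int] = sys.version_info[:2]) -> bool:
    """Return True if (major, minor) satisfies >=3.8,<4.0 (hypothetical helper)."""
    return (3, 8) <= version < (4, 0)


def tesseract_on_path() -> Optional[str]:
    """Return the path of the `tesseract` executable, or None if it is not on PATH."""
    return shutil.which("tesseract")


if __name__ == "__main__":
    print("Python version OK:", python_version_ok())
    print("tesseract found at:", tesseract_on_path())
```

If `tesseract_on_path()` returns `None`, the OCR package is probably not installed yet or is not on `PATH`.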

- Elasticsearch

```bash
# Run Elasticsearch in a Docker container
docker run -d -p 127.0.0.1:9200:9200 -p 127.0.0.1:9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.9.2
# `docker run` creates and starts the container; to start it again after it has stopped:
docker start <container_id>
```
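
Once the container is up, a quick way to check that Elasticsearch is answering is to query its cluster health endpoint. The sketch below uses only the standard library and assumes the port mapping from the `docker run` command above (`127.0.0.1:9200`, plain HTTP, no authentication). The function names are hypothetical and not part of nanoQA.

```python
import json
import urllib.request

# Assumption: Elasticsearch is published on 127.0.0.1:9200, as in the docker command above.
ES_URL = "http://127.0.0.1:9200"


def is_cluster_ready(health: dict) -> bool:
    """Accept "green" and "yellow"; a single-node cluster often stays yellow
    because replica shards have no second node to be allocated to."""
    return health.get("status") in ("green", "yellow")


def fetch_cluster_health(base_url: str = ES_URL, timeout: float = 5.0) -> dict:
    """Query the _cluster/health endpoint and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        health = fetch_cluster_health()
    except OSError as exc:  # urllib.error.URLError is a subclass of OSError
        print(f"Elasticsearch is not reachable: {exc}")
    else:
        print("status:", health.get("status"), "| ready:", is_cluster_ready(health))
```

Elasticsearch can take some time to start accepting connections after `docker run`, so an initial "not reachable" message may only mean it is still booting.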

## Todos

- [X] Migrate/re-implement the full QA functions
- [X] Implement PDF conversion
- [X] Implement file extraction
- [X] Implement the retriever via Elasticsearch
- [X] Implement fine-tuning of the reader with an adapter
- [X] Add tests

## Download WikiDump

```bash
python download_wikidump.py --lang en --latest --delete-dump 
```
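
The internals of `download_wikidump.py` are not shown here, so the following is only a sketch of what such a command line could look like. It reuses the three flags from the command above and assumes the script targets the standard Wikimedia "latest pages-articles" dump. The function names, flag semantics, and the choice of dump file are assumptions, not the script's actual behavior.

```python
import argparse
from typing import List, Optional


def latest_dump_url(lang: str = "en") -> str:
    """Build the URL of the latest pages-articles dump for a simple language
    code such as "en" or "fr" (assumed target; codes containing hyphens
    use a different database name and are not handled here)."""
    wiki = f"{lang}wiki"
    return f"https://dumps.wikimedia.org/{wiki}/latest/{wiki}-latest-pages-articles.xml.bz2"


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in the command above."""
    parser = argparse.ArgumentParser(description="Download a Wikipedia dump (sketch)")
    parser.add_argument("--lang", default="en", help="Wikipedia language code")
    parser.add_argument("--latest", action="store_true", help="use the latest dump")
    parser.add_argument("--delete-dump", action="store_true",
                        help="delete the raw dump after processing (assumed meaning)")
    return parser


def main(argv: Optional[List[str]] = None) -> str:
    """Parse the arguments and return the URL that would be downloaded."""
    args = build_parser().parse_args(argv)
    return latest_dump_url(args.lang)


if __name__ == "__main__":
    print(main())
```

The sketch only prints the URL; the actual download and any post-processing are left out because the source does not describe them.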


## ChangeLog

- 2023-03-02: Replaced xpdf with PyMuPDF

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/gabinguo/nanoQA",
    "name": "nanoqa",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8,<4.0",
    "maintainer_email": "",
    "keywords": "Question Answering,Huggingface Transformers",
    "author": "Kunpeng GUO",
    "author_email": "gabin.guo@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/d9/8d/c256ec4296308bc0275b744c39b6b02d8eadb679489ed4ef0a691b966026/nanoqa-0.0.37.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache",
    "summary": "",
    "version": "0.0.37",
    "project_urls": {
        "Homepage": "https://github.com/gabinguo/nanoQA",
        "Repository": "https://github.com/gabinguo/nanoQA"
    },
    "split_keywords": [
        "question answering",
        "huggingface transformers"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3fd559a41ebeeec296018626a97fb2da399b6ded10b4f263379ee0bbae8e3c6f",
                "md5": "78b78767175c1fd6ef597b3a1a059733",
                "sha256": "edf2f57341fa7a3452a4385a8745b15a763cf51845027d39ccfede86ac6ae645"
            },
            "downloads": -1,
            "filename": "nanoqa-0.0.37-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "78b78767175c1fd6ef597b3a1a059733",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8,<4.0",
            "size": 128218,
            "upload_time": "2023-05-14T13:14:01",
            "upload_time_iso_8601": "2023-05-14T13:14:01.772240Z",
            "url": "https://files.pythonhosted.org/packages/3f/d5/59a41ebeeec296018626a97fb2da399b6ded10b4f263379ee0bbae8e3c6f/nanoqa-0.0.37-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d98dc256ec4296308bc0275b744c39b6b02d8eadb679489ed4ef0a691b966026",
                "md5": "20b29c3dd2c5ba1ec9f5dfb44117c113",
                "sha256": "19ce7bdc2e4f85e8ba02a7518547082dd73118e6ffb81ff6a37d9a6f614ac30e"
            },
            "downloads": -1,
            "filename": "nanoqa-0.0.37.tar.gz",
            "has_sig": false,
            "md5_digest": "20b29c3dd2c5ba1ec9f5dfb44117c113",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8,<4.0",
            "size": 115354,
            "upload_time": "2023-05-14T13:14:03",
            "upload_time_iso_8601": "2023-05-14T13:14:03.871138Z",
            "url": "https://files.pythonhosted.org/packages/d9/8d/c256ec4296308bc0275b744c39b6b02d8eadb679489ed4ef0a691b966026/nanoqa-0.0.37.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-05-14 13:14:03",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "gabinguo",
    "github_project": "nanoQA",
    "github_not_found": true,
    "lcname": "nanoqa"
}
        