nlp-toolbox


Namenlp-toolbox JSON
Version 0.0.3 PyPI version JSON
download
home_pagehttps://github.com/thinh-vu/nlp_toolbox
SummaryNatural Language Processing Tools
upload_time2023-03-25 18:45:06
maintainer
docs_urlNone
authorThinh Vu
requires_python>=3.7
license
keywords
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # nlp_toolbox
nlp_toolbox is an open-source GitHub repository that provides a collection of tools for natural language processing tasks. The repository provides functions for loading text from multiple sources such as the web and ebooks. Additionally, it includes functions for summarizing text, OCR, interacting with the OpenAI GPT API, and generating word clouds.

<div>
  <img src="https://img.shields.io/pypi/pyversions/nlp_toolbox?logoColor=brown&style=plastic" alt= "Version"/>
  <img src="https://img.shields.io/pypi/dm/nlp_toolbox" alt="Download Badge"/>
  <img src="https://img.shields.io/github/last-commit/thinh-vu/nlp_toolbox" alt="Commit Badge"/>
  <img src="https://img.shields.io/github/license/thinh-vu/nlp_toolbox?color=red" alt="License Badge"/>
</div>

# II. REFERENCES
## 2.1. How to use this package?
- Install the stable version: `pip install nlp_toolbox`
- You can install the latest `nlp_toolbox` version from source with the following command:
`pip install git+https://github.com/thinh-vu/nlp_toolbox.git@main`

_(*) You might need to insert a `!` before your command when running terminal commands on Google Colab._

- To start using functions, you need to import them: `from nlp_toolbox import *`

# III. DEPENDENCIES
## ChatGPT API
ChatGPT API simplifies NLP tasks by allowing you to send a request in a prompt format and receive a response without the need for dependent packages.

To send a request to ChatGPT API endpoint, you will need an OpenAI API key that can be obtained from [OpenAI](https://platform.openai.com/account/api-keys).

## spacy

Download pre-trained files using terminal.
```
python -m spacy download en_core_web_lg
python -m spacy download en_core_web_sm
```

## pytesseract

<details>
  <summary> Quick installation guide </summary>

  **On Linux**

  ```
  sudo apt-get update
  sudo apt-get install libleptonica-dev tesseract-ocr tesseract-ocr-dev libtesseract-dev python3-pil tesseract-ocr-eng tesseract-ocr-script-latn
  ```
  **On Mac**
  `brew install tesseract`

  **On Windows**
  - Download binary from https://github.com/UB-Mannheim/tesseract/wiki. 
  - Add `pytesseract.pytesseract.tesseract_cmd = 'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'` to your script.
  - Install python package using pip:
  ```
  pip install tesseract
  pip install tesseract-ocr
  ```
</details>

Reference: [Installation guide](https://tesseract-ocr.github.io/tessdoc/Installation.html)

- For Windows: Specific the location of the pytesseract by adding this code to your python project `pytesseract.pytesseract.tesseract_cmd = r'C:\Users\mrthi\AppData\Local\Tesseract-OCR\tesseract.exe'`

- Add pre-trained languages data: 
  - Visit github and download the data file: [tessdata](https://github.com/tesseract-ocr/tessdata), eg: `vie.traineddata` for Vietnamese
  - Copy the downloaded file to the `tessdata` folder, Eg `C:\Users\YOUR-USER-NAME\AppData\Local\Tesseract-OCR\tessdata`

# IV. 🙋‍♂️ CONTACT INFORMATION
You can contact me at one of my social network profiles:

<div id="badges" align="center">
  <a href="https://www.linkedin.com/in/thinh-vu">
    <img src="https://img.shields.io/badge/LinkedIn-blue?style=for-the-badge&logo=linkedin&logoColor=white" alt="LinkedIn Badge"/>
  </a>
  <a href="https://www.messenger.com/t/mr.thinh.ueh">
    <img src="https://img.shields.io/badge/Messenger-00B2FF?style=for-the-badge&logo=messenger&logoColor=white" alt="Messenger Badge"/>
  <a href="https://www.youtube.com/channel/UCYgG-bmk92OhYsP20TS0MbQ">
    <img src="https://img.shields.io/badge/YouTube-red?style=for-the-badge&logo=youtube&logoColor=white" alt="Youtube Badge"/>
  </a>
  </a>
    <a href="https://github.com/thinh-vu">
    <img src="https://img.shields.io/badge/GitHub-100000?style=for-the-badge&logo=github&logoColor=white" alt="Github Badge"/>
  </a>
</div>

---

If you find value in my open-source projects and would like to support their development, you can donate via [Paypal](https://paypal.me/thinhvuphoto?country.x=VN&locale.x=en_US) or Momo e-wallet (VN). Your contribution will help me maintain my blog hosting fee and continue to create high-quality content. Thank you for your support!

![momo-qr](https://github.com/thinh-vu/vnstock/blob/main/src/momo-qr-thinhvu.jpeg?raw=true)

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/thinh-vu/nlp_toolbox",
    "name": "nlp-toolbox",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "",
    "keywords": "",
    "author": "Thinh Vu",
    "author_email": "mrthinh@live.com",
    "download_url": "https://files.pythonhosted.org/packages/50/34/69e0060a5c9f853416e48b4ff77cc0f648f27ca531c2dcc36634f474e616/nlp_toolbox-0.0.3.tar.gz",
    "platform": null,
    "description": "# nlp_toolbox\nnlp_toolbox is an open-source GitHub repository that provides a collection of tools for natural language processing tasks. The repository provides functions for loading text from multiple sources such as the web and ebooks. Additionally, it includes functions for summarizing text, OCR, interacting with the OpenAI GPT API, and generating word clouds.\n\n<div>\n  <img src=\"https://img.shields.io/pypi/pyversions/nlp_toolbox?logoColor=brown&style=plastic\" alt= \"Version\"/>\n  <img src=\"https://img.shields.io/pypi/dm/nlp_toolbox\" alt=\"Download Badge\"/>\n  <img src=\"https://img.shields.io/github/last-commit/thinh-vu/nlp_toolbox\" alt=\"Commit Badge\"/>\n  <img src=\"https://img.shields.io/github/license/thinh-vu/nlp_toolbox?color=red\" alt=\"License Badge\"/>\n</div>\n\n# II. REFERENCES\n## 2.1. How to use this package?\n- Install the stable version: `pip install nlp_toolbox`\n- You can install the latest `nlp_toolbox` version from source with the following command:\n`pip install git+https://github.com/thinh-vu/nlp_toolbox.git@main`\n\n_(*) You might need to insert a `!` before your command when running terminal commands on Google Colab._\n\n- To start using functions, you need to import them: `from nlp_toolbox import *`\n\n# III. DEPENDENCIES\n## ChatGPT API\nChatGPT API simplifies NLP tasks by allowing you to send a request in a prompt format and receive a response without the need for dependent packages.\n\nTo send a request to ChatGPT API endpoint, you will need an OpenAI API key that can be obtained from [OpenAI](https://platform.openai.com/account/api-keys).\n\n## spacy\n\nDownload pre-trained files using terminal.\n```\npython -m spacy download en_core_web_lg\npython -m spacy download en_core_web_sm\n```\n\n## pytesseract\n\n<details>\n  <summary> Quick installation guide </summary>\n\n  **On Linux**\n\n  ```\n  sudo apt-get update\n  sudo apt-get install libleptonica-dev tesseract-ocr tesseract-ocr-dev libtesseract-dev python3-pil tesseract-ocr-eng tesseract-ocr-script-latn\n  ```\n  **On Mac**\n  `brew install tesseract`\n\n  **On Windows**\n  - Download binary from https://github.com/UB-Mannheim/tesseract/wiki. \n  - Add `pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'` to your script.\n  - Install python package using pip:\n  ```\n  pip install tesseract\n  pip install tesseract-ocr\n  ```\n</details>\n\nReference: [Installation guide](https://tesseract-ocr.github.io/tessdoc/Installation.html)\n\n- For Windows: Specific the location of the pytesseract by adding this code to your python project `pytesseract.pytesseract.tesseract_cmd = r'C:\\Users\\mrthi\\AppData\\Local\\Tesseract-OCR\\tesseract.exe'`\n\n- Add pre-trained languages data: \n  - Visit github and download the data file: [tessdata](https://github.com/tesseract-ocr/tessdata), eg: `vie.traineddata` for Vietnamese\n  - Copy the downloaded file to the `tessdata` folder, Eg `C:\\Users\\YOUR-USER-NAME\\AppData\\Local\\Tesseract-OCR\\tessdata`\n\n# IV. \ud83d\ude4b\u200d\u2642\ufe0f CONTACT INFORMATION\nYou can contact me at one of my social network profiles:\n\n<div id=\"badges\" align=\"center\">\n  <a href=\"https://www.linkedin.com/in/thinh-vu\">\n    <img src=\"https://img.shields.io/badge/LinkedIn-blue?style=for-the-badge&logo=linkedin&logoColor=white\" alt=\"LinkedIn Badge\"/>\n  </a>\n  <a href=\"https://www.messenger.com/t/mr.thinh.ueh\">\n    <img src=\"https://img.shields.io/badge/Messenger-00B2FF?style=for-the-badge&logo=messenger&logoColor=white\" alt=\"Messenger Badge\"/>\n  <a href=\"https://www.youtube.com/channel/UCYgG-bmk92OhYsP20TS0MbQ\">\n    <img src=\"https://img.shields.io/badge/YouTube-red?style=for-the-badge&logo=youtube&logoColor=white\" alt=\"Youtube Badge\"/>\n  </a>\n  </a>\n    <a href=\"https://github.com/thinh-vu\">\n    <img src=\"https://img.shields.io/badge/GitHub-100000?style=for-the-badge&logo=github&logoColor=white\" alt=\"Github Badge\"/>\n  </a>\n</div>\n\n---\n\nIf you find value in my open-source projects and would like to support their development, you can donate via [Paypal](https://paypal.me/thinhvuphoto?country.x=VN&locale.x=en_US) or Momo e-wallet (VN). Your contribution will help me maintain my blog hosting fee and continue to create high-quality content. Thank you for your support!\n\n![momo-qr](https://github.com/thinh-vu/vnstock/blob/main/src/momo-qr-thinhvu.jpeg?raw=true)\n",
    "bugtrack_url": null,
    "license": "",
    "summary": "Natural Language Processing Tools",
    "version": "0.0.3",
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a781ccb1340d595a745a447a59058dbbcf8ae75b602aaea47e1b70443e517cf1",
                "md5": "1ba9c0fdd0e4f8b23920b289f3b8931e",
                "sha256": "53bf6ef2af98ccf039ce78637cc10bb566c9fa45ffe1b5a645c7db83d314e29c"
            },
            "downloads": -1,
            "filename": "nlp_toolbox-0.0.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1ba9c0fdd0e4f8b23920b289f3b8931e",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 9386,
            "upload_time": "2023-03-25T18:45:04",
            "upload_time_iso_8601": "2023-03-25T18:45:04.585425Z",
            "url": "https://files.pythonhosted.org/packages/a7/81/ccb1340d595a745a447a59058dbbcf8ae75b602aaea47e1b70443e517cf1/nlp_toolbox-0.0.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "503469e0060a5c9f853416e48b4ff77cc0f648f27ca531c2dcc36634f474e616",
                "md5": "32b224e11fd398e5e2186afea44cac48",
                "sha256": "485f8f54d140379eab6ad67264c079612c6c8802471ede7feea48b270a67d459"
            },
            "downloads": -1,
            "filename": "nlp_toolbox-0.0.3.tar.gz",
            "has_sig": false,
            "md5_digest": "32b224e11fd398e5e2186afea44cac48",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 9598,
            "upload_time": "2023-03-25T18:45:06",
            "upload_time_iso_8601": "2023-03-25T18:45:06.513330Z",
            "url": "https://files.pythonhosted.org/packages/50/34/69e0060a5c9f853416e48b4ff77cc0f648f27ca531c2dcc36634f474e616/nlp_toolbox-0.0.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-03-25 18:45:06",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "thinh-vu",
    "github_project": "nlp_toolbox",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "nlp-toolbox"
}
        
Elapsed time: 0.11010s