# Ruppell: powerful Python text extractor toolkit
## What is it?
**Ruppell** is a Python package that helps extract text from documents.
## Main Features
Here are just a few of the things that Ruppell does well:
- Create datasets from multiple files.
- Extract text from documents (pdf, docx, jpeg, jpg, png).
- Create a pandas DataFrame from a folder of documents.
- Convert documents to .txt files.
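To make the folder-to-dataset idea concrete, here is a minimal sketch that calls the underlying dependencies directly. It is not Ruppell's own API: the function names (`extract_pdf`, `folder_to_records`, `folder_to_dataframe`) and the `EXTRACTORS` table are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def extract_pdf(path):
    # pdfminer.six high-level helper, imported lazily so the sketch loads without it.
    from pdfminer.high_level import extract_text
    return extract_text(str(path))


def extract_docx(path):
    import docx2txt
    return docx2txt.process(str(path))


def extract_image(path):
    # OCR through pytesseract; requires the Tesseract program to be installed.
    from PIL import Image
    import pytesseract
    with Image.open(path) as img:
        return pytesseract.image_to_string(img)


# Hypothetical suffix-to-extractor table covering the formats listed above.
EXTRACTORS = {
    ".pdf": extract_pdf,
    ".docx": extract_docx,
    ".jpeg": extract_image,
    ".jpg": extract_image,
    ".png": extract_image,
}


def folder_to_records(folder, extractors=EXTRACTORS):
    """Return one {'file', 'text'} dict per supported file in `folder`."""
    records = []
    for path in sorted(Path(folder).iterdir()):
        extractor = extractors.get(path.suffix.lower())
        if path.is_file() and extractor is not None:
            records.append({"file": path.name, "text": extractor(path)})
    return records


def folder_to_dataframe(folder):
    """Build a two-column DataFrame (file, text) from the records."""
    import pandas as pd
    return pd.DataFrame(folder_to_records(folder), columns=["file", "text"])
```

Files with an unsupported suffix are skipped silently in this sketch; a real tool might prefer to log or raise instead.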
## Where to get it
Installers for the latest released version are available from the [Python
Package Index (PyPI)](https://pypi.org/project/ruppell/).
```sh
pip install ruppell
```
## Dependencies
- [Pillow](https://github.com/python-pillow/Pillow)
- [Pytesseract](https://github.com/madmaze/pytesseract)
- [Pdfminer.six](https://github.com/pdfminer/pdfminer.six)
- [Docx2txt](https://github.com/ankushshah89/python-docx2txt)
- [Pandas](https://github.com/pandas-dev/pandas)
- Python >= 3.6
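If an import fails after installation, a quick check like the one below can help narrow it down. This is only a sketch: `missing_modules` and `REQUIRED` are hypothetical names, and the list uses the import names of the packages above rather than their PyPI names. Pytesseract is a wrapper around the Tesseract OCR program, which `pip` does not install, so the sketch also looks for it on `PATH`.

```python
import importlib.util
import shutil


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names of the listed packages (Pillow installs as `PIL`,
# pdfminer.six as `pdfminer`).
REQUIRED = ["PIL", "pytesseract", "pdfminer", "docx2txt", "pandas"]

if __name__ == "__main__":
    print("Missing Python modules:", missing_modules(REQUIRED) or "none")
    # The Tesseract executable must be installed separately from the Python packages.
    print("tesseract on PATH:", shutil.which("tesseract") is not None)
```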
## Example
```python
>>> import ruppell
>>> ruppell.image_to_string('image.png')
'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer id bibendum sapien.'
```
## Supported Languages
Language codes follow **ISO 639-2/B** or **ISO 639-2/T**.
All language codes are listed [here](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes).
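How Ruppell itself accepts a language code is not shown in this README, so the sketch below calls Pytesseract directly, which takes a three-letter code through its `lang` argument. The helper name `ocr_image` is hypothetical, and the matching Tesseract language data must be installed for the chosen code.

```python
def ocr_image(path, lang="eng"):
    """OCR one image with pytesseract; `lang` is a three-letter code such as 'eng' or 'por'."""
    # Imported lazily so the helper can be defined without the OCR stack present.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        return pytesseract.image_to_string(img, lang=lang)
```

For example, `ocr_image('image.png', lang='por')` would ask Tesseract to read the image as Portuguese text.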
## Contributing
If you think Ruppell can be made more powerful, please contribute to this project and help improve it for other developers.
Open a pull request or start a discussion in the issues. Thanks a lot.
## Author
Jorge Melgarejo, melgarejo.colarte@gmail.com
## License
[MIT](LICENSE)
Raw data
{
"_id": null,
"home_page": "https://github.com/joorgelm/ruppell",
"name": "ruppell",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "ocr text extractor",
"author": "Jorge Melgarejo",
"author_email": "melgarejo.colarte@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/94/ef/5305c392e88289396096ab607d43ffed8d524ac48d5dc5ae9449dab46c0f/ruppell-1.0.0.tar.gz",
"platform": null,
"description": "# Ruppell: powerful Python text extractor toolkit\n\n## What is it?\n\n**Ruppell** is a Python package to help in documents' text extraction.\n\n## Main Features\nHere are just a few of the things that ruppell does well:\n\n - Create datasets from multiple files.\n - Extract documents' text (pdf, docx, jpeg, jpg, png).\n - Create Pandas dataframe from documents' folder.\n - Convert documents to .txt files\n\n## Where to get it\n\nBinary installers for the latest released version are available at the [Python\npackage index](https://pypi.org/project/ruppell/).\n\n```sh\npip install ruppell\n```\n\n## Dependencies\n- [Pillow](https://github.com/python-pillow/Pillow)\n- [Pytesseract](https://github.com/madmaze/pytesseract)\n- [Pdfminer.six](https://github.com/pdfminer/pdfminer.six)\n- [Docx2txt](https://github.com/ankushshah89/python-docx2txt)\n- [Pandas](https://github.com/pandas-dev/pandas)\n- Python >= 3.6\n\n## Example\n\n```\n>>> import ruppell\n>>> ruppell.image_to_string('image.png')\n'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer id bibendum sapien.'\n```\n\n## Supported Languages\n\nThe language codes are **ISO 639-2/B** or **ISO 639-2/T**.\n\nAll languages codes [here](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes).\n\n## Contributing\n\t\nIf you think that we can do the Ruppell more powerful please contribute with this project. And let's improve it to help other developers.\n\nCreate a pull request or let's talk about something in issues. Thanks a lot.\n\n## Author\nJorge Melgarejo, melgarejo.colarte@gmail.com\n\n## License\n[MIT](LICENSE)\n",
"bugtrack_url": null,
"license": "MIT License",
"summary": "Ruppell is a Python package to help in text extraction from documents.",
"version": "1.0.0",
"project_urls": {
"Download": "https://github.com/joorgelm/ruppell/archive/1.0.0.tar.gz",
"Homepage": "https://github.com/joorgelm/ruppell"
},
"split_keywords": [
"ocr",
"text",
"extractor"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "e5fdfb190e31483b70a554119ce6813eb9ed6fb9620f2048211fbc07abc9c049",
"md5": "b0feb7fa70d1ce9503f56d3e440e8def",
"sha256": "6ad8bda1f04e9d7442c2580d40a53f59d58f7eb058cf90559057702016aec1b6"
},
"downloads": -1,
"filename": "ruppell-1.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "b0feb7fa70d1ce9503f56d3e440e8def",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 5767,
"upload_time": "2023-07-13T19:15:07",
"upload_time_iso_8601": "2023-07-13T19:15:07.862868Z",
"url": "https://files.pythonhosted.org/packages/e5/fd/fb190e31483b70a554119ce6813eb9ed6fb9620f2048211fbc07abc9c049/ruppell-1.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "94ef5305c392e88289396096ab607d43ffed8d524ac48d5dc5ae9449dab46c0f",
"md5": "1e1ee9f183c3e43656dc000d4903de50",
"sha256": "007ae66ca6fb284774c4269fe18774d7df0d2f618422323cb9fd12ed2d36417d"
},
"downloads": -1,
"filename": "ruppell-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "1e1ee9f183c3e43656dc000d4903de50",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 4675,
"upload_time": "2023-07-13T19:15:09",
"upload_time_iso_8601": "2023-07-13T19:15:09.082897Z",
"url": "https://files.pythonhosted.org/packages/94/ef/5305c392e88289396096ab607d43ffed8d524ac48d5dc5ae9449dab46c0f/ruppell-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-07-13 19:15:09",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "joorgelm",
"github_project": "ruppell",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "Pillow",
"specs": [
[
"~=",
"10.0.0"
]
]
},
{
"name": "pytesseract",
"specs": [
[
"~=",
"0.3.10"
]
]
},
{
"name": "setuptools",
"specs": [
[
"~=",
"68.0.0"
]
]
},
{
"name": "pdfminer.six",
"specs": [
[
"==",
"20221105"
]
]
},
{
"name": "docx2txt",
"specs": [
[
"~=",
"0.8"
]
]
},
{
"name": "pandas",
"specs": [
[
"~=",
"2.0.3"
]
]
}
],
"lcname": "ruppell"
}