ruppell


- Name: ruppell
- Version: 1.0.0
- Home page: https://github.com/joorgelm/ruppell
- Summary: Ruppell is a Python package to help in text extraction from documents.
- Upload time: 2023-07-13 19:15:09
- Author: Jorge Melgarejo
- License: MIT License
- Keywords: ocr text extractor
- Requirements: Pillow, pytesseract, setuptools, pdfminer.six, docx2txt, pandas
- Travis CI: none
- Coveralls test coverage: none

# Ruppell: powerful Python text extractor toolkit

## What is it?

**Ruppell** is a Python package for extracting text from documents.

## Main Features
Here are just a few of the things that Ruppell does well:

  - Create datasets from multiple files.
  - Extract text from documents (pdf, docx, jpeg, jpg, png).
  - Create a Pandas dataframe from a folder of documents.
  - Convert documents to .txt files.
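
The folder-to-dataset and document-to-.txt features listed above can be pictured with a small sketch. This is not Ruppell's actual API: `collect_texts`, `write_txt_files` and the `extractors` mapping are hypothetical names used only for illustration, and the sketch relies on the standard library alone.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical type: an extractor takes a file path and returns its text.
Extractor = Callable[[Path], str]


def collect_texts(folder: str, extractors: Dict[str, Extractor]) -> List[dict]:
    """Return one {"file", "text"} record per supported file in `folder`."""
    records = []
    for path in sorted(Path(folder).iterdir()):
        # Pick the extractor by lower-cased extension, e.g. ".pdf" or ".png".
        extractor = extractors.get(path.suffix.lower())
        if path.is_file() and extractor is not None:
            records.append({"file": path.name, "text": extractor(path)})
    return records


def write_txt_files(records: List[dict], out_dir: str) -> List[Path]:
    """Write each record's text to `<out_dir>/<original stem>.txt`."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for record in records:
        target = out / (Path(record["file"]).stem + ".txt")
        target.write_text(record["text"], encoding="utf-8")
        written.append(target)
    return written
```

The list returned by `collect_texts` can be passed to `pandas.DataFrame(records)` to get a dataframe with `file` and `text` columns. Real extractors from the dependencies, such as `docx2txt.process` for `.docx` files, could be plugged into the mapping (wrapping the path in `str()` where a library expects a string).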

## Where to get it

The latest released version is available as a wheel and a source distribution from the
[Python Package Index](https://pypi.org/project/ruppell/).

```sh
pip install ruppell
```

## Dependencies
- [Pillow](https://github.com/python-pillow/Pillow)
- [Pytesseract](https://github.com/madmaze/pytesseract)
- [Pdfminer.six](https://github.com/pdfminer/pdfminer.six)
- [Docx2txt](https://github.com/ankushshah89/python-docx2txt)
- [Pandas](https://github.com/pandas-dev/pandas)
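- Python >= 3.6 (as stated by the author; note that the pinned requirements pandas ~= 2.0.3 and Pillow ~= 10.0.0 themselves require Python 3.8 or newer)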
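
Before running OCR it can help to confirm that the libraries above are importable. The following is a minimal sketch, not part of Ruppell: `check_environment` is a hypothetical helper, and the check for a `tesseract` executable is included because pytesseract is a wrapper around the Tesseract OCR engine, which has to be installed separately.

```python
import importlib.util
import shutil

# Import names (not PyPI names) of the libraries listed above.
MODULES = ["PIL", "pytesseract", "pdfminer", "docx2txt", "pandas"]


def check_environment() -> dict:
    """Map each dependency to True/False depending on whether it is found."""
    status = {name: importlib.util.find_spec(name) is not None for name in MODULES}
    # pytesseract calls the external `tesseract` program found on PATH.
    status["tesseract (binary)"] = shutil.which("tesseract") is not None
    return status


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'ok' if found else 'missing'}")
```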

## Example

```python
>>> import ruppell
>>> ruppell.image_to_string('image.png')
'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer id bibendum sapien.'
```
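
For readers curious about what an image-to-text call like this may involve, here is a rough sketch built directly on the Pillow and pytesseract dependencies. It is an assumption about the general approach, not Ruppell's implementation: `ocr_image` and its `lang` parameter are hypothetical, while `PIL.Image.open` and `pytesseract.image_to_string` are the real library calls.

```python
from pathlib import Path


def ocr_image(path: str, lang: str = "eng") -> str:
    """Hypothetical helper: OCR one image file with Tesseract."""
    image_path = Path(path)
    if not image_path.is_file():
        # Fail early with a clear error before loading the OCR stack.
        raise FileNotFoundError(f"No such image: {image_path}")

    # Imported lazily so the check above works without the OCR dependencies.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=lang)
```

Calling `ocr_image('image.png')` requires both the Python packages and the Tesseract engine itself to be installed.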

## Supported Languages

Language codes follow **ISO 639-2/B** or **ISO 639-2/T**, i.e. three-letter codes such as `eng` for English.

A full list of language codes is available [here](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes).
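
As an illustration of working with these codes, the sketch below normalizes a list of three-letter codes and joins them with `+`, which is the form Tesseract accepts for several languages at once (for example `eng+spa`). `build_lang_arg` is a hypothetical helper, and whether Ruppell exposes a language parameter is not stated here. The check only enforces the three-letter shape described above, so Tesseract model names such as `chi_sim` would be rejected.

```python
import re
from typing import Iterable

# Shape check only: three ASCII letters, as in ISO 639-2 codes.
_CODE = re.compile(r"[a-z]{3}")


def build_lang_arg(codes: Iterable[str]) -> str:
    """Join three-letter language codes into a Tesseract-style string."""
    normalized = [code.strip().lower() for code in codes]
    if not normalized:
        raise ValueError("at least one language code is required")
    bad = [code for code in normalized if not _CODE.fullmatch(code)]
    if bad:
        raise ValueError(f"not three-letter codes: {bad}")
    return "+".join(normalized)
```

For example, `build_lang_arg(["eng", "spa"])` returns `"eng+spa"`, which could be passed as the `lang` argument of `pytesseract.image_to_string`.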

## Contributing
	
If you think Ruppell can be made more powerful, please contribute to this project and help improve it for other developers.

Create a pull request or open an issue to start a discussion. Thanks a lot.

## Author
Jorge Melgarejo, melgarejo.colarte@gmail.com

## License
[MIT](LICENSE)

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/joorgelm/ruppell",
    "name": "ruppell",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "ocr text extractor",
    "author": "Jorge Melgarejo",
    "author_email": "melgarejo.colarte@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/94/ef/5305c392e88289396096ab607d43ffed8d524ac48d5dc5ae9449dab46c0f/ruppell-1.0.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License",
    "summary": "Ruppell is a Python package to help in text extraction from documents.",
    "version": "1.0.0",
    "project_urls": {
        "Download": "https://github.com/joorgelm/ruppell/archive/1.0.0.tar.gz",
        "Homepage": "https://github.com/joorgelm/ruppell"
    },
    "split_keywords": [
        "ocr",
        "text",
        "extractor"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e5fdfb190e31483b70a554119ce6813eb9ed6fb9620f2048211fbc07abc9c049",
                "md5": "b0feb7fa70d1ce9503f56d3e440e8def",
                "sha256": "6ad8bda1f04e9d7442c2580d40a53f59d58f7eb058cf90559057702016aec1b6"
            },
            "downloads": -1,
            "filename": "ruppell-1.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b0feb7fa70d1ce9503f56d3e440e8def",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 5767,
            "upload_time": "2023-07-13T19:15:07",
            "upload_time_iso_8601": "2023-07-13T19:15:07.862868Z",
            "url": "https://files.pythonhosted.org/packages/e5/fd/fb190e31483b70a554119ce6813eb9ed6fb9620f2048211fbc07abc9c049/ruppell-1.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "94ef5305c392e88289396096ab607d43ffed8d524ac48d5dc5ae9449dab46c0f",
                "md5": "1e1ee9f183c3e43656dc000d4903de50",
                "sha256": "007ae66ca6fb284774c4269fe18774d7df0d2f618422323cb9fd12ed2d36417d"
            },
            "downloads": -1,
            "filename": "ruppell-1.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "1e1ee9f183c3e43656dc000d4903de50",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 4675,
            "upload_time": "2023-07-13T19:15:09",
            "upload_time_iso_8601": "2023-07-13T19:15:09.082897Z",
            "url": "https://files.pythonhosted.org/packages/94/ef/5305c392e88289396096ab607d43ffed8d524ac48d5dc5ae9449dab46c0f/ruppell-1.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-07-13 19:15:09",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "joorgelm",
    "github_project": "ruppell",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "Pillow",
            "specs": [
                [
                    "~=",
                    "10.0.0"
                ]
            ]
        },
        {
            "name": "pytesseract",
            "specs": [
                [
                    "~=",
                    "0.3.10"
                ]
            ]
        },
        {
            "name": "setuptools",
            "specs": [
                [
                    "~=",
                    "68.0.0"
                ]
            ]
        },
        {
            "name": "pdfminer.six",
            "specs": [
                [
                    "==",
                    "20221105"
                ]
            ]
        },
        {
            "name": "docx2txt",
            "specs": [
                [
                    "~=",
                    "0.8"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    "~=",
                    "2.0.3"
                ]
            ]
        }
    ],
    "lcname": "ruppell"
}
        