# ToolRI

- Name: toolri
- Version: 1.0.1
- Summary: A tool for extracting, labeling and linking entities in document images for Information Extraction tasks.
- Upload time: 2024-08-09 20:57:08
- Requires Python: >=3.10

ToolRI was created to simplify and standardize the creation of samples for Information Extraction in document images. The tool supports text extraction by OCR, the creation of document entities, and their labeling and linking. The project is written entirely in Python and can run on any desktop platform. The graphical user interface is implemented thanks to the amazing <a href="https://github.com/tomschimansky/customtkinter">CustomTkinter</a> library.

## Installation

### PyPI

Install the ToolRI package with `pip`:

    pip install toolri

### Source

Clone the ToolRI repository with:

    git clone https://github.com/Victorgonl/ToolRI

Then install it with `pip`:

    pip install ./ToolRI/
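
After either installation route, it can help to confirm that the interpreter meets the `>=3.10` requirement and that the package is visible to it. The following is a minimal sketch using only the Python standard library; the helper names `installed_version` and `python_is_supported` are illustrative and not part of ToolRI.

```python
import sys
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def python_is_supported(minimum=(3, 10)) -> bool:
    """Check the running interpreter against a (major, minor) minimum."""
    return sys.version_info[:2] >= tuple(minimum)


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("toolri version:", installed_version("toolri") or "not installed")
```

If the second line reports `not installed`, the `pip` used for installation probably belongs to a different interpreter than the one running the script.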

## Standalone

### Download

You can download a portable binary of the tool to start using it right away. Download and run a version from the <a href="https://github.com/Victorgonl/ToolRI/releases">releases</a> page of the ToolRI repository.

### Build

To build the standalone version of ToolRI into a portable binary, clone the repository:

    git clone https://github.com/Victorgonl/ToolRI

Change the current directory to `./ToolRI`:

    cd ./ToolRI

Install all the dependencies listed in `requirements.txt`:

    pip install -r requirements.txt

Then run the `toolri_build.py` script:

    python3 toolri_build.py

The binary will be available in the `dist` folder.

## Documentation

***Under construction.*** :construction:

## Tesseract OCR

To use the OCR function in ToolRI, Tesseract OCR must be installed separately.

***For now, OCR is configured for English and Portuguese only, but it will soon be updated to support all available languages.*** :construction:

### Debian-based

Use the command:

    sudo apt-get install tesseract-ocr tesseract-ocr-eng tesseract-ocr-por
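
To confirm that the English and Portuguese language packs are actually available, the output of `tesseract --list-langs` can be inspected. The sketch below assumes the `tesseract` executable is on the `PATH`; the function names are illustrative and not part of ToolRI.

```python
import subprocess


def parse_languages(listing: str) -> set:
    """Extract language codes from `tesseract --list-langs` output."""
    lines = [line.strip() for line in listing.splitlines() if line.strip()]
    # The first line is a header such as 'List of available languages ...'.
    return {line for line in lines if not line.lower().startswith("list of")}


def missing_languages(required=("eng", "por"), executable="tesseract"):
    """Return the required language codes that Tesseract does not report."""
    result = subprocess.run(
        [executable, "--list-langs"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Older Tesseract releases print the listing to stderr, so read both streams.
    available = parse_languages(result.stdout + "\n" + result.stderr)
    return sorted(set(required) - available)


if __name__ == "__main__":
    try:
        print("Missing languages:", missing_languages())
    except FileNotFoundError:
        print("tesseract was not found on PATH")
```

An empty list means both `eng` and `por` are installed.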

### Windows

- Download and run the installer available at https://github.com/UB-Mannheim/tesseract/wiki.

- Make sure to install Tesseract in `C:\Program Files\Tesseract-OCR\` (the default directory), because the current ToolRI version relies on a predefined configuration that expects it there.
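
Whether Tesseract is where it is expected can be checked before starting ToolRI. This is a minimal sketch, not ToolRI's own lookup logic: it first searches the `PATH` and then falls back to the default Windows directory mentioned above. The file name `tesseract.exe` inside that directory is an assumption, and `find_tesseract` is a hypothetical helper.

```python
import shutil
from pathlib import Path
from typing import Optional

# Assumed location of the executable inside the default install directory.
DEFAULT_WINDOWS_PATH = Path(r"C:\Program Files\Tesseract-OCR\tesseract.exe")


def find_tesseract(candidates=(DEFAULT_WINDOWS_PATH,)) -> Optional[str]:
    """Return a path to the Tesseract executable, or None if none is found."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for candidate in candidates:
        if Path(candidate).is_file():
            return str(candidate)
    return None


if __name__ == "__main__":
    print(find_tesseract() or "Tesseract executable not found")
```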

## Usage

ToolRI was developed and used to create the <a href="https://github.com/LabRI-Information-Retrieval-Lab/UFLA-FORMS">UFLA-FORMS</a> dataset. Download the dataset to try the tool on the available samples, or create new metadata for any document image.

### Example

    import toolri

    image = toolri.load_image("document_image.jpg")

    labels = [
        toolri.ToolRILabel(name="QUESTION", color="#004B80", links=["ANSWER"], is_visible=True),
        toolri.ToolRILabel(name="ANSWER", color="#00943E", links=[], is_visible=True)
    ]

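    # `data` must exist before it is passed in. Starting from an empty list
    # for a new sample is an assumption, not confirmed by the ToolRI docs.
    data = []

    data = toolri.toolri(image=image, data=data, labels=labels)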

    toolri.draw_data_on_image(image=image, data=data, labels=labels).show()
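
In the example above, each label lists the names of the labels it may link to. A quick consistency check can catch a link that points at a label that was never defined. The sketch below deliberately uses plain dictionaries as a hypothetical stand-in for the label objects and does not call the ToolRI API; `undefined_links` is an illustrative helper.

```python
def undefined_links(labels):
    """Return (label name, link target) pairs whose target is not a defined label."""
    names = {label["name"] for label in labels}
    return [
        (label["name"], target)
        for label in labels
        for target in label["links"]
        if target not in names
    ]


# Plain-dict mirror of the two labels used in the example.
labels = [
    {"name": "QUESTION", "color": "#004B80", "links": ["ANSWER"]},
    {"name": "ANSWER", "color": "#00943E", "links": []},
]

print(undefined_links(labels))  # prints []
```

An empty result means every link target is a defined label name.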

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "toolri",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": null,
    "author": null,
    "author_email": "\"Victor G. Lima\" <victorgonl@outlook.com>",
    "download_url": "https://files.pythonhosted.org/packages/3c/04/6eb6d4cae313d2dc1d63f97230b7296aca26cde433fcd2fc5a8dfef060df/toolri-1.0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A tool for extracting, labeling and linking entities in document images for Information Extraction tasks.",
    "version": "1.0.1",
    "project_urls": {
        "Homepage": "https://github.com/Victorgonl/ToolRI",
        "Issues": "https://github.com/Victorgonl/ToolRI/issues"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d188d6fd08b882718d58c696f556a0eb9a852dd41de612629f1e87b1c583c750",
                "md5": "23847b010f48594ff6e74cfe2a69ff9d",
                "sha256": "7e23afe819e4ad38ebfeacb836e32934363a0f9bb00b5309a588c85955d4dd4a"
            },
            "downloads": -1,
            "filename": "toolri-1.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "23847b010f48594ff6e74cfe2a69ff9d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 55086,
            "upload_time": "2024-08-09T20:57:07",
            "upload_time_iso_8601": "2024-08-09T20:57:07.511478Z",
            "url": "https://files.pythonhosted.org/packages/d1/88/d6fd08b882718d58c696f556a0eb9a852dd41de612629f1e87b1c583c750/toolri-1.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3c046eb6d4cae313d2dc1d63f97230b7296aca26cde433fcd2fc5a8dfef060df",
                "md5": "640b68c5eca4b7e3b0d4541fec06056e",
                "sha256": "07438a37a75d8aab9bbe4957315b34b475745f2f1f3cc2ec9f397c115ca36c3d"
            },
            "downloads": -1,
            "filename": "toolri-1.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "640b68c5eca4b7e3b0d4541fec06056e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 38444,
            "upload_time": "2024-08-09T20:57:08",
            "upload_time_iso_8601": "2024-08-09T20:57:08.962351Z",
            "url": "https://files.pythonhosted.org/packages/3c/04/6eb6d4cae313d2dc1d63f97230b7296aca26cde433fcd2fc5a8dfef060df/toolri-1.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-09 20:57:08",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Victorgonl",
    "github_project": "ToolRI",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "toolri"
}
        