tabledetector


Name: tabledetector
Version: 1.0.2
Home page: https://github.com/rajban94/TableDetector
Summary: End-to-End table structure detector
Upload time: 2024-07-04 08:20:56
Author: Rishav Banerjee
License: MIT License
Keywords: table detector
Requirements: No requirements were recorded.

# Tabledetector

[![PyPI](https://img.shields.io/pypi/v/tabledetector)](https://pypi.org/project/tabledetector/)

Tabledetector is a Python package that takes PDFs or images as input, checks their alignment, re-aligns them if required, detects the table structure, extracts the data, and returns it as a pandas DataFrame for further use. The current implementation focuses on bordered, semibordered and unbordered table structures.

## Features

- **PDF/Image Input:** Accepts PDF or image files as input for table detection.
- **Alignment Check:** Verifies and adjusts the alignment of the input.
- **Table Detection:** Identifies bordered, semibordered and unbordered tables in the PDF or image file.
- **Table Extraction:** Extracts the tabular data as a pandas DataFrame.

## Libraries Used

- Python 3.x
- OpenCV
- NumPy
- pdf2image
- Pillow
- scipy
- jinja2
- easyocr
- pandas
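
To see which of these libraries are already importable in the active environment, a short standard-library check like the one below may help. The import names (`cv2` for OpenCV, `PIL` for Pillow) are the usual ones for those projects; the helper name `missing_modules` is made up for this sketch.

```python
from importlib.util import find_spec

# Import names for the libraries listed above (OpenCV -> cv2, Pillow -> PIL).
REQUIRED_MODULES = [
    "cv2", "numpy", "pdf2image", "PIL", "scipy", "jinja2", "easyocr", "pandas",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "none")
```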

## Create and Activate Environment
```bash
conda create -n <env_name> python=3.7
conda activate <env_name>
```
## Installation of package using pip

```bash
pip install tabledetector
```

## Clone the repository for latest development release

```bash
git clone https://github.com/rajban94/TableDetector.git
```

## Dependency
To use this library on Windows, ensure that Poppler is installed and that the directory containing its executables is added to the `PATH` environment variable.
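
pdf2image converts PDFs by calling Poppler's command-line tools, so one way to confirm that the path is set correctly is to look for `pdftoppm` on `PATH`. The sketch below uses only the standard library; the function name `poppler_on_path` is illustrative and not part of tabledetector.

```python
import shutil


def poppler_on_path(tool="pdftoppm"):
    """Return the full path of a Poppler tool if it is on PATH, else None."""
    return shutil.which(tool)


if __name__ == "__main__":
    location = poppler_on_path()
    if location:
        print(f"Poppler found: {location}")
    else:
        print("Poppler not found; install it and add its bin directory to PATH.")
```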

## Usage

## Detection
For bordered table detection when rotation is not required:
```python
import tabledetector as td
# Detect bordered tables without rotation correction.
result = td.detect(pdf_path="pdf_path", type="bordered", rotation=False, method="detect")
```

For semibordered table detection when rotation is not required:
```python
import tabledetector as td
# The table kind is passed as `type`; repeating `method` would be a SyntaxError.
result = td.detect(pdf_path="pdf_path", type="semibordered", rotation=False, method="detect")
```

For unbordered table detection when rotation is not required:
```python
import tabledetector as td
# The table kind is passed as `type`; repeating `method` would be a SyntaxError.
result = td.detect(pdf_path="pdf_path", type="unbordered", rotation=False, method="detect")
```

## Extraction
For bordered table detection and extraction when rotation is not required:
```python
import tabledetector as td
# Detect bordered tables and extract their contents.
result = td.detect(pdf_path="pdf_path", type="bordered", rotation=False, method="extract")
```

For semibordered table detection and extraction when rotation is not required:
```python
import tabledetector as td
# The table kind is passed as `type`; repeating `method` would be a SyntaxError.
result = td.detect(pdf_path="pdf_path", type="semibordered", rotation=False, method="extract")
```

For unbordered table detection and extraction when rotation is not required:
```python
import tabledetector as td
# The table kind is passed as `type`; repeating `method` would be a SyntaxError.
result = td.detect(pdf_path="pdf_path", type="unbordered", rotation=False, method="extract")
```
If no table type is specified, all table types are checked and the result is returned accordingly. If rotation is required, set `rotation=True`.
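
The calls in this section differ only in the table type and the method, so they can be wrapped in a small helper. This is a sketch, assuming `td.detect` accepts the keyword arguments `pdf_path`, `type`, `rotation` and `method` as in the bordered examples; the names `build_detect_kwargs` and `run_detect` are hypothetical and not part of tabledetector.

```python
TABLE_TYPES = {"bordered", "semibordered", "unbordered"}
METHODS = {"detect", "extract"}


def build_detect_kwargs(pdf_path, table_type=None, rotation=False, method="detect"):
    """Build keyword arguments for td.detect, validating the documented options."""
    if table_type is not None and table_type not in TABLE_TYPES:
        raise ValueError(f"unknown table type: {table_type!r}")
    if method not in METHODS:
        raise ValueError(f"unknown method: {method!r}")
    kwargs = {"pdf_path": pdf_path, "rotation": rotation, "method": method}
    if table_type is not None:
        # Assumption from the README note: omitting the type lets the
        # library check all table types, so it is only set when given.
        kwargs["type"] = table_type
    return kwargs


def run_detect(pdf_path, **options):
    """Call td.detect with validated arguments (needs tabledetector installed)."""
    import tabledetector as td  # lazy import: the helper above works without it

    return td.detect(**build_detect_kwargs(pdf_path, **options))
```

For example, `run_detect("invoice.pdf", table_type="bordered", method="extract", rotation=True)` would forward those options to `td.detect`; the file name is a placeholder.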

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/rajban94/TableDetector",
    "name": "tabledetector",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "table detector",
    "author": "Rishav Banerjee",
    "author_email": "rishavbanerjee10.rb@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/73/64/36e34094bf2f5b2dba3dcaa1f5eb8076079842583ee210563a05f2ea329a/tabledetector-1.0.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License",
    "summary": "End-to-End table structure detector",
    "version": "1.0.2",
    "project_urls": {
        "Download": "https://github.com/rajban94/TableDetector.git",
        "Homepage": "https://github.com/rajban94/TableDetector"
    },
    "split_keywords": [
        "table",
        "detector"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7761975ab43474d7218ce021c3d25dabc67233dbd5526cc08b0cbc3fbbb51cb8",
                "md5": "4b75ed5daa502d951202b6924d2ebf63",
                "sha256": "3da9f2cdbeab563de740d9ee17ad2a98e26fc01cfe195bd376f1dd9b66728b9a"
            },
            "downloads": -1,
            "filename": "tabledetector-1.0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "4b75ed5daa502d951202b6924d2ebf63",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 19541,
            "upload_time": "2024-07-04T08:20:54",
            "upload_time_iso_8601": "2024-07-04T08:20:54.411351Z",
            "url": "https://files.pythonhosted.org/packages/77/61/975ab43474d7218ce021c3d25dabc67233dbd5526cc08b0cbc3fbbb51cb8/tabledetector-1.0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "736436e34094bf2f5b2dba3dcaa1f5eb8076079842583ee210563a05f2ea329a",
                "md5": "ffc292d4349722d6fcceebb6683f80c1",
                "sha256": "d4d9b400a0c1e86668c57b4d262189b83f705124349abe47505f775534a0e102"
            },
            "downloads": -1,
            "filename": "tabledetector-1.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "ffc292d4349722d6fcceebb6683f80c1",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 11171,
            "upload_time": "2024-07-04T08:20:56",
            "upload_time_iso_8601": "2024-07-04T08:20:56.461085Z",
            "url": "https://files.pythonhosted.org/packages/73/64/36e34094bf2f5b2dba3dcaa1f5eb8076079842583ee210563a05f2ea329a/tabledetector-1.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-07-04 08:20:56",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "rajban94",
    "github_project": "TableDetector",
    "github_not_found": true,
    "lcname": "tabledetector"
}
        