tabledetector


Name: tabledetector
Version: 1.0.1
Home page: https://github.com/rajban94/TableDetector
Summary: End-to-End table structure detector
Upload time: 2024-05-04 07:22:34
Author: Rishav Banerjee (rishavbanerjee10.rb@gmail.com)
License: MIT License
Keywords: table detector
# Tabledetector

[![PyPI](https://img.shields.io/pypi/v/tabledetector)](https://pypi.org/project/tabledetector/)

Tabledetector is a Python package that takes PDFs or images as input, checks their alignment and re-aligns them if required, detects the table structure, extracts the data, and returns it as a pandas DataFrame for further use. The current implementation focuses on bordered, semibordered and unbordered table structures.
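This README does not describe how the alignment check works internally. One common technique for that step is projection-profile skew estimation. The NumPy sketch below illustrates the technique under the assumption that the page has already been binarized. It is not tabledetector's own code, and the name `estimate_skew` and its parameters are hypothetical.

```python
import numpy as np


def estimate_skew(binary: np.ndarray, max_angle: float = 5.0, step: float = 0.1) -> float:
    """Estimate the text-line angle (degrees) of a binarized page.

    Hypothetical helper, not part of tabledetector. `binary` is a 2-D array
    that is nonzero where there is ink. The result is the angle of the
    dominant line direction in image coordinates (x to the right, y down);
    rotating the page so that this angle becomes zero straightens it.
    """
    ys, xs = np.nonzero(binary)
    if ys.size == 0:
        return 0.0  # blank page: nothing to align

    num = int(round(2 * max_angle / step)) + 1
    angles = np.linspace(-max_angle, max_angle, num)
    # Try small corrections first so that ties resolve toward "no rotation".
    angles = angles[np.argsort(np.abs(angles), kind="stable")]

    best_angle, best_score = 0.0, -np.inf
    for angle in angles:
        theta = np.deg2rad(angle)
        # Row index of every ink pixel after undoing a skew of `angle`.
        rows = np.round(ys * np.cos(theta) - xs * np.sin(theta)).astype(int)
        profile = np.bincount(rows - rows.min()).astype(float)
        # Well-aligned text lines give a spiky row profile, i.e. large
        # jumps between neighbouring bins.
        score = float(np.sum(np.diff(profile) ** 2))
        if score > best_score:
            best_angle, best_score = float(angle), score
    return best_angle
```

The search is brute force over a small angle range, which keeps the sketch short. Real scans usually need denoising before binarization, and the angle range and step would need tuning for the documents at hand.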

## Features

- **PDF/Image Input:** Accepts PDF or image files as input for table detection.
- **Alignment Check:** Checks the alignment of the input and corrects it if needed.
- **Table Detection:** Identifies bordered, semibordered and unbordered tables in the PDF or image file.
- **Table Extraction:** Extracts the tabular data as a pandas DataFrame.

## Libraries Used

- Python 3.x
- OpenCV
- NumPy
- pdf2image
- Pillow
- SciPy
- Jinja2
- EasyOCR
- pandas

## Create and Activate Environment
```bash
conda create -n <env_name> python=3.7
conda activate <env_name>
```
## Installation using pip

```bash
pip install tabledetector
```

## Clone the repository for the latest development release

```bash
git clone https://github.com/rajban94/TableDetector.git
```

## Dependency
This library converts PDFs with pdf2image, which depends on Poppler. On Windows, ensure that Poppler is installed and its `bin` directory is added to the `PATH` environment variable.
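To confirm that Poppler is reachable before running detection, you can check for its command-line tools on `PATH`. This is a minimal sketch that assumes pdf2image's usual reliance on the `pdfinfo` and `pdftoppm` executables; `poppler_on_path` is a hypothetical helper, not part of tabledetector.

```python
import shutil


def poppler_on_path() -> bool:
    """Return True if Poppler's command-line tools can be found on PATH.

    Hypothetical helper. pdf2image calls Poppler utilities such as
    `pdfinfo` and `pdftoppm`, so both should be discoverable.
    """
    return all(shutil.which(tool) is not None for tool in ("pdfinfo", "pdftoppm"))


if __name__ == "__main__":
    if poppler_on_path():
        print("Poppler found on PATH.")
    else:
        print("Poppler not found: install it and add its bin directory to PATH.")
```

If editing `PATH` is not an option, pdf2image's `convert_from_path` also accepts a `poppler_path` argument; whether tabledetector exposes an equivalent option is not documented here.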

## Usage
For bordered table detection:
```python
import tabledetector as td
result = td.detect(pdf_path="pdf_path", method="bordered")
```
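The README does not document how bordered detection works internally. As an illustration of the general idea only, the NumPy sketch below finds ruling lines with projection profiles and turns them into cell boxes. It assumes the image is binarized, deskewed, and cropped to the table, so that every rule spans the full width or height. `find_grid`, `_runs` and `min_fill` are hypothetical names; OpenCV-based pipelines often extract lines with morphological opening instead, which copes better with broken rules.

```python
import numpy as np


def _runs(mask: np.ndarray):
    """Return (start, end) index pairs of consecutive True runs, end exclusive."""
    padded = np.concatenate(([False], mask, [False])).astype(int)
    edges = np.flatnonzero(np.diff(padded))
    return list(zip(edges[::2].tolist(), edges[1::2].tolist()))


def find_grid(binary: np.ndarray, min_fill: float = 0.8):
    """Locate ruling lines of a bordered table and return the cell boxes.

    Hypothetical helper. A row (column) counts as part of a horizontal
    (vertical) rule when at least `min_fill` of its pixels are ink.
    Adjacent rule rows/columns are merged into one rule, and each cell is
    the gap between two consecutive rules, as (y0, x0, y1, x1) with
    exclusive ends.
    """
    ink = binary != 0
    h_rules = _runs(ink.mean(axis=1) >= min_fill)
    v_rules = _runs(ink.mean(axis=0) >= min_fill)
    cells = []
    for (_, top), (bottom, _) in zip(h_rules, h_rules[1:]):
        for (_, left), (right, _) in zip(v_rules, v_rules[1:]):
            cells.append((top, left, bottom, right))
    return cells
```

Each returned box could then be cropped and passed to an OCR engine such as EasyOCR to fill one DataFrame cell.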

For semibordered table detection:
```python
import tabledetector as td
result = td.detect(pdf_path="pdf_path", method="semibordered")
```
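When vertical separators are missing, as is presumably the case for many semibordered tables and for all unbordered ones, column boundaries have to be inferred from whitespace instead. The sketch below illustrates that idea on a binarized, deskewed table image; it is not the package's implementation, and `column_bounds` and `min_gap` are hypothetical names.

```python
import numpy as np


def column_bounds(binary: np.ndarray, min_gap: int = 3):
    """Infer column spans from vertical whitespace in a table image.

    Hypothetical helper. A pixel column with no ink is whitespace; a run of
    at least `min_gap` whitespace columns separates two table columns.
    Returns (x0, x1) spans with exclusive ends.
    """
    has_ink = (binary != 0).any(axis=0)
    spans, start, gap = [], None, 0
    for x, ink in enumerate(has_ink):
        if ink:
            if start is None:
                start = x  # first ink column of a new table column
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The last ink column was x - gap, so the span ends one after it.
                spans.append((start, x - gap + 1))
                start, gap = None, 0
    if start is not None:
        # Close the final column, dropping any short trailing whitespace.
        spans.append((start, len(has_ink) - gap))
    return spans
```

The `min_gap` threshold trades off splitting words that contain narrow spaces against merging columns that sit close together, so it would need tuning to the font size of the documents.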

For unbordered table detection:
```python
import tabledetector as td
result = td.detect(pdf_path="pdf_path", method="unbordered")
```
If `method` is omitted, the detector tries all of the methods and returns the results accordingly.
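The README does not say how the results of the automatic mode are combined. To compare the methods side by side yourself, a small wrapper that calls `detect` once per documented `method` value is enough. `detect_each` is a hypothetical helper, and each stored result is simply whatever `detect` returns.

```python
from typing import Any, Callable, Dict, Sequence

# The three method names documented in the Usage section above.
METHODS = ("bordered", "semibordered", "unbordered")


def detect_each(detect: Callable[..., Any], pdf_path: str,
                methods: Sequence[str] = METHODS) -> Dict[str, Any]:
    """Call `detect` once per method and collect the results by method name.

    Hypothetical helper. Passing the detector in as an argument keeps it
    testable without the package installed; in practice it would be
    `tabledetector.detect`.
    """
    return {method: detect(pdf_path=pdf_path, method=method) for method in methods}


# Example (requires tabledetector and Poppler):
#   import tabledetector as td
#   results = detect_each(td.detect, "report.pdf")
#   bordered_result = results["bordered"]
```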

            
