# Tabledetector
[![PyPI](https://img.shields.io/pypi/v/tabledetector)](https://pypi.org/project/tabledetector/)
Tabledetector is a Python package that takes PDFs or images as input, checks their alignment, re-aligns them if required, detects the table structure, extracts the data, and returns it as a pandas DataFrame for further use. The current implementation handles bordered, semibordered and unbordered table structures.
## Features
- **PDF/Image Input:** Accepts PDF or image files as input for table detection.
- **Alignment Check:** Verifies and adjusts alignment of input.
- **Table Detection:** Identifies bordered, semibordered and unbordered tables in the PDF/Image File.
- **Table Extraction:** Extracts the tabular data as a pandas DataFrame.
## Libraries Used
- Python 3.x
- OpenCV
- NumPy
- pdf2image
- Pillow
- scipy
- jinja2
- easyocr
- pandas
## Create and Activate Environment
```bash
conda create -n <env_name> python=3.7
conda activate <env_name>
```
## Installation of package using pip
```bash
pip install tabledetector
```
## Clone the repository for latest development release
```bash
git clone https://github.com/rajban94/TableDetector.git
```
## Dependency
To use this library on Windows, make sure Poppler is installed and its location is added to the `PATH` environment variable.
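A quick way to confirm the Poppler requirement is met is to look for one of its executables on `PATH`. The sketch below assumes the relevant executable is `pdftoppm`, one of the Poppler command-line tools that pdf2image calls; the helper name `find_poppler` is hypothetical and not part of tabledetector.

```python
import shutil


def find_poppler(binary="pdftoppm"):
    """Return the full path of a Poppler executable found on PATH, or None."""
    # shutil.which searches PATH the same way the shell would.
    return shutil.which(binary)


if __name__ == "__main__":
    path = find_poppler()
    if path is None:
        print("Poppler was not found on PATH; PDF input will likely fail.")
    else:
        print(f"Poppler found at: {path}")
```

If the check reports that Poppler is missing, add the folder containing the Poppler executables to `PATH` and open a new terminal before retrying.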
## Usage
## Detection
To detect bordered tables when rotation is not required:
```python
import tabledetector as td
# `type` selects the table structure; `method` selects detection or extraction.
result = td.detect(pdf_path="pdf_path", type="bordered", rotation=False, method="detect")
```
To detect semibordered tables when rotation is not required:
```python
import tabledetector as td
# `type` selects the table structure; `method` selects detection or extraction.
result = td.detect(pdf_path="pdf_path", type="semibordered", rotation=False, method="detect")
```
To detect unbordered tables when rotation is not required:
```python
import tabledetector as td
# `type` selects the table structure; `method` selects detection or extraction.
result = td.detect(pdf_path="pdf_path", type="unbordered", rotation=False, method="detect")
```
## Extraction
To detect bordered tables and extract their data when rotation is not required:
```python
import tabledetector as td
# method="extract" runs extraction in addition to detection.
result = td.detect(pdf_path="pdf_path", type="bordered", rotation=False, method="extract")
```
To detect semibordered tables and extract their data when rotation is not required:
```python
import tabledetector as td
# method="extract" runs extraction in addition to detection.
result = td.detect(pdf_path="pdf_path", type="semibordered", rotation=False, method="extract")
```
To detect unbordered tables and extract their data when rotation is not required:
```python
import tabledetector as td
# method="extract" runs extraction in addition to detection.
result = td.detect(pdf_path="pdf_path", type="unbordered", rotation=False, method="extract")
```
If no table type is specified, the package checks for all three table structures and returns the result accordingly. If the input needs to be rotated, set `rotation=True`.
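When you want to see what each table type yields separately, rather than relying on the default behaviour, you can call the detector once per type and keep the results side by side. The sketch below is only an illustration: `run_all_types` and `TABLE_TYPES` are hypothetical names, and it assumes `td.detect` accepts the keyword arguments `pdf_path`, `type`, `rotation` and `method` as in the bordered example. The detector is passed in as a callable so the helper does not depend on how tabledetector is installed.

```python
from typing import Any, Callable, Dict

# Table structures named in this README.
TABLE_TYPES = ("bordered", "semibordered", "unbordered")


def run_all_types(
    detect: Callable[..., Any],
    pdf_path: str,
    rotation: bool = False,
    method: str = "detect",
) -> Dict[str, Any]:
    """Call `detect` once per table type and collect the results by type."""
    results: Dict[str, Any] = {}
    for table_type in TABLE_TYPES:
        # Assumed keyword names, taken from the bordered usage example.
        results[table_type] = detect(
            pdf_path=pdf_path,
            type=table_type,
            rotation=rotation,
            method=method,
        )
    return results


# Example usage (requires tabledetector to be installed):
# import tabledetector as td
# results = run_all_types(td.detect, "pdf_path", method="extract")
```

Keeping the results in a dictionary keyed by table type makes it easy to compare which structure matched a given document.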
Raw data
{
"_id": null,
"home_page": "https://github.com/rajban94/TableDetector",
"name": "tabledetector",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "table detector",
"author": "Rishav Banerjee",
"author_email": "rishavbanerjee10.rb@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/73/64/36e34094bf2f5b2dba3dcaa1f5eb8076079842583ee210563a05f2ea329a/tabledetector-1.0.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT License",
"summary": "End-to-End table structure detector",
"version": "1.0.2",
"project_urls": {
"Download": "https://github.com/rajban94/TableDetector.git",
"Homepage": "https://github.com/rajban94/TableDetector"
},
"split_keywords": [
"table",
"detector"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "7761975ab43474d7218ce021c3d25dabc67233dbd5526cc08b0cbc3fbbb51cb8",
"md5": "4b75ed5daa502d951202b6924d2ebf63",
"sha256": "3da9f2cdbeab563de740d9ee17ad2a98e26fc01cfe195bd376f1dd9b66728b9a"
},
"downloads": -1,
"filename": "tabledetector-1.0.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "4b75ed5daa502d951202b6924d2ebf63",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 19541,
"upload_time": "2024-07-04T08:20:54",
"upload_time_iso_8601": "2024-07-04T08:20:54.411351Z",
"url": "https://files.pythonhosted.org/packages/77/61/975ab43474d7218ce021c3d25dabc67233dbd5526cc08b0cbc3fbbb51cb8/tabledetector-1.0.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "736436e34094bf2f5b2dba3dcaa1f5eb8076079842583ee210563a05f2ea329a",
"md5": "ffc292d4349722d6fcceebb6683f80c1",
"sha256": "d4d9b400a0c1e86668c57b4d262189b83f705124349abe47505f775534a0e102"
},
"downloads": -1,
"filename": "tabledetector-1.0.2.tar.gz",
"has_sig": false,
"md5_digest": "ffc292d4349722d6fcceebb6683f80c1",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 11171,
"upload_time": "2024-07-04T08:20:56",
"upload_time_iso_8601": "2024-07-04T08:20:56.461085Z",
"url": "https://files.pythonhosted.org/packages/73/64/36e34094bf2f5b2dba3dcaa1f5eb8076079842583ee210563a05f2ea329a/tabledetector-1.0.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-07-04 08:20:56",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "rajban94",
"github_project": "TableDetector",
"github_not_found": true,
"lcname": "tabledetector"
}