sandyie-read


Name: sandyie-read
Version: 0.4.7
Home page: https://github.com/SanjayDK3669/sandyie_read
Summary: A lightweight Python library to read various data formats including PDF, images, YAML, and more.
Upload time: 2025-07-26 13:38:31
Maintainer: None
Docs URL: None
Author: Sanju (Sandyie)
Requires Python: >=3.7
License: MIT
Requirements: pandas, openpyxl, xlrd, PyPDF2, js2py, pytest, anyio, python-dotenv, PyMuPDF, opencv-python
Travis CI: none
Coveralls test coverage: none

# Sandyie Read 📚

[![PyPI version](https://img.shields.io/pypi/v/sandyie_read?color=blue)](https://pypi.org/project/sandyie-read/)
[![Downloads](https://img.shields.io/pypi/dm/sandyie_read)](https://pypi.org/project/sandyie-read/)
[![License](https://img.shields.io/github/license/sandyie/sandyie-read)](LICENSE)
[![Python](https://img.shields.io/badge/Python-3.7%2B-blue.svg)](https://www.python.org/downloads/)

**Sandyie Read** is a lightweight Python library that reads and extracts data from a variety of file formats, including PDF, images (JPG, PNG), YAML, and more, with clean logging and custom exception handling.

---

## 🔧 Features

- ✅ Read and extract content from:
  - PDF (text-based and scanned with OCR)
  - Image files (JPG, PNG)
  - YAML files
  - Text files
  - CSV and Excel files (.xlsx, .xls)
- 🧠 OCR support for scanned documents using Tesseract
- 📋 Clean, human-readable logging
- 🛡️ Custom exception handling (via `SandyieException`)

---

## 📦 Installation

```bash
pip install sandyie_read
```
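
To confirm that the install succeeded, the standard library can report the installed version. This is a minimal sketch, assuming Python 3.8 or newer (where `importlib.metadata` is available); the helper name `installed_version` is illustrative and not part of the library:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("sandyie-read")
    print(found if found else "sandyie-read is not installed")
```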

---

## 🚀 Quick Start

```python
from sandyie_read import read

data = read("example.pdf")
print(data)
```
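
A single entry point for many formats is commonly built by dispatching on the file extension. The sketch below illustrates that general pattern with standard-library readers only. It is not the library's actual implementation, and `read_any` and `READERS` are hypothetical names:

```python
import csv
from pathlib import Path
from typing import Any, Callable, Dict, List


def _read_text(path: Path) -> str:
    """Return the whole file as one string."""
    return path.read_text(encoding="utf-8")


def _read_csv(path: Path) -> List[Dict[str, str]]:
    """Return CSV rows as a list of dicts keyed by the header row."""
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Hypothetical registry: lowercase file suffix -> reader function.
READERS: Dict[str, Callable[[Path], Any]] = {
    ".txt": _read_text,
    ".csv": _read_csv,
}


def read_any(filename: str) -> Any:
    """Pick a reader from the file suffix and return whatever it produces."""
    path = Path(filename)
    reader = READERS.get(path.suffix.lower())
    if reader is None:
        raise ValueError(f"Unsupported file type: {path.suffix!r}")
    return reader(path)
```

Keeping the mapping in one dictionary means a new format only needs a reader function and one registry entry.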

---

## 📁 Supported File Types & Examples

### 1. 📄 PDF (Text-based or Scanned)

```python
from sandyie_read import read

data = read("sample.pdf")
print(data)
```

🟢 **Returns:**
A `str` containing all extracted text. OCR is applied automatically to scanned PDFs.

---

### 2. 🖼️ Image Files (PNG, JPG)

```python
from sandyie_read import read

data = read("photo.jpg")
print(data)
```

🟢 **Returns:**
A `str` of text extracted with OCR (via Tesseract).
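
Python wrappers around Tesseract typically need the `tesseract` executable installed on the system. Whether that applies here is an assumption, since the README does not say how OCR is wired up. A small pre-flight check, with the hypothetical helper name `tesseract_available`, might look like this:

```python
import shutil


def tesseract_available() -> bool:
    """Return True if a `tesseract` executable is found on PATH."""
    return shutil.which("tesseract") is not None


if __name__ == "__main__":
    if not tesseract_available():
        print("tesseract was not found on PATH; OCR may fail.")
```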

---

### 3. ⚙️ YAML Files

```python
from sandyie_read import read

data = read("config.yaml")
print(data)
```

🟢 **Returns:**
A `dict` representing the parsed YAML structure.
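
Because the result is an ordinary nested dictionary, missing keys can be handled without chained `try`/`except` blocks. The sketch below uses a hand-written stand-in for the parsed config; the key names and the helper `get_nested` are illustrative only:

```python
from collections.abc import Mapping
from typing import Any


def get_nested(config: Mapping, *keys: str, default: Any = None) -> Any:
    """Walk nested mappings key by key, returning `default` if any key is missing."""
    current: Any = config
    for key in keys:
        if not isinstance(current, Mapping) or key not in current:
            return default
        current = current[key]
    return current


# Stand-in for the dict that read("config.yaml") is described as returning.
config = {"database": {"host": "localhost", "port": 5432}}

print(get_nested(config, "database", "port"))  # prints 5432
print(get_nested(config, "database", "user", default="anonymous"))  # prints anonymous
```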

---

### 4. 📄 Text Files (.txt)

```python
from sandyie_read import read

data = read("notes.txt")
print(data)
```

🟢 **Returns:**
A `str` containing the full content of the file.

---

### 5. 📊 CSV Files

```python
from sandyie_read import read

data = read("data.csv")
print(data)
```

🟢 **Returns:**
A `pandas.DataFrame` of structured tabular data.

---

### 6. 📈 Excel Files (.xlsx, .xls)

```python
from sandyie_read import read

data = read("report.xlsx")
print(data)
```

🟢 **Returns:**
A `pandas.DataFrame`, or a dictionary of DataFrames if the workbook has multiple sheets.
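
Because the return type depends on the workbook, calling code may want to normalize it before looping over sheets. This is a minimal sketch, assuming the multi-sheet result is a plain `dict` keyed by sheet name; `as_sheet_dict` and the default name `"Sheet1"` are illustrative and not part of the library:

```python
from typing import Any, Dict


def as_sheet_dict(result: Any, default_name: str = "Sheet1") -> Dict[str, Any]:
    """Normalize a single-table or multi-sheet result to {sheet_name: table}."""
    if isinstance(result, dict):
        return dict(result)
    # A single table (for example one DataFrame) is wrapped under a default name.
    return {default_name: result}


# Stand-in values; in real use `result` would come from read("report.xlsx").
for name, table in as_sheet_dict({"Q1": "table-1", "Q2": "table-2"}).items():
    print(name, table)
```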

---

## ⚠️ Error Handling

All exceptions are wrapped in a custom `SandyieException` class, providing clean and traceable messages.
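
The wrapping pattern itself can be sketched with a stand-in class. `ReadError` below is hypothetical and only plays the role of `SandyieException`, whose import path and constructor are not documented here:

```python
class ReadError(Exception):
    """Hypothetical stand-in for a library-specific exception."""


def read_text_file(path: str) -> str:
    """Read a text file, converting low-level I/O errors to ReadError."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except OSError as exc:
        # `from exc` keeps the original error reachable as __cause__.
        raise ReadError(f"Could not read {path!r}: {exc}") from exc


try:
    read_text_file("no-such-dir/does-not-exist.txt")
except ReadError as err:
    print(f"Handled: {err}")
    print(f"Original cause: {type(err.__cause__).__name__}")
```

Callers then catch one exception type, while the chained cause preserves the underlying traceback for debugging.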

---

## 🧪 Logging

Logs show:

* File type detection
* Successful/failed read attempts
* Detailed file handling info
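
The README does not say how these logs are emitted. Assuming the standard `logging` module is used, messages of this kind become visible once a handler is configured. The logger name `reader_demo` and the helper `describe_and_log` below are placeholders for illustration:

```python
import logging
from pathlib import Path

# Placeholder logger name used only for this sketch.
logger = logging.getLogger("reader_demo")


def describe_and_log(path: str) -> str:
    """Log the detected file type for `path` and return the lowercase suffix."""
    suffix = Path(path).suffix.lower() or "(none)"
    logger.info("Detected file type %s for %s", suffix, path)
    return suffix


if __name__ == "__main__":
    logging.basicConfig(
        level=logging.INFO,
        format="%(levelname)s | %(name)s | %(message)s",
    )
    describe_and_log("report.xlsx")
```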

---

## 📚 Auto-Generated Docs

Coming soon at **[https://sandyie.in/docs](https://sandyie.in/docs)**. The docs will include:

* API Reference
* Exception documentation
* Usage notebooks

---

## 🤝 Contribution

Found a bug or want a new feature? Feel free to [create an issue](https://github.com/sandyie/sandyie-read/issues) or submit a PR.

---

## 📄 License

This project is licensed under the MIT License; see the [LICENSE](LICENSE) file for details.

---

## 📬 Author

**Sanju (aka Sandyie)**
📧 Email: [dksanjay39@gmail.com](mailto:dksanjay39@gmail.com)
🔗 Portfolio: [https://sandyie.in](https://sandyie.in)
🐍 PyPI: [https://pypi.org/project/sandyie-read](https://pypi.org/project/sandyie-read)


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/SanjayDK3669/sandyie_read",
    "name": "sandyie-read",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": null,
    "author": "Sanju (Sandyie)",
    "author_email": "dksanjay39@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/8c/ba/7015fac80d6b299852b6072bde80a38b101426c2a70499a35cbc62645293/sandyie_read-0.4.7.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A lightweight Python library to read various data formats including PDF, images, YAML, and more.",
    "version": "0.4.7",
    "project_urls": {
        "Homepage": "https://github.com/SanjayDK3669/sandyie_read"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "aa14c3590f84430789eddab6d3d6ea40e4b96705df24ccb451669d346dfe1558",
                "md5": "71358dc32ca337a90484e3e54c6723bb",
                "sha256": "b5755aaca31100c7af372478cccbdd229a925ecd50033d0f6a3bb82f58ec1af8"
            },
            "downloads": -1,
            "filename": "sandyie_read-0.4.7-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "71358dc32ca337a90484e3e54c6723bb",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 11074,
            "upload_time": "2025-07-26T13:38:26",
            "upload_time_iso_8601": "2025-07-26T13:38:26.526843Z",
            "url": "https://files.pythonhosted.org/packages/aa/14/c3590f84430789eddab6d3d6ea40e4b96705df24ccb451669d346dfe1558/sandyie_read-0.4.7-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "8cba7015fac80d6b299852b6072bde80a38b101426c2a70499a35cbc62645293",
                "md5": "d3631af7aeb2f9fbacd4c686da2e342a",
                "sha256": "bcefe549008063ff7f59eeeddaf8addedf0efbc64c15d6512b258bbe580a0538"
            },
            "downloads": -1,
            "filename": "sandyie_read-0.4.7.tar.gz",
            "has_sig": false,
            "md5_digest": "d3631af7aeb2f9fbacd4c686da2e342a",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 9425,
            "upload_time": "2025-07-26T13:38:31",
            "upload_time_iso_8601": "2025-07-26T13:38:31.481762Z",
            "url": "https://files.pythonhosted.org/packages/8c/ba/7015fac80d6b299852b6072bde80a38b101426c2a70499a35cbc62645293/sandyie_read-0.4.7.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-26 13:38:31",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "SanjayDK3669",
    "github_project": "sandyie_read",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "pandas",
            "specs": [
                [
                    ">=",
                    "1.5.0"
                ]
            ]
        },
        {
            "name": "openpyxl",
            "specs": [
                [
                    ">=",
                    "3.1.2"
                ]
            ]
        },
        {
            "name": "xlrd",
            "specs": [
                [
                    ">=",
                    "2.0.1"
                ]
            ]
        },
        {
            "name": "PyPDF2",
            "specs": [
                [
                    ">=",
                    "3.0.1"
                ]
            ]
        },
        {
            "name": "js2py",
            "specs": [
                [
                    ">=",
                    "0.71"
                ]
            ]
        },
        {
            "name": "pytest",
            "specs": [
                [
                    ">=",
                    "8.0.0"
                ]
            ]
        },
        {
            "name": "anyio",
            "specs": [
                [
                    ">=",
                    "4.0.0"
                ]
            ]
        },
        {
            "name": "python-dotenv",
            "specs": [
                [
                    ">=",
                    "1.0.1"
                ]
            ]
        },
        {
            "name": "PyMuPDF",
            "specs": [
                [
                    ">=",
                    "1.22.0"
                ]
            ]
        },
        {
            "name": "opencv-python",
            "specs": [
                [
                    ">=",
                    "4.9.0"
                ]
            ]
        }
    ],
    "lcname": "sandyie-read"
}
        