data-harvest-reader

Name: data-harvest-reader
Version: 0.0.11
Home page: https://github.com/Jeferson-Peter/data-harvest-reader
Summary: A class to handle and process multiple files with identical structures within a directory.
Upload time: 2024-12-05 23:26:24
Author: Jeferson-Peter (Jeferson Peter)
Keywords: python, file reading, multiple file handler
Requirements: colorama, loguru, polars
            


# data-harvest-reader



## Features



1. **Reading Various File Formats**: Supports reading CSV, JSON, Parquet, and Excel files.

2. **Directory and ZIP File Handling**: Reads data from directories and ZIP archives, as well as from bytes and `zipfile.ZipFile` objects.

3. **Data Joining**: Joins DataFrames that share similar columns.

4. **Deduplication**: Removes duplicate rows based on specified columns.

5. **Custom Filters**: Applies custom filters to the DataFrames.

6. **Logging**: Detailed logging of data reading and manipulation operations.



## Installation Requirements



```bash

pip install data-harvest-reader

```

The package's pinned requirements are `colorama`, `loguru`, and `polars`.



## Usage



### Initialization



```python

from data_harvest_reader import DataReader



data_reader = DataReader(log_to_file=True, log_file="data_reader.log")

```



### Reading Data



#### From Directory



```python

data = data_reader.read_data('path/to/directory', join_similar=True)

```
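With `join_similar=True`, files whose tables share the same columns are combined into one. A minimal pure-Python sketch of that grouping idea (an illustration, not the library's polars-based implementation; the function name is hypothetical):

```python
def join_similar_tables(tables):
    # Group row lists by their column set; lists sharing identical
    # columns are concatenated into a single table.
    merged = {}
    for name, rows in tables.items():
        key = frozenset(rows[0].keys())
        merged.setdefault(key, []).extend(rows)
    return list(merged.values())

a = [{"id": 1, "x": "a"}]
b = [{"id": 2, "x": "b"}]   # same columns as `a` -> merged with it
c = [{"id": 3, "y": "c"}]   # different columns -> kept separate
groups = join_similar_tables({"f1": a, "f2": b, "f3": c})
```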



#### From ZIP File



```python

data = data_reader.read_data('path/to/zipfile.zip', join_similar=False)

```



#### From Bytes



```python

with open('path/to/zipfile.zip', 'rb') as f:

    zip_bytes = f.read()

data = data_reader.read_data(zip_bytes, join_similar=False)

```
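For testing, the bytes input can be produced without touching disk by building a ZIP archive in memory (assuming any bytes object containing a valid ZIP archive is accepted, as with the file-based example above):

```python
import io
import zipfile

# Build a small ZIP archive entirely in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data.csv", "column1,column2\n1,a\n2,b\n")
zip_bytes = buf.getvalue()

# The resulting bytes can then be passed to the reader:
# data = data_reader.read_data(zip_bytes, join_similar=False)
```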



#### From `zipfile.ZipFile` Object




```python

import zipfile

with zipfile.ZipFile('path/to/zipfile.zip', 'r') as zip_file:

    data = data_reader.read_data(zip_file, join_similar=False)

```



### Applying Deduplication



```python

duplicated_subset_dict = {'file1': ['column1', 'column2']}

data = data_reader.read_data('path/to/source', duplicated_subset_dict=duplicated_subset_dict)

```
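Conceptually, deduplication keeps one row per unique combination of the subset columns. A pure-Python sketch of that behavior (for illustration only; the library operates on polars DataFrames):

```python
def dedupe_by_subset(rows, subset):
    # Keep the first row seen for each unique value combination
    # of the subset columns.
    seen = set()
    out = []
    for row in rows:
        key = tuple(row[col] for col in subset)
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

rows = [
    {"column1": 1, "column2": "a", "value": 10},
    {"column1": 1, "column2": "a", "value": 99},  # duplicate on the subset
    {"column1": 2, "column2": "b", "value": 20},
]
deduped = dedupe_by_subset(rows, ["column1", "column2"])
```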



### Applying Filters



```python

filter_subset = {

    'file1': [{'column': 'Col1', 'operation': '>', 'values': 100},

              {'column': 'Col2', 'operation': '==', 'values': 'Value'}]

}



data = data_reader.read_data('path/to/source', filter_subset=filter_subset)

```
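Each filter entry names a column, an operation string, and a comparison value; rows must satisfy every entry to survive. A hypothetical sketch of how such specs could be evaluated row by row (the library applies equivalent filters to polars DataFrames, and the exact operator set it supports is an assumption here):

```python
import operator

# Map operation strings to comparison functions.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "==": operator.eq, "!=": operator.ne}

def apply_filters(rows, specs):
    # Apply each filter spec in turn; a row is kept only if it
    # satisfies all of them.
    for spec in specs:
        op = OPS[spec["operation"]]
        rows = [r for r in rows if op(r[spec["column"]], spec["values"])]
    return rows

rows = [{"Col1": 150, "Col2": "Value"},
        {"Col1": 50,  "Col2": "Value"},
        {"Col1": 200, "Col2": "Other"}]
specs = [{"column": "Col1", "operation": ">", "values": 100},
         {"column": "Col2", "operation": "==", "values": "Value"}]
kept = apply_filters(rows, specs)
```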



### Handling Exceptions



```python

try:

    data = data_reader.read_data('path/to/source')

except UnsupportedFormatError:

    print("Unsupported file format provided")

except FilterConfigurationError:

    print("Error in filter configuration")

```



## Example



```python

data_reader = DataReader()



data = data_reader.read_data(r'C:\path\to\data', join_similar=True,

                             filter_subset={'example_file': [{'column': 'Age', 'operation': '>', 'values': 30}]})

```



## Contributing to DataReader



### Getting Started



1. **Fork the Repository**: Start by forking the main repository. This creates your own copy of the project where you can make changes.

2. **Clone the Forked Repository**: Clone your fork to your local machine. This step allows you to work on the codebase directly.

3. **Set Up the Development Environment**: Ensure you have all necessary dependencies installed. It's recommended to use a virtual environment.

4. **Create a New Branch**: Always create a new branch for your changes. This keeps the main branch stable and makes reviewing changes easier.



### Making Contributions



1. **Make Your Changes**: Implement your feature, fix a bug, or make your proposed changes. Ensure your code adheres to the project's coding standards and guidelines.

2. **Test Your Changes**: Before submitting, test your changes thoroughly. Write unit tests if applicable, and ensure all existing tests pass.

3. **Document Your Changes**: Update the documentation to reflect your changes. If you're adding a new feature, include usage examples.

4. **Commit Your Changes**: Make concise and clear commit messages, describing what each commit does.

5. **Push to Your Fork**: Push your changes to your fork on GitHub.

6. **Create a Pull Request (PR)**: Go to the original `DataReader` repository and create a pull request from your fork. Ensure you describe your changes in detail and link any relevant issues.



### Review Process



After submitting your PR, the maintainers will review your changes. Be responsive to feedback:



1. **Respond to Comments**: If the reviewers ask for changes, make them promptly. Discuss any suggestions or concerns.

2. **Update Your PR**: If needed, update your PR based on feedback. This may involve adding more tests or tweaking your approach.



### Final Steps



Once your PR is approved:



1. **Merge**: The maintainers will merge your changes into the main codebase.

2. **Stay Engaged**: Continue to stay involved in the project. Look out for feedback from users on your new feature or fix.



## Conclusion



Contributing to `DataReader` is a rewarding experience that benefits the entire user community. Your contributions help make `DataReader` a more robust and versatile tool. We welcome developers of all skill levels and appreciate every form of contribution, from code to documentation. Thank you for considering contributing to `DataReader`!


            
