etl-dataprocess


| Field | Value |
| --- | --- |
| Name | etl-dataprocess |
| Version | 0.1.2 |
| home_page | https://github.com/botlorien/dataprocess |
| Summary | Functions utils to perform data processing |
| upload_time | 2024-12-12 16:35:04 |
| maintainer | None |
| docs_url | None |
| author | Ben-Hur P. B. Santos |
| requires_python | >=3.10 |
| license | None |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# dataprocess

**dataprocess** is a Python package that provides simple, efficient utilities for processing and cleaning data.

## Features

- **Data processing**: Transform data using dedicated functions.
- **Data cleaning**: Remove null values and prepare data for analysis.
- Modular structure that is easy to extend.
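
As a rough illustration of what "removing null values and preparing data" can mean in practice, the sketch below uses plain pandas rather than dataprocess itself. The sample data and variable names are invented for the example, and it does not claim to reproduce the package's own implementation.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch.
raw = pd.DataFrame(
    {
        "name": ["  Ana ", "Bruno", None],
        "amount": [" 10", None, "7 "],
    }
)

# Drop every row that contains a null value.
cleaned = raw.dropna().copy()

# Trim leading and trailing whitespace from each remaining text cell.
for column in cleaned.columns:
    cleaned[column] = cleaned[column].str.strip()

print(cleaned)
```

Only the first row survives here, because the other two each contain a null value.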

## Installation

Install the package from PyPI:

```bash
pip install etl-dataprocess
```
or clone the GitHub repository and install from source:
```bash
git clone https://github.com/botlorien/dataprocess.git
cd dataprocess
pip install .
```
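
To confirm that the installation worked, one option is to ask the standard library for the installed version. This is a minimal sketch: the helper name `installed_version` is hypothetical, and it relies only on `importlib.metadata`, not on anything provided by dataprocess.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version("etl-dataprocess"))
```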

## Usage example

```python
from dataprocess import dataprocessing as hd

# Placeholder (assumption): the original example does not define this name.
# Point it at a file, or at a folder that contains the file to import.
PATH_DOWNLOADS = "path/to/downloads"


def process_something_here():
    """Minimal example of using dataprocess."""
    # import_file handles .xlsx, .csv, .xls, .json, and .txt files.
    # It returns a DataFrame for .xlsx/.csv/.xls, a dict for .json, and a str for .txt.
    # If only a folder is passed, it reads the first file in that folder.
    table = hd.import_file(PATH_DOWNLOADS)

    # clear_table strips whitespace and other junk from the whole table
    # and returns a DataFrame with every column cast to str.
    table = hd.clear_table(table)

    # After cleaning, convert the columns to the appropriate types.
    # The "dtypes" mapping lists the columns to cast to 'datetime' and 'time'.
    # Other common types ('int', 'float', 'str') are inferred from the values.
    dtype = {
        'datetime': [
            'date_name_column',  # replace with the name of the column to cast to 'datetime'
        ],
        'time': [
            'hour_and_minute_name_column',  # replace with the name of the column to cast to 'time'
        ],
    }
    table = hd.convert_table_types(table, dtypes=dtype)
    print(table)
    table.info()  # info() prints its own summary and returns None
    return table


if __name__ == '__main__':
    process_something_here()
```
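
For readers who want to see what casting an all-string table to proper types involves, here is a small sketch in plain pandas. It is not the implementation of `convert_table_types`. The sample values and the `quantity` column are assumptions made for the illustration, and the two other column names are the placeholders from the example above.

```python
import pandas as pd

# Hypothetical all-string table, similar in spirit to the output of a cleaning step.
table = pd.DataFrame(
    {
        "date_name_column": ["2024-12-12", "2024-12-13"],
        "hour_and_minute_name_column": ["16:35", "08:05"],
        "quantity": ["3", "14"],
    }
)

# Columns named explicitly are parsed as dates and times.
table["date_name_column"] = pd.to_datetime(
    table["date_name_column"], format="%Y-%m-%d"
)
table["hour_and_minute_name_column"] = pd.to_datetime(
    table["hour_and_minute_name_column"], format="%H:%M"
).dt.time

# A numeric-looking column is converted by parsing its values.
table["quantity"] = pd.to_numeric(table["quantity"])

print(table.dtypes)
```

The time column ends up holding `datetime.time` objects, so pandas reports its dtype as `object`. The date column becomes a datetime dtype and `quantity` becomes an integer dtype.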

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/botlorien/dataprocess",
    "name": "etl-dataprocess",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": null,
    "author": "Ben-Hur P. B. Santos",
    "author_email": "botlorien@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/e3/2e/439f82b32ba20cc3b0d34a64504c2d8674dd13a42cc00ba982f92d0d72f2/etl_dataprocess-0.1.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Functions utils to perform data processing",
    "version": "0.1.2",
    "project_urls": {
        "Homepage": "https://github.com/botlorien/dataprocess"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e32e439f82b32ba20cc3b0d34a64504c2d8674dd13a42cc00ba982f92d0d72f2",
                "md5": "d7a827126e3c05bef19c2a707ac950da",
                "sha256": "1b7925e50d64fbe25b0d43be5eaf9667bab181a294ff8df682ed032d2e9f9809"
            },
            "downloads": -1,
            "filename": "etl_dataprocess-0.1.2.tar.gz",
            "has_sig": false,
            "md5_digest": "d7a827126e3c05bef19c2a707ac950da",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 16708,
            "upload_time": "2024-12-12T16:35:04",
            "upload_time_iso_8601": "2024-12-12T16:35:04.153933Z",
            "url": "https://files.pythonhosted.org/packages/e3/2e/439f82b32ba20cc3b0d34a64504c2d8674dd13a42cc00ba982f92d0d72f2/etl_dataprocess-0.1.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-12 16:35:04",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "botlorien",
    "github_project": "dataprocess",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "etl-dataprocess"
}
        