# dataprocess
**dataprocess** is a Python package that provides simple, efficient utilities for processing and cleaning data.
## Features
- **Data processing**: Transform data using dedicated functions.
- **Data cleaning**: Remove null values and prepare data for analysis.
- Modular structure for easy extension.
## Installation
Install the package from PyPI:
```bash
pip install etl-dataprocess
```
or install it from a clone of the GitHub repository:
```bash
git clone https://github.com/botlorien/dataprocess.git
cd dataprocess
pip install .
```
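To confirm that the installation worked, a quick check along the following lines may help. It is only a sketch: it assumes the distribution is registered under the name `etl-dataprocess` (as in the `pip install` command above), and `installed_version` is a hypothetical helper name, not part of the package.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # The package metadata declares requires_python >= 3.10.
    print("Python >= 3.10:", sys.version_info >= (3, 10))
    # Prints None when the distribution is not installed in this environment.
    print("etl-dataprocess version:", installed_version("etl-dataprocess"))
```

A `None` result means pip did not install the distribution into the interpreter running the check, which usually points to a mismatched virtual environment.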
## Usage example
```python
from dataprocess import dataprocessing as hd

# Placeholder: replace with the file (or folder) you want to import.
PATH_DOWNLOADS = 'path/to/downloads'


def process_something_here():
    """A single example showing how to use dataprocess."""
    # Import a file, detecting its type (.xlsx, .csv, .xls, .json, .txt).
    # The content is returned as a DataFrame for .xlsx/.csv/.xls,
    # a dict for .json, and a str for .txt.
    # If only a folder is passed, the first file in that folder is used.
    table = hd.import_file(PATH_DOWNLOADS)

    # Clean the whole table, removing whitespace and other junk,
    # and return a DataFrame with every column cast to str.
    table = hd.clear_table(table)

    # After cleaning, convert the columns to the appropriate types.
    # The "dtypes" mapping lists the columns to cast to 'datetime' and 'time'.
    # Other common types ('int', 'float', 'str') are handled
    # automatically by analysing the values.
    dtype = {
        'datetime': [
            'date_name_column',  # replace with the name of the column to cast to 'datetime'
        ],
        'time': [
            'hour_and_minute_name_column',  # replace with the name of the column to cast to 'time'
        ],
    }
    table = hd.convert_table_types(table, dtypes=dtype)
    print(table)
    table.info()  # info() prints its own summary and returns None
    return table


if __name__ == '__main__':
    process_something_here()
```
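To make the clean-then-convert flow above more concrete, here is a standard-library sketch of the same idea. It is not the package's implementation: it works on a list of dicts rather than a DataFrame, the names `clean_cell`, `infer_scalar`, and `convert_rows` are hypothetical, and the two format strings are assumptions that real data may not match. Only the shape of the `dtypes` mapping (`'datetime'` and `'time'` keys listing column names) is taken from the example.

```python
from datetime import datetime

# Assumed formats for illustration only; adjust them to the actual data.
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"
TIME_FORMAT = "%H:%M"


def clean_cell(value):
    """Collapse runs of whitespace and trim the ends; None becomes ''."""
    return " ".join(str(value).split()) if value is not None else ""


def infer_scalar(text):
    """Try int, then float, and fall back to the original string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def convert_rows(rows, dtypes):
    """Clean every cell, then cast each column according to `dtypes`."""
    datetime_cols = set(dtypes.get("datetime", []))
    time_cols = set(dtypes.get("time", []))
    converted = []
    for row in rows:
        new_row = {}
        for column, raw in row.items():
            text = clean_cell(raw)
            if column in datetime_cols:
                new_row[column] = datetime.strptime(text, DATETIME_FORMAT)
            elif column in time_cols:
                new_row[column] = datetime.strptime(text, TIME_FORMAT).time()
            else:
                # Columns not listed in the mapping are inferred from their values.
                new_row[column] = infer_scalar(text)
        converted.append(new_row)
    return converted


rows = [
    {
        "created": " 2024-12-12 16:35:04 ",
        "start": "08:30",
        "qty": " 3 ",
        "price": "1.5",
        "name": "  foo   bar ",
    },
]
dtypes = {"datetime": ["created"], "time": ["start"]}
result = convert_rows(rows, dtypes)
print(result[0])
```

Cleaning everything to strings first, as the example does with `clear_table`, keeps the conversion step simple: every cast starts from the same normalised text regardless of how the source file stored the value.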
Raw data
{
"_id": null,
"home_page": "https://github.com/botlorien/dataprocess",
"name": "etl-dataprocess",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": null,
"author": "Ben-Hur P. B. Santos",
"author_email": "botlorien@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/e3/2e/439f82b32ba20cc3b0d34a64504c2d8674dd13a42cc00ba982f92d0d72f2/etl_dataprocess-0.1.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Functions utils to perform data processing",
"version": "0.1.2",
"project_urls": {
"Homepage": "https://github.com/botlorien/dataprocess"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "e32e439f82b32ba20cc3b0d34a64504c2d8674dd13a42cc00ba982f92d0d72f2",
"md5": "d7a827126e3c05bef19c2a707ac950da",
"sha256": "1b7925e50d64fbe25b0d43be5eaf9667bab181a294ff8df682ed032d2e9f9809"
},
"downloads": -1,
"filename": "etl_dataprocess-0.1.2.tar.gz",
"has_sig": false,
"md5_digest": "d7a827126e3c05bef19c2a707ac950da",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 16708,
"upload_time": "2024-12-12T16:35:04",
"upload_time_iso_8601": "2024-12-12T16:35:04.153933Z",
"url": "https://files.pythonhosted.org/packages/e3/2e/439f82b32ba20cc3b0d34a64504c2d8674dd13a42cc00ba982f92d0d72f2/etl_dataprocess-0.1.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-12 16:35:04",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "botlorien",
"github_project": "dataprocess",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "etl-dataprocess"
}