| Field | Value |
| --- | --- |
| Name | sabia-utils |
| Version | 0.2.0 |
| Summary | Group of utilities for Sabia |
| Author | AI Lab Unb |
| Maintainer | |
| License | MIT License |
| Keywords | sabia, utils |
| Requires Python | |
| Home page | |
| Docs URL | None |
| Upload time | 2023-06-07 12:51:06 |
| Requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| Coveralls test coverage | No coveralls. |
# Sabia Utils
This is a collection of utilities for Sabia.
## Concat Module
This module is used to concatenate files.
### Concatenate all files from a path
* Returns a single concatenated DataFrame built from all files in a path (see the usage sketch after the example below).
* Can optionally save the concatenated DataFrame to a file.
```python
from sabia_utils import concat
concat.concatenate_all_from_path(
path='path\\to\\files',
output_file='output\\file\\path', # optional
fine_name='file_name' # optional
)
```
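As a quick usage sketch (assuming the return value behaves like a pandas DataFrame, as the bullet above describes), you can call the function without the optional arguments and inspect the result in memory:
```python
from sabia_utils import concat

# Keep the combined data in memory only; `output_file` and `fine_name`
# are optional, so they are omitted here.
df = concat.concatenate_all_from_path(path='path\\to\\files')

print(df.shape)   # rows and columns after concatenation
print(df.head())  # first few rows of the combined data
```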
### Concatenate some files from a path
* Returns a concatenated DataFrame built from the selected files in a path.
* Can optionally save the concatenated DataFrame to a file.
```python
from sabia_utils import concat
concat.concatenate_files(
path='path\\to\\files',
files=['file1', 'file2'],
output_file='output\\file\\path', # optional
fine_name='file_name' # optional
)
```
### Copy files from one path to another
* Checks whether the files already exist in the destination path before copying (a conceptual sketch follows the example below).
```python
from sabia_utils import group
group.copy_new_files(
PATH_IN='path\\to\\files1',
PATH_OUT='path\\to\\files2'
)
```
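For a concrete picture of this copy-only-new-files pattern, here is a minimal, hypothetical sketch in plain Python. It is not the package's implementation; the helper name and the skip-if-present rule are assumptions for illustration:
```python
import shutil
from pathlib import Path


def copy_new_files_sketch(path_in: str, path_out: str) -> None:
    """Hypothetical equivalent: copy files from path_in that are not yet in path_out."""
    src, dst = Path(path_in), Path(path_out)
    dst.mkdir(parents=True, exist_ok=True)
    for file in src.iterdir():
        # Skip directories and anything already present in the destination.
        if file.is_file() and not (dst / file.name).exists():
            shutil.copy2(file, dst / file.name)
```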
## Group Module
This module is used to group files.
### Process files in both paths
* Verifies that the files exist in both paths before processing them.
```python
from sabia_utils import group
group.process_existent_files(
PATH_IN='path\\to\\files1',
PATH_OUT='path\\to\\files2'
)
```
### Process all files
* Combines the copy and processing steps between the two paths: copies the new files, then processes those present in both (a composition sketch follows the example below).
```python
from sabia_utils import group
group.process_all_files(
PATH_IN='path\\to\\files1',
PATH_OUT='path\\to\\files2'
)
```
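If `process_all_files` is indeed the composition of the two earlier functions, as the bullet above suggests, a hand-rolled equivalent would simply chain them. This is an assumption about the composition, not documented behaviour:
```python
from sabia_utils import group


def process_all_files_sketch(path_in: str, path_out: str) -> None:
    # Assumed composition: copy the files that are new, then process the files
    # that now exist in both locations.
    group.copy_new_files(PATH_IN=path_in, PATH_OUT=path_out)
    group.process_existent_files(PATH_IN=path_in, PATH_OUT=path_out)
```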
## Pre_process Module
This module is used to pre-process Parquet files.
### Process all parquet files
* Define a class that inherits from `sabia_utils.pre_process.Processing`.
* Override the method `apply_to_df(self, df, column)`, defining the pre-processing to be applied.
* Pass an instance of your class to `pre_process_parquets`, pointing it at a folder of Parquet files to process them (a conceptual sketch of the per-file loop follows the example below).
```python
from sabia_utils.pre_process import Processing
from sabia_utils import pre_process
class MyProcessor(Processing):
    def apply_to_df(self, df, column):
        # Your pre-processing steps, e.g. lower-casing the text in `column`
        df[column] = df[column].str.lower()


pre_process.pre_process_parquets(
    folder_path='path\\to\\folder',
    colomun_to_pre_process='column_name_to_be_processed',
    pre_processed_column='processed_column_name',
    processor=MyProcessor()
)
```
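To make the processor pattern concrete, here is a plain-pandas sketch of what a per-file Parquet loop driven by such a processor could look like. The helper name, the read/apply/write flow, and the in-place column convention are assumptions for illustration, not the package's actual internals:
```python
from pathlib import Path

import pandas as pd


def pre_process_parquets_sketch(folder_path, column, new_column, processor):
    """Hypothetical loop: read each Parquet file, apply the processor, write it back."""
    for parquet_file in Path(folder_path).glob('*.parquet'):
        df = pd.read_parquet(parquet_file)
        df[new_column] = df[column]            # keep the original column untouched
        processor.apply_to_df(df, new_column)  # processor edits the new column in place
        df.to_parquet(parquet_file)
```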
Raw data

```json
{
"_id": null,
"home_page": "",
"name": "sabia-utils",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "sabia utils",
"author": "AI Lab Unb",
"author_email": "ailabunb@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/30/68/1fea725967184c6f722b6316bb891fadb0248388834441991b05293ab689/sabia-utils-0.2.0.tar.gz",
"platform": null,
"description": "# Sabia Utils\n\nThis is a collection of utilities for Sabia.\n\n## Concat Module\n\nThis module is used to concatenate files.\n\n\n### Concatenate all files from a path\n\n* Returns a concatenated dataframe from all files in a path.\n* Can save the concatenated dataframe to a file.\n\n```python\nfrom sabia_utils import concat\n\nconcat.concatenate_all_from_path(\n path='path\\\\to\\\\files',\n output_file='output\\\\file\\\\path', # optional\n fine_name='file_name' # optional \n)\n```\n\n### Concatenate some files from a path\n\n* Returns a concatenated dataframe from some files in a path.\n* Can save the concatenated dataframe to a file.\n\n```python\nfrom sabia_utils import concat\n\nconcat.concatenate_files(\n path='path\\\\to\\\\files',\n files=['file1', 'file2'],\n output_file='output\\\\file\\\\path', # optional\n fine_name='file_name' # optional \n)\n```\n\n\n### Copy files from a path to another\n\n* Verify if the files exist in the path before copy.\n\n```python\nfrom sabia_utils import group\n\ngroup.copy_new_files(\n PATH_IN='path\\\\to\\\\files1',\n PATH_OUT='path\\\\to\\\\files2'\n)\n```\n\n## Group Module\n\nThis module is used to group files.\n\n### Process files in both paths\n\n* Verify if the files exist in the path before process.\n\n```python\nfrom sabia_utils import group\n\ngroup.process_existent_files(\n PATH_IN='path\\\\to\\\\files1',\n PATH_OUT='path\\\\to\\\\files2'\n)\n```\n\n### Process all files\n\n* apply the function of copy and process files between the paths.\n\n```python\n\nfrom sabia_utils import group\n\ngroup.process_all_files(\n PATH_IN='path\\\\to\\\\files1',\n PATH_OUT='path\\\\to\\\\files2'\n)\n```\n\n## Pre_process Module\n\nThis module is used to pre_process parquet files.\n\n### Process all parquet files\n\n* Define a class that inherit sabia_utils.pre_process.Processing\n* Override method apply_to_df(self, df, column), defining the pre-processing to be applied\n\n* Apply this function on a folder containing parquet files to process them.\n\n```python\nfrom sabia_utils.pre_process import Processing\nfrom sabia_utils import pre_process\n\nclass MyProcessor(Processing):\n def apply_to_df(self, df, column): \n # Your pre-processing steps\n\npre_process.pre_process_parquets(\n folder_path='path\\\\to\\\\folder',\n colomun_to_pre_process='column_name_to_be_processed',\n pre_processed_column='processed_column_name',\n processor=MyProcessor()\n)\n```\n",
"bugtrack_url": null,
"license": "MIT License",
"summary": "Group of utilities for Sabia",
"version": "0.2.0",
"project_urls": null,
"split_keywords": [
"sabia",
"utils"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "aeb48df09b46837a86370213d685c7b1fe7f3c3ed3891fbabca4d1e7dec0dc14",
"md5": "6c471cb2e348633944a6e862e8532a61",
"sha256": "b09f1c3e3fffaa103fa581291fc6a8ce784b7176f0d78d50d5325db4389a909d"
},
"downloads": -1,
"filename": "sabia_utils-0.2.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "6c471cb2e348633944a6e862e8532a61",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 12945,
"upload_time": "2023-06-07T12:51:04",
"upload_time_iso_8601": "2023-06-07T12:51:04.255461Z",
"url": "https://files.pythonhosted.org/packages/ae/b4/8df09b46837a86370213d685c7b1fe7f3c3ed3891fbabca4d1e7dec0dc14/sabia_utils-0.2.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "30681fea725967184c6f722b6316bb891fadb0248388834441991b05293ab689",
"md5": "713b69d0ae209906db4885e8312fa142",
"sha256": "f994cfd1ca250d0a82b4e7cd35915dfcc607da96486e5bfc2633c77f0a0b5b77"
},
"downloads": -1,
"filename": "sabia-utils-0.2.0.tar.gz",
"has_sig": false,
"md5_digest": "713b69d0ae209906db4885e8312fa142",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 10053,
"upload_time": "2023-06-07T12:51:06",
"upload_time_iso_8601": "2023-06-07T12:51:06.285509Z",
"url": "https://files.pythonhosted.org/packages/30/68/1fea725967184c6f722b6316bb891fadb0248388834441991b05293ab689/sabia-utils-0.2.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-06-07 12:51:06",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "sabia-utils"
}
```