# osc-ingest-tools
Python tools to assist with standardized data ingestion workflows
### Install from PyPI
```
pip install osc-ingest-tools
```
### Examples
```python
>>> from osc_ingest_trino import *
>>> import pandas as pd
>>> data = [['tom', 10], ['nick', 15], ['juli', 14]]
>>> df = pd.DataFrame(data, columns=['First Name', 'Age In Years']).convert_dtypes()
>>> df
   First Name  Age In Years
0         tom            10
1        nick            15
2        juli            14
>>> enforce_sql_column_names(df)  # returns a new DataFrame with SQL-friendly names
   first_name  age_in_years
0         tom            10
1        nick            15
2        juli            14
>>> enforce_sql_column_names(df, inplace=True)  # renames df's columns in place
>>> df
   first_name  age_in_years
0         tom            10
1        nick            15
2        juli            14
>>> df.info(verbose=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   first_name    3 non-null      string
 1   age_in_years  3 non-null      Int64
dtypes: Int64(1), string(1)
memory usage: 179.0 bytes
>>> p = create_table_schema_pairs(df)
>>> print(p)
    first_name varchar,
    age_in_years bigint
```
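The example above turns labels such as `First Name` into `first_name`. A minimal sketch of that kind of normalization follows, assuming the rule is "lowercase, then collapse each run of non-alphanumeric characters into one underscore". The helpers `to_sql_identifier` and `to_sql_identifiers` are hypothetical and not part of `osc_ingest_trino`; the library's actual rules may differ (for example in how it treats leading digits or non-ASCII letters).

```python
import re


def to_sql_identifier(name: str) -> str:
    """Hypothetical helper: turn a column label into a SQL-friendly identifier.

    Assumed rules (not taken from osc_ingest_trino): lowercase the label,
    replace each run of characters outside [a-z0-9] with a single underscore,
    then strip leading and trailing underscores.
    """
    collapsed = re.sub(r"[^a-z0-9]+", "_", name.lower())
    return collapsed.strip("_")


def to_sql_identifiers(names):
    """Normalize every label and refuse results that would collide."""
    result = [to_sql_identifier(str(n)) for n in names]
    if len(set(result)) != len(result):
        raise ValueError(f"normalized column names collide: {result}")
    return result


print(to_sql_identifiers(["First Name", "Age In Years"]))
# → ['first_name', 'age_in_years']
```

With pandas, such a helper could be applied as `df.columns = to_sql_identifiers(df.columns)`. The collision check matters because two distinct labels (say `A B` and `a_b`) can normalize to the same SQL column name.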
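#### Adding custom type mappings to `create_table_schema_pairs`
Without `convert_dtypes()`, pandas stores the names column with the generic `object` dtype. Pass `typemap` to tell `create_table_schema_pairs` which SQL type to use for such dtypes: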
```python
>>> df = pd.DataFrame(data, columns=['First Name', 'Age In Years'])
>>> enforce_sql_column_names(df, inplace=True)
>>> df.info(verbose=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   first_name    3 non-null      object
 1   age_in_years  3 non-null      int64
dtypes: int64(1), object(1)
memory usage: 176.0+ bytes
>>> p = create_table_schema_pairs(df, typemap={'object': 'varchar'})
>>> print(p)
    first_name varchar,
    age_in_years bigint
```
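Conceptually, the `typemap` argument layers caller-supplied entries over a default mapping from pandas dtype names to SQL types. The sketch below illustrates that lookup-with-overrides pattern on plain `(column, dtype_name)` pairs so it runs without pandas. `DEFAULT_TYPES`, `sql_type_for`, and `schema_pairs` are hypothetical names, the default mapping contains only the dtype names that appear in the examples above (not the library's full table), and raising `ValueError` for an unmapped dtype is this sketch's choice.

```python
# Hypothetical default mapping: only the dtype names seen in the README examples.
DEFAULT_TYPES = {
    "string": "varchar",
    "Int64": "bigint",
    "int64": "bigint",
}


def sql_type_for(dtype_name, typemap=None):
    """Look up the SQL type for a pandas dtype name; typemap entries win."""
    merged = {**DEFAULT_TYPES, **(typemap or {})}
    if dtype_name not in merged:
        raise ValueError(f"no SQL type mapping for dtype {dtype_name!r}")
    return merged[dtype_name]


def schema_pairs(columns, typemap=None, indent=4):
    """Render (column_name, dtype_name) pairs as indented 'name type' lines.

    With pandas, the pairs could be built as
    zip(df.columns, (str(t) for t in df.dtypes)).
    """
    pad = " " * indent
    return ",\n".join(
        f"{pad}{name} {sql_type_for(dtype, typemap)}" for name, dtype in columns
    )


cols = [("first_name", "object"), ("age_in_years", "int64")]
print(schema_pairs(cols, typemap={"object": "varchar"}))
```

Merging into a fresh dict on each call keeps the defaults unmodified, and using `None` rather than `{}` as the default argument avoids Python's shared mutable-default pitfall.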
### Development
Patches may be contributed via pull requests to
https://github.com/os-climate/osc-ingest-tools.
All changes must pass the automated test suite, along with various static
checks.
[Black](https://black.readthedocs.io/) code style and
[isort](https://pycqa.github.io/isort/) import ordering are enforced.
Enabling automatic formatting via [pre-commit](https://pre-commit.com/) is
recommended:
```
pip install black isort pre-commit
pre-commit install
```
To ensure compliance with the static checks, developers may wish to run:
```
pip install black isort
# auto-sort imports
isort .
# auto-format code
black .
```
Code can then be tested using tox:
```
# run static checks and tests
tox
# run only tests
tox -e py3
# run only static checks
tox -e static
# run tests and produce a code coverage report
tox -e cov
```
### Releasing
To release a new version of this library, authorized developers should:
- Prepare a signed release commit updating `version` in `setup.py`
- Tag the commit using [Semantic Versioning](https://semver.org/spec/v2.0.0.html)
  prefixed with "v"
- Push the tag
E.g.,
```
git commit -sm "Release v0.3.4"
git tag v0.3.4
git push --follow-tags
```
A GitHub workflow will then automatically release the version to PyPI.
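Before tagging and pushing a release, it can help to confirm that the tag follows the convention described above (a "v" followed by a Semantic Versioning 2.0.0 version) and matches the `version` in `setup.py`. A minimal sketch, using the regular expression suggested on semver.org; `is_release_tag` and `tag_matches_version` are hypothetical helpers, not part of this repository or its release workflow.

```python
import re

# Regular expression suggested by the Semantic Versioning 2.0.0 specification.
SEMVER = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"
)


def is_release_tag(tag: str) -> bool:
    """True if tag is "v" followed by a valid SemVer 2.0.0 version."""
    # fullmatch rejects trailing characters such as a stray newline.
    return tag.startswith("v") and SEMVER.fullmatch(tag[1:]) is not None


def tag_matches_version(tag: str, version: str) -> bool:
    """True if tag is a valid release tag for the given setup.py version."""
    return is_release_tag(tag) and tag[1:] == version


print(is_release_tag("v0.3.4"))  # → True
print(tag_matches_version("v0.3.4", "0.3.5"))  # → False
```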
Raw data
{
"_id": null,
"home_page": "https://github.com/os-climate/osc-ingest-tools",
"name": "osc-ingest-tools",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "",
"author": "OS-Climate",
"author_email": "eje@redhat.com",
"download_url": "https://files.pythonhosted.org/packages/cc/29/4d0826d5ed9df019b6410a96b0a3b66369c552519f29d1ac2594fb2d12e3/osc-ingest-tools-0.5.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "python tools to assist with standardized data ingestion workflows for the OS-Climate project",
"version": "0.5.2",
"project_urls": {
"Homepage": "https://github.com/os-climate/osc-ingest-tools"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "2ff175ff6aa2eb7ebc54a318bc5410db38c4e32c93d69c066f44dfa792274129",
"md5": "eaf62d9d63fbe43772d6f5f9b2f21c22",
"sha256": "6a2b2eec61b723a5a127b41a90a55445f9a11cfb5f4eee48e5875232b6ec3107"
},
"downloads": -1,
"filename": "osc_ingest_tools-0.5.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "eaf62d9d63fbe43772d6f5f9b2f21c22",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 14659,
"upload_time": "2023-10-21T22:24:36",
"upload_time_iso_8601": "2023-10-21T22:24:36.289070Z",
"url": "https://files.pythonhosted.org/packages/2f/f1/75ff6aa2eb7ebc54a318bc5410db38c4e32c93d69c066f44dfa792274129/osc_ingest_tools-0.5.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "cc294d0826d5ed9df019b6410a96b0a3b66369c552519f29d1ac2594fb2d12e3",
"md5": "fa93aa9bf4c802818d39f08ee18e5948",
"sha256": "bf10ade896b05a9dcf2e8f7ab0c2e4b94013ed32045e9474321b9522a8b659a3"
},
"downloads": -1,
"filename": "osc-ingest-tools-0.5.2.tar.gz",
"has_sig": false,
"md5_digest": "fa93aa9bf4c802818d39f08ee18e5948",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 12927,
"upload_time": "2023-10-21T22:24:38",
"upload_time_iso_8601": "2023-10-21T22:24:38.252015Z",
"url": "https://files.pythonhosted.org/packages/cc/29/4d0826d5ed9df019b6410a96b0a3b66369c552519f29d1ac2594fb2d12e3/osc-ingest-tools-0.5.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-10-21 22:24:38",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "os-climate",
"github_project": "osc-ingest-tools",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"tox": true,
"lcname": "osc-ingest-tools"
}