[![Language](https://img.shields.io/pypi/pyversions/db2ixf?color=ffde57&logo=python&style=for-the-badge)](https://www.python.org/)
[![License](https://img.shields.io/pypi/l/db2ixf?color=3775A9&style=for-the-badge&logo=unlicense)](https://www.gnu.org/licenses/agpl-3.0)
[![Pipeline](https://img.shields.io/github/actions/workflow/status/ismailhammounou/db2ixf/db2ixf.yml?branch=main&style=for-the-badge&logo=gitHub-actions)](https://github.com/ismailhammounou/db2ixf/actions/workflows/db2ixf.yml)
[![Release](https://img.shields.io/github/v/release/ismailhammounou/db2ixf?display_name=tag&sort=semver&style=for-the-badge&logo=semver)](https://github.com/ismailhammounou/db2ixf/releases/latest)
[![Pypi](https://img.shields.io/pypi/v/db2ixf?color=3775A9&logo=pypi&style=for-the-badge)](https://pypi.org/project/db2ixf/)
[![Downloads](https://img.shields.io/pypi/dm/db2ixf?style=for-the-badge)](https://pypi.org/project/db2ixf/)
[![Contributors](https://img.shields.io/github/contributors/ismailhammounou/db2ixf?style=for-the-badge)](https://github.com/ismailhammounou/db2ixf/graphs/contributors)
[![Documentation](https://img.shields.io/badge/here-here?style=for-the-badge&logo=book&label=documentation&color=purple
)](https://ismailhammounou.github.io/db2ixf/)
# DB2IXF Parser
<div align="center">
<img src="https://github.com/ismailhammounou/db2ixf/blob/main/resources/images/db2ixf-logo.png?raw=true" alt="Logo" width="300" height="300">
</div>
DB2IXF Parser is an open-source Python package that simplifies the parsing and
processing of IBM Integration eXchange Format (IXF) files. IXF is a file format
used by IBM's DB2 database system for data import and export operations. This
package provides a streamlined solution for extracting data from IXF files and
converting it to various formats, including JSON, JSONLINE, CSV, Parquet and
Deltalake.
## Features
- **Parse IXF files**: The package allows you to parse IXF files and extract the
rows of data stored within them.
- **Convert to multiple formats**: The parsed data can be easily converted to
_JSON_, _JSONLINE_, _CSV_, _Parquet_, or _Deltalake_
format, providing flexibility for further analysis and integration with other
systems.
- **Support for file-like objects**: IXF Parser accepts file-like objects as
  input, so IXF data can be parsed directly from file objects. This makes it
  convenient to handle large datasets without intermediate file storage.
- **Minimal dependencies**: The package has few dependencies (ebcdic, pyarrow,
deltalake, chardet, typer) which are automatically installed alongside the
package.
- **CLI**: A command-line tool called ``db2ixf`` ships with the package. (_It
  does not support the Deltalake format._)
## Hypothesis
- One-to-one: the parser assumes that **one** IXF file contains **one** table.
## Getting Started
### Installation
You can install DB2 IXF Parser using pip:
```bash
pip install db2ixf
```
### Usage
Here are some examples of how to use DB2 IXF Parser:
#### CLI
Start with this:
```bash
db2ixf --help
```
Result:
```
 Usage: db2ixf [OPTIONS] COMMAND [ARGS]...

 A command-line tool (CLI) for parsing and converting IXF (IBM DB2
 Import/Export Format) files to various formats such as JSON, JSONLINE, CSV and
 Parquet. Easily parse and convert IXF files to meet your data processing needs.

+- Options -------------------------------------------------------------------+
| --version               -v    Show the version of the CLI.                  |
| --install-completion          Install completion for the current shell.     |
| --show-completion             Show completion for the current shell, to     |
|                               copy it or customize the installation.        |
| --help                        Show this message and exit.                   |
+-----------------------------------------------------------------------------+
+- Commands ------------------------------------------------------------------+
| csv         Parse ixf FILE and convert it to a csv OUTPUT.                  |
| json        Parse ixf FILE and convert it to a json OUTPUT.                 |
| jsonline    Parse ixf FILE and convert it to a jsonline OUTPUT.             |
| parquet     Parse ixf FILE and convert it to a parquet OUTPUT.              |
+-----------------------------------------------------------------------------+

 Made with heart :D
```
#### Parsing an IXF file
```python
# coding=utf-8
from pathlib import Path
from db2ixf import IXFParser

path = Path('path/to/IXF/file.XXX.IXF')

# Stream the rows one at a time
with open(path, mode='rb') as f:
    parser = IXFParser(f)
    # rows = parser.parse()  # Deprecated!
    rows = parser.get_row()  # Python generator
    for row in rows:
        print(row)

# Load all the rows at once
with open(path, mode='rb') as f:
    parser = IXFParser(f)
    rows = parser.get_all_rows()  # Loads into memory!
    for row in rows:
        print(row)
```
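The difference between a generator and a fully loaded list matters most on large files. The sketch below shows how a lazy row stream can be previewed without reading everything. It does not use db2ixf itself: `fake_rows` is a hypothetical stand-in for any row generator, and `head` and the `ID` column name are illustrative names only.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def fake_rows(n: int) -> Iterator[Dict[str, int]]:
    """Hypothetical stand-in for a row generator (NOT part of db2ixf)."""
    for i in range(n):
        yield {"ID": i}


def head(rows: Iterable[dict], limit: int) -> List[dict]:
    """Consume at most `limit` rows from a lazy row stream."""
    return list(islice(rows, limit))


# Only three rows are ever produced, even though the stream is large.
preview = head(fake_rows(1_000_000), 3)
print(preview)
```

The same pattern should apply to any iterator of rows, since `islice` stops pulling from the source once the limit is reached.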
#### Converting to JSON
```python
# coding=utf-8
from pathlib import Path
from db2ixf import IXFParser

path = Path('path/to/IXF/file.XXX.IXF')
with open(path, mode='rb') as f:
    parser = IXFParser(f)
    output_path = Path('path/to/output/file.json')
    with open(output_path, mode='w', encoding='utf-8') as output_file:
        parser.to_json(output_file)
```
#### Converting to JSONLINE
```python
# coding=utf-8
from pathlib import Path
from db2ixf import IXFParser

path = Path('path/to/IXF/file.XXX.IXF')
with open(path, mode='rb') as f:
    parser = IXFParser(f)
    output_path = Path('path/to/output/file.jsonl')
    with open(output_path, mode='w', encoding='utf-8') as output_file:
        parser.to_jsonline(output_file)
```
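Once a JSONLINE file has been written, it can be read back with the standard library alone. The sketch below assumes the usual JSON Lines convention of one JSON object per line; `read_jsonl` is a hypothetical helper, and the sample content and its column names are made up for illustration.

```python
import io
import json
from typing import Iterator, TextIO


def read_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one decoded object per non-empty line (hypothetical helper)."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# In-memory sample standing in for a real output file; columns are invented.
sample = io.StringIO('{"ID": 1, "NAME": "a"}\n{"ID": 2, "NAME": "b"}\n')
records = list(read_jsonl(sample))
print(records)
```

With a real file, the `io.StringIO` object would be replaced by `open(output_path, encoding='utf-8')`.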
#### Converting to CSV
```python
# coding=utf-8
from pathlib import Path
from db2ixf import IXFParser

path = Path('path/to/IXF/file.XXX.IXF')
with open(path, mode='rb') as f:
    parser = IXFParser(f)
    output_path = Path('path/to/output/file.csv')
    with open(output_path, mode='w', encoding='utf-8') as output_file:
        parser.to_csv(output_file)
```
#### Converting to Parquet
```python
# coding=utf-8
from pathlib import Path
from db2ixf import IXFParser

path = Path('path/to/IXF/file.XXX.IXF')
with open(path, mode='rb') as f:
    parser = IXFParser(f)
    output_path = Path('path/to/output/file.parquet')
    with open(output_path, mode='wb') as output_file:
        parser.to_parquet(output_file)
```
#### Converting to Deltalake
```python
# coding=utf-8
from pathlib import Path
from db2ixf import IXFParser

path = Path('path/to/IXF/file.XXX.IXF')
with open(path, mode='rb') as f:
    parser = IXFParser(f)
    output_path = 'path/to/output/'
    parser.to_deltalake(output_path)
```
For more details and usage examples, please refer to the
[documentation](https://ismailhammounou.github.io/db2ixf/).
#### Precautions
Parsing can fail in some cases, and some failures can lead to data loss:
1. Completely corrupted IXF file: this is usually an extraction issue.
2. Partially corrupted IXF file: it contains some corrupted rows/lines that the
   parser cannot parse.
   1. The parser calculates the rate of corrupted rows and compares it to an
      accepted rate, which you can set with the environment variable
      `DB2IXF_ACCEPTED_CORRUPTION_RATE` (an integer percentage, default 1).
   2. If the rate of corrupted rows is greater than the accepted rate, the
      parser raises an exception.
3. Unsupported data type: please contact the owners/maintainers/contributors for
   help. PRs are also welcome.
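The corruption-rate check described above can be sketched as follows. This is not the package's actual code: the function names and the exception type are assumptions made for illustration. Only the environment variable name and its default of 1 (percent) come from the text above.

```python
import os


def accepted_corruption_rate(default: int = 1) -> int:
    """Read the accepted rate (in percent) from the environment."""
    return int(os.environ.get("DB2IXF_ACCEPTED_CORRUPTION_RATE", default))


def check_corruption(total_rows: int, corrupted_rows: int, accepted_rate: int) -> float:
    """Return the corruption rate in percent, raising if it exceeds the limit.

    Hypothetical helper: the real parser may count rows and signal errors
    differently.
    """
    if total_rows == 0:
        return 0.0
    rate = 100.0 * corrupted_rows / total_rows
    if rate > accepted_rate:
        raise ValueError(
            f"corruption rate {rate:.2f}% exceeds accepted rate {accepted_rate}%"
        )
    return rate


# 5 corrupted rows out of 1000 stay under a 1% limit.
print(check_corruption(1000, 5, accepted_rate=1))
```

Setting `DB2IXF_ACCEPTED_CORRUPTION_RATE=5` in the environment before running the parser would, under the same reading, tolerate up to 5% of corrupted rows.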
###### Case: encoding issues
````text
Parsing can lead to data loss when neither the found encoding nor the detected
encoding can decode some of the extracted fields/columns.
The parser tries to decode using:
1. The found encoding (found in the column record)
2. Other encodings like cp437
3. The encoding detected by a third-party package (chardet)
4. Encodings like utf-8 and utf-32
5. Ignoring errors, which can lead to data loss!
Before using the package in production, test it in debug mode so you can
detect data loss.
````
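The fallback chain above can be illustrated with a small standard-library sketch. It is an approximation, not the parser's implementation: `decode_with_fallback` is a hypothetical name, the chardet detection step is left out to keep the example dependency-free, and the last resort here uses utf-8 with `errors="ignore"` as an assumed choice.

```python
from typing import Iterable, Optional


def decode_with_fallback(
    raw: bytes,
    found_encoding: Optional[str],
    fallbacks: Iterable[str] = ("cp437", "utf-8", "utf-32"),
) -> str:
    """Try each candidate encoding in turn, then decode while ignoring errors."""
    candidates = ([found_encoding] if found_encoding else []) + list(fallbacks)
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown encoding: move on to the next candidate.
            continue
    # Last resort: undecodable bytes are silently dropped (data loss).
    return raw.decode("utf-8", errors="ignore")


# The declared encoding fails here, so the utf-8 fallback is used.
print(decode_with_fallback("é".encode("utf-8"), "ascii", fallbacks=("utf-8",)))
```

Comparing the decoded length with the raw field length is one simple way to notice when the last-resort step has dropped bytes.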
## Contributing
IXF Parser is actively seeking contributions to enhance its features and
reliability. Your participation is valuable in shaping the future of the
project.
We appreciate your feedback, bug reports, and feature requests. If you encounter
any issues or have ideas for improvement, please open an issue on the
[GitHub repository](https://github.com/ismailhammounou/db2ixf/issues).
For any questions or assistance during the contribution process, feel free to
reach out by opening an issue on the
[GitHub repository](https://github.com/ismailhammounou/db2ixf/issues).
Thank you for considering contributing to IXF Parser. Let's work together to
create a powerful and dependable tool for working with DB2's IXF files.
### Todo
- [ ] Search for contributors/maintainers/sponsors.
- [x] Add tests (manual testing was done, but unit tests still need to be written).
- [x] Adding new collector for the floating point data type.
- [x] Adding new collectors for other ixf data types: binary ...etc.
- [ ] Improve documentation.
- [x] Add a CLI.
- [x] Improve CLI: output can be optional.
- [x] Add better ci-cd.
- [x] Improve Makefile.
- [x] ~~Support multiprocessing.~~
- [x] ~~Support archived inputs: only python not CLI ?~~
- [x] Add logging.
- [x] Add support for deltalake
- [x] Add support for pyarrow
## License
IXF Parser is released under the
[AGPL-3.0 License](https://github.com/ismailhammounou/db2ixf/blob/main/LICENSE).
## Support
If you encounter any issues or have questions about using IXF Parser, please
open an issue on the
[GitHub repository](https://github.com/ismailhammounou/db2ixf/issues). We will
do our best to address them promptly.
## Conclusion
IXF Parser offers a convenient solution for parsing and processing IBM DB2's IXF
files. With its ease of use and support for various output formats, it provides
a valuable tool for working with DB2 data. We hope that you find this package
useful in your data analysis and integration workflows.
Give it a try and let us know your feedback. Happy parsing!
Raw data
{
"_id": null,
"home_page": "",
"name": "db2ixf",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "Ismail Hammounou <ismail.hammounou@gmail.com>",
"keywords": "PC,IXF,IBM,DB2,Development,Tools,Package,Parsing,Format,Data,Analysis",
"author": "",
"author_email": "Ismail Hammounou <ismail.hammounou@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/ea/62/d1712a2d6dd7381139e519a880301ae8cd1a24e9cee5e326af8b2c8e7b02/db2ixf-0.16.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "AGPL-3.0",
"summary": "Parsing and processing of IBM eXchange format (IXF)",
"version": "0.16.1",
"project_urls": {
"Changelog": "https://github.com/ismailhammounou/db2ixf/blob/main/CHANGELOG.md",
"Documentation": "https://ismailhammounou.github.io/db2ixf/",
"Homepage": "https://pypi.org/project/db2ixf/",
"Repository": "https://github.com/ismailhammounou/db2ixf.git"
},
"split_keywords": [
"pc",
"ixf",
"ibm",
"db2",
"development",
"tools",
"package",
"parsing",
"format",
"data",
"analysis"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "13484c62b4526046a615cc50becdd2a0cda91a1e4a6aabfa466965b88faa6980",
"md5": "f1a47def24046a14e7edc64bc7509c5e",
"sha256": "726c74323f0868184d09d695047c0d357c5b550fbf299c230c774bccf36f6334"
},
"downloads": -1,
"filename": "db2ixf-0.16.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "f1a47def24046a14e7edc64bc7509c5e",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 35941,
"upload_time": "2024-03-12T00:07:20",
"upload_time_iso_8601": "2024-03-12T00:07:20.935398Z",
"url": "https://files.pythonhosted.org/packages/13/48/4c62b4526046a615cc50becdd2a0cda91a1e4a6aabfa466965b88faa6980/db2ixf-0.16.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "ea62d1712a2d6dd7381139e519a880301ae8cd1a24e9cee5e326af8b2c8e7b02",
"md5": "162582b79c8be190bf4fb8fd066d534a",
"sha256": "495c69fe2cd9133a821f0ebcc1fedfceacefd1058f73b6bbbf65c15a392d9bdf"
},
"downloads": -1,
"filename": "db2ixf-0.16.1.tar.gz",
"has_sig": false,
"md5_digest": "162582b79c8be190bf4fb8fd066d534a",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 42964,
"upload_time": "2024-03-12T00:07:23",
"upload_time_iso_8601": "2024-03-12T00:07:23.029700Z",
"url": "https://files.pythonhosted.org/packages/ea/62/d1712a2d6dd7381139e519a880301ae8cd1a24e9cee5e326af8b2c8e7b02/db2ixf-0.16.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-03-12 00:07:23",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "ismailhammounou",
"github_project": "db2ixf",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "ebcdic",
"specs": []
},
{
"name": "pyarrow",
"specs": []
},
{
"name": "deltalake",
"specs": []
},
{
"name": "chardet",
"specs": []
},
{
"name": "typer",
"specs": []
}
],
"lcname": "db2ixf"
}