Custom XML to Dict Parser
==============================
## Table of Contents
* [Overview](#overview)
* [Library Installation](#library-installation)
* [Library Usage](#library-usage)
* [Config file](#config-file)
## Overview
This package parses XML files into Python dictionaries. It uses the `xmltodict` package to convert an XML file as-is into a Python dictionary, and builds on that to let you tailor which information is returned from the XML file. In other words, with a configuration file you can extract specific data from an XML file in a specific format.
## Library Installation
To install the library, run the following command in a terminal (Command Prompt, PowerShell, or a Unix shell):
```bash
# It's recommended to create a virtual environment
# Windows
pip install CustomXMLParser
# Linux
pip3 install CustomXMLParser
```
## Library Usage
### Example usage
To read an XML file as-is and convert it to a Python dictionary:
```python
from CustomXMLParser import XmlParser
xml_parser = XmlParser(parser_type='raw')
xml_file = 'path_to_xml_file'
xml_dict = xml_parser.parse(xml_file)
```
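For reference, the raw parser relies on `xmltodict`, which stores attributes under keys prefixed with `@` and element text under `#text` (the same conventions behind the `name_key` and `header_text_key` defaults listed further down). The stdlib-only sketch below imitates those conventions so you can see the general shape of the returned dictionary. It is an illustration, not the library's code, and it ignores details such as namespaces and mixed content; `xml_string_to_dict` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def element_to_value(elem):
    """Convert one Element into a dict, a string, or None (xmltodict-style)."""
    node = {}
    # Attributes become keys prefixed with "@", e.g. name="info" -> "@name"
    for attr, value in elem.attrib.items():
        node["@" + attr] = value
    # Child elements become keys; repeated tags are collected into a list
    for child in elem:
        value = element_to_value(child)
        if child.tag not in node:
            node[child.tag] = value
        elif isinstance(node[child.tag], list):
            node[child.tag].append(value)
        else:
            node[child.tag] = [node[child.tag], value]
    text = (elem.text or "").strip()
    if text and not node:
        return text  # a text-only element collapses to a plain string
    if text:
        node["#text"] = text  # text next to attributes/children goes under "#text"
    return node or None  # an empty element becomes None


def xml_string_to_dict(xml_string):
    """Parse an XML string and return {root_tag: converted_root}."""
    root = ET.fromstring(xml_string)
    return {root.tag: element_to_value(root)}


print(xml_string_to_dict('<catalog><table name="info"><row>1</row><row>2</row></table></catalog>'))
# {'catalog': {'table': {'@name': 'info', 'row': ['1', '2']}}}
```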
To dump a dictionary to an XML file or a string:
```python
from CustomXMLParser import XmlParser
xml_parser = XmlParser(parser_type='raw')
xml_file = 'path_to_xml_file'
xml_dict = xml_parser.parse(xml_file)
# Modify the raw dict here as needed, but keep its structure intact
my_dict = xml_dict
out_xml_file = 'path_to_out_xml_file'
xml_parser.dump(out_xml_file, my_dict, pretty=True)  # write the dict to an XML file
my_xml_string = xml_parser.dumps(my_dict, pretty=True)  # serialize the dict to a string
```
To extract specific portions of an XML file and format them according to a configuration file (see [Config file](#config-file)):
```python
from CustomXMLParser import XmlParser
config_file = 'path_to_config_file'
xml_parser = XmlParser(config_file=config_file, parser_type='custom')
xml_file = 'path_to_xml_file'
xml_dict = xml_parser.parse(xml_file)
```
To dump any dictionary to an XML file or a string, pass `input_format='custom'` along with a `root` name:
```python
from CustomXMLParser import XmlParser
xml_parser = XmlParser(parser_type='raw')
xml_file = 'path_to_xml_file'
xml_dict = xml_parser.parse(xml_file)
# Replace with (or modify into) the dict you want to dump
my_dict = xml_dict
out_xml_file = 'path_to_out_xml_file'
xml_parser.dump(my_dict, out_xml_file, input_format='custom', root='root', pretty=True)  # write the dict to an XML file
my_xml_string = xml_parser.dumps(my_dict, input_format='custom', root='root', pretty=True)  # serialize the dict to a string
```
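A root name matters because an XML document must have exactly one root element, while an arbitrary dict may have several top-level keys; the same single-root rule is why a raw dict must keep its structure before being dumped. The stdlib-only sketch below shows one way to serialize such a dict. `dict_to_xml` is a hypothetical helper, not part of CustomXMLParser, and wrapping the whole dict in an element named by `root` is an assumption about what that argument does. It follows the `@`/`#text` conventions of `xmltodict` and turns lists into repeated sibling elements.

```python
import xml.etree.ElementTree as ET


def _append(parent, tag, value):
    """Add `value` under `parent` as one <tag> element, or several if it is a list."""
    if isinstance(value, list):
        for item in value:  # a list becomes repeated sibling elements
            _append(parent, tag, item)
        return
    _fill(ET.SubElement(parent, tag), value)


def _fill(elem, value):
    """Populate `elem` from a dict (children/attributes), a scalar (text), or None."""
    if isinstance(value, dict):
        for key, child in value.items():
            if key.startswith("@"):
                elem.set(key[1:], str(child))  # "@name" -> name="..."
            elif key == "#text":
                elem.text = str(child)
            else:
                _append(elem, key, child)
    elif value is not None:
        elem.text = str(value)


def dict_to_xml(data, root=None, pretty=False):
    """Serialize `data`; if `root` is given, wrap everything in one <root> element first."""
    if root is not None:
        data = {root: data}  # assumption: `root` names a wrapper element
    if len(data) != 1:
        raise ValueError("an XML document must have exactly one root element")
    tag, value = next(iter(data.items()))
    if isinstance(value, list):
        raise ValueError("the root element cannot be repeated")
    elem = ET.Element(tag)
    _fill(elem, value)
    if pretty:
        ET.indent(elem)  # available in Python 3.9+
    return ET.tostring(elem, encoding="unicode")


print(dict_to_xml({"a": "1", "b": ["2", "3"]}, root="root"))
# <root><a>1</a><b>2</b><b>3</b></root>
```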
Note that the `XmlParser` class uses the following default XML attributes:
```python
'''
name_key (str, optional): custom/XML configuration parameter; the name of the primary tag. Defaults to "@name".
table_key (str, optional): custom/XML configuration parameter; the table identifier. Defaults to "th".
header_key (str, optional): custom/XML configuration parameter; the header identifier. Defaults to "header".
data_key (str, optional): custom/XML configuration parameter; the data identifier. Defaults to "rows".
header_text_key (str, optional): custom/XML configuration parameter; the table's key identifier. Defaults to "#text".
'''
```
You can override these attributes by passing them to the `XmlParser` constructor:
```python
from CustomXMLParser import XmlParser
config_file = 'path_to_config_file'
xml_parser = XmlParser(config_file=config_file, parser_type='custom', encoding='utf-8',
                       name_key='<desired_name_key>', table_key='<desired_table_key>',
                       header_key='<desired_header_key>', data_key='<desired_data_key>',
                       header_text_key='<desired_header_text_key>')
xml_file = 'path_to_xml_file'
xml_dict = xml_parser.parse(xml_file)
```
## Config file
Below is an example configuration for custom XML parsing.
```json
{
    "TREE": {
        "TABLE_A": {},
        "TABLE_B": {"TABLE_C": {"KEYS": "key1,key2"}}
    },
    "TABLE_A": [
        "element0_tag,element0_name",
        "element1_tag,element1_name"
    ],
    "TABLE_B": [
        "element0_tag*,element1_tag*,element2_tag,element2_name"
    ],
    "TABLE_C": [
        "element0_tag,element0_name"
    ]
}
```
### General Rules
- Use uppercase for all dictionary keys in the config file.
- `*` is the wildcard: a tag suffixed with `*` (e.g. `element0_tag*`) returns data for all available elements with that tag.
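A small checker for the two rules above can catch config mistakes before parsing. `check_config` is a hypothetical helper, not part of the library, and it assumes the wildcard is only meaningful as a tag suffix, as in `element0_tag*`. It uses only the standard library.

```python
import json


def check_config(config):
    """Return a list of General Rules violations found in a parsed config dict."""
    problems = []

    def check_keys(mapping, where):
        # Rule 1: every dictionary key, at any nesting level, is uppercase
        for key, value in mapping.items():
            if key != key.upper():
                problems.append(f"{where}: key {key!r} is not uppercase")
            if isinstance(value, dict):
                check_keys(value, f"{where}.{key}")

    check_keys(config, "config")

    # Rule 2 (assumed): "*" may only appear as a suffix of a tag in a path
    for table, paths in config.items():
        if table == "TREE" or not isinstance(paths, list):
            continue
        for path in paths:
            for step in path.split(","):
                if "*" in step.rstrip("*"):
                    problems.append(f"{table}: misplaced '*' in step {step!r}")
    return problems


config = json.loads("""{
    "TREE": {"TABLE_A": {}, "TABLE_B": {"TABLE_C": {"KEYS": "key1,key2"}}},
    "TABLE_A": ["element0_tag,element0_name", "element1_tag,element1_name"],
    "TABLE_B": ["element0_tag*,element1_tag*,element2_tag,element2_name"],
    "TABLE_C": ["element0_tag,element0_name"]
}""")
print(check_config(config))  # []
```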
### Tree structure
The structure can be flat or nested. To return child data for a particular parent, nest the child as a value of the parent in `TREE`. For example, the parent **TABLE_B** has the child **TABLE_C**. If **TABLE_C** had a child of its own, it would be nested under **TABLE_C** in the same way.
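The nesting in `TREE` can be read recursively. The sketch below visits every table depth-first and reports its parent and depth, skipping `KEYS` entries, which are options rather than child tables (see the note on requesting specific keys). `walk_tree` is a hypothetical helper, not the library's code.

```python
def walk_tree(tree, parent=None, depth=0):
    """Yield (table, parent, depth) for every table in a TREE mapping, depth-first."""
    for table, children in tree.items():
        if table == "KEYS":
            continue  # KEYS is a filter option, not a child table
        yield table, parent, depth
        yield from walk_tree(children, parent=table, depth=depth + 1)


tree = {"TABLE_A": {}, "TABLE_B": {"TABLE_C": {"KEYS": "key1,key2"}}}
for table, parent, depth in walk_tree(tree):
    print("  " * depth + table)
# TABLE_A
# TABLE_B
#   TABLE_C
```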
```yaml
REQUESTING_SPECIFIC_KEYS:
  Notice that TABLE_C specifies a key called `KEYS` with the value `key1,key2`.
  This configuration returns only the matching keys `key1` and `key2` for TABLE_C.
  If `KEYS` is not specified, all keys are returned by default.
  The `KEYS` name must be unique, i.e. it must not appear as a key in the XML file.
  If it does, you can change the default key name through class attributes.
```
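The `KEYS` behavior amounts to a simple filter over each returned row. In this sketch, `filter_keys` and its `keys_name` parameter are hypothetical: `keys_name` stands in for the configurable key name mentioned above, and a row is assumed to be a flat dict.

```python
def filter_keys(row, options, keys_name="KEYS"):
    """Keep only the keys listed in options[keys_name]; keep all keys if it is absent."""
    spec = options.get(keys_name)
    if not spec:
        return dict(row)  # no KEYS option: every key is returned
    wanted = [key.strip() for key in spec.split(",") if key.strip()]
    return {key: row[key] for key in wanted if key in row}


row = {"key1": "a", "key2": "b", "key3": "c"}
print(filter_keys(row, {"KEYS": "key1,key2"}))  # {'key1': 'a', 'key2': 'b'}
print(filter_keys(row, {}))                     # {'key1': 'a', 'key2': 'b', 'key3': 'c'}
```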
### Data structure
Let's make some assumptions about elements to make this example easy to follow.
- For **TABLE_A**, assume `element0_tag` and `element1_tag` map to `table`, `element0_name` to `info`, and `element1_name` to `metadata`.
- For **TABLE_B**, assume `element0_tag` maps to `container`, `element1_tag` to `node`, `element2_tag` to `table`, and `element2_name` to `info`.
- For **TABLE_C**, assume `element0_tag` maps to `table` and `element0_name` to `images`.
In the config example above, we want to return data for **TABLE_A**, **TABLE_B**, and **TABLE_C**.
Each key requires a list of one or more comma-separated, XPath-like paths that locate its data in the XML file. For example:
- **TABLE_A** has two paths, ["table,info", "table,metadata"]: data under the `info` and `metadata` tables is returned and stored in *TABLE_A*.
- **TABLE_B** has a single path, ["container*,node*,table,info"]: data under the `info` table for all nodes in all containers is returned and stored in *TABLE_B*.
- **TABLE_C** has a single path, ["table,images"]: data under the `images` table for all parent nodes and containers is returned and stored in *TABLE_C*.
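To make the path notation concrete, here is a stdlib-only sketch that resolves one path against an XML tree. It rests on assumptions read from the examples, not on the library's code: each `tag*` step descends into all children with that tag, and the final `tag,name` pair selects elements with that tag whose `name` attribute matches (the attribute behind the default `name_key="@name"`). `resolve_path` is a hypothetical helper, and it rejects prefix steps without `*` because the examples do not show how those behave.

```python
import xml.etree.ElementTree as ET


def resolve_path(root, path, name_attr="name"):
    """Return the elements a config path such as 'container*,node*,table,info' selects."""
    *prefix, tag, name = [step.strip() for step in path.split(",")]
    nodes = [root]
    for step in prefix:
        if not step.endswith("*"):
            raise ValueError(f"this sketch only supports wildcard prefix steps, got {step!r}")
        # "container*" -> every <container> child of every current node
        nodes = [child for node in nodes for child in node.findall(step.rstrip("*"))]
    # Final "tag,name" pair -> <tag> children whose name attribute equals `name`
    return [el for node in nodes for el in node.findall(tag) if el.get(name_attr) == name]


xml = """<root>
  <container><node>
    <table name="info"><row>a</row></table>
    <table name="images"><row>b</row></table>
  </node></container>
  <container><node>
    <table name="info"><row>c</row></table>
  </node></container>
</root>"""
root = ET.fromstring(xml)
print([t.find("row").text for t in resolve_path(root, "container*,node*,table,info")])  # ['a', 'c']
```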
```yaml
NOTICE:
  The full path isn't required for TABLE_C. The common path prefix shared by the child TABLE_C
  and its parent TABLE_B only needs to be specified in the parent table. Since TABLE_C is a child
  of TABLE_B, it falls under the same path, but TABLE_C branches off at "table,images",
  which is why that is the only path it specifies.
  In other words, since TABLE_C is a child of TABLE_B, all TABLE_B rules carry over to TABLE_C.
```
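One way to read this rule, assuming the child's path replaces the same number of trailing steps in the parent's path: with the example mapping, TABLE_C's effective path becomes `container*,node*,table,images`, i.e. the `images` table of every node in every container. `effective_child_path` is a hypothetical helper that illustrates this reading; it is not taken from the library.

```python
def effective_child_path(parent_path, child_path):
    """Expand a child's short path using its parent's path (assumed rule)."""
    parent = [step.strip() for step in parent_path.split(",")]
    child = [step.strip() for step in child_path.split(",")]
    if len(child) > len(parent):
        raise ValueError("a child path cannot be longer than its parent's path")
    # Keep the parent's leading steps and swap in the child's steps at the end
    return ",".join(parent[: len(parent) - len(child)] + child)


print(effective_child_path("container*,node*,table,info", "table,images"))
# container*,node*,table,images
```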
----------------------------------------
Author: Hamdan, Muhammad (@mhamdan - ©)
Raw data
{
"_id": null,
"home_page": "https://github.com/mhamdan91/CustomXMLParser",
"name": "CustomXMLParser",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "python, xml, XML, parsing, mapping, dictionary, configurable, custom, formatting",
"author": "mhamdan91 (Hamdan, Muhammad)",
"author_email": "<mhamdan.dev@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/b2/e4/c32fa2cf5ebcbe7e80d920cc093f7fab72280d0b1fc4b91b734dc311ebc7/CustomXMLParser-1.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Python Library that allows for customized parsing of XML files using a set of configurations. Output is a dictionary. This library builds on the xmltodict library.",
"version": "1.1.1",
"project_urls": {
"Homepage": "https://github.com/mhamdan91/CustomXMLParser"
},
"split_keywords": [
"python",
" xml",
" xml",
" parsing",
" mapping",
" dictionary",
" configurable",
" custom",
" formatting"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "6231bce0f53011a0b482e68245efe8ffcb38228fe1fd1964d0f2f7946f0666a4",
"md5": "0ce9d3206b5c5609ce6fb3a0a4dddc89",
"sha256": "6539a6f2800b45565732d538d6b1a0b172bf30cf4f3b26229310a1d19a662330"
},
"downloads": -1,
"filename": "CustomXMLParser-1.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "0ce9d3206b5c5609ce6fb3a0a4dddc89",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 12348,
"upload_time": "2024-08-05T23:16:43",
"upload_time_iso_8601": "2024-08-05T23:16:43.258973Z",
"url": "https://files.pythonhosted.org/packages/62/31/bce0f53011a0b482e68245efe8ffcb38228fe1fd1964d0f2f7946f0666a4/CustomXMLParser-1.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "b2e4c32fa2cf5ebcbe7e80d920cc093f7fab72280d0b1fc4b91b734dc311ebc7",
"md5": "a567c53a2ab8cd29925608604b764fa6",
"sha256": "a6722d1eedc03bca340c6a02edeb9819a0414549b11497e1caee68f947096890"
},
"downloads": -1,
"filename": "CustomXMLParser-1.1.1.tar.gz",
"has_sig": false,
"md5_digest": "a567c53a2ab8cd29925608604b764fa6",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 8732,
"upload_time": "2024-08-05T23:16:44",
"upload_time_iso_8601": "2024-08-05T23:16:44.611784Z",
"url": "https://files.pythonhosted.org/packages/b2/e4/c32fa2cf5ebcbe7e80d920cc093f7fab72280d0b1fc4b91b734dc311ebc7/CustomXMLParser-1.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-08-05 23:16:44",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "mhamdan91",
"github_project": "CustomXMLParser",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "xmltodict",
"specs": [
[
"==",
"0.13.0"
]
]
},
{
"name": "moecolor",
"specs": []
}
],
"lcname": "customxmlparser"
}