CustomXMLParser


- Name: CustomXMLParser
- Version: 1.1.1
- Home page: https://github.com/mhamdan91/CustomXMLParser
- Summary: Python library that allows for customized parsing of XML files using a set of configurations. Output is a dictionary. This library builds on the xmltodict library.
- Upload time: 2024-08-05 23:16:44
- Author: mhamdan91 (Hamdan, Muhammad)
- Keywords: python, xml, parsing, mapping, dictionary, configurable, custom, formatting
- Requirements: xmltodict, moecolor

Custom XML to Dict Parser
==============================
## Table of Contents

 * [Overview](#overview)
 * [Library Installation](#library-installation)
 * [Library Usage](#library-usage)
 * [Config file](#config-file)


## Overview
This package parses XML files. It uses the `xmltodict` package to read an XML file in raw form and return its contents as a Python dictionary, then builds on that to let you tailor which information is returned from the XML file. In other words, with a configuration file, you can return specific data from the XML file in a specific format.

## Library Installation
To install the library, run the following command in a terminal or command prompt:

```bash
# It's recommended to create a virtual environment

# Windows
pip install CustomXMLParser

# Linux
pip3 install CustomXMLParser
```

## Library Usage

### Example usage
To read the XML file as is and simply convert it to a Python dictionary, do the following:
```python
from CustomXMLParser import XmlParser

xml_parser = XmlParser(parser_type='raw')
xml_file = 'path_to_xml_file'
xml_dict = xml_parser.parse(xml_file)
```
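
For orientation, the raw dictionary follows the xmltodict conventions: XML attributes become keys prefixed with `@`, element text that sits next to attributes is stored under `#text`, and repeated tags collapse into a list. The sketch below reproduces that shape with the standard library only. `element_to_dict` is a hypothetical helper written for this illustration, not part of CustomXMLParser or xmltodict, and it ignores namespaces and mixed content.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an Element into an xmltodict-style value (simplified sketch)."""
    # Attributes become "@"-prefixed keys.
    node = {f"@{key}": value for key, value in elem.attrib.items()}
    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:
            # A repeated tag is collected into a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if not node:
        # Leaf element without attributes: just its text (or None if empty).
        return text or None
    if text:
        node["#text"] = text
    return node


xml_text = '<root><table name="info"><th>id</th><th>size</th></table></root>'
root = ET.fromstring(xml_text)
raw = {root.tag: element_to_dict(root)}
print(raw)
```

This is why the default `name_key` is `"@name"` and the default `header_text_key` is `"#text"`: both are names as they appear in the raw dictionary, not in the XML source.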

To dump a dict to an XML file or a string, do the following:
```python
from CustomXMLParser import XmlParser

xml_parser = XmlParser(parser_type='raw')
xml_file = 'path_to_xml_file'
xml_dict = xml_parser.parse(xml_file)
my_dict = manipulate(xml_dict)  # manipulate() stands in for your own code; it MUST maintain the raw dict structure
out_xml_file = 'path_to_out_xml_file'
xml_parser.dump(out_xml_file, my_dict, pretty=True)  # Dumps the dict to an XML file
my_xml_string = xml_parser.dumps(my_dict, pretty=True)  # Dumps the dict to a string
```
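
The requirement to maintain the structure is easier to see with a small converter in hand. The following is a minimal sketch of turning a raw-style dictionary back into XML, assuming the xmltodict conventions (`@` for attributes, `#text` for text, lists for repeated tags). `dict_to_element` is a hypothetical helper built on the standard library; it is not how `dump` or `dumps` are implemented.

```python
import xml.etree.ElementTree as ET


def dict_to_element(tag, value):
    """Build an Element from an xmltodict-style value (simplified sketch)."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            if key.startswith("@"):
                # "@name" becomes the attribute name="..."
                elem.set(key[1:], str(child))
            elif key == "#text":
                elem.text = str(child)
            elif isinstance(child, list):
                # A list means the tag is repeated once per item.
                for item in child:
                    elem.append(dict_to_element(key, item))
            else:
                elem.append(dict_to_element(key, child))
    elif value is not None:
        elem.text = str(value)
    return elem


data = {"root": {"table": {"@name": "info", "th": ["id", "size"]}}}
((root_tag, body),) = data.items()  # a document has exactly one root element
xml_string = ET.tostring(dict_to_element(root_tag, body), encoding="unicode")
print(xml_string)
```

If a manipulation step renames `@name` to `name`, for example, the value silently turns from an attribute into a child element, which is the kind of structural drift the comment above warns about.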


If you wish to read specific portions of the XML file and format them in a particular way, then do the following:
```python
from CustomXMLParser import XmlParser

config_file = 'path_to_config_file'
xml_parser = XmlParser(config_file=config_file, parser_type='custom')
xml_file = 'path_to_xml_file'
xml_dict = xml_parser.parse(xml_file)
```

To dump any dict to an XML file or a string, do the following:
```python
from CustomXMLParser import XmlParser

xml_parser = XmlParser(parser_type='raw')
xml_file = 'path_to_xml_file'
xml_dict = xml_parser.parse(xml_file)
my_dict = manipulate(xml_dict)  # manipulate() stands in for your own code; it MUST maintain the raw dict structure
out_xml_file = 'path_to_out_xml_file'
xml_parser.dump(my_dict, out_xml_file, input_format='custom', root='root', pretty=True)  # Dumps the dict to an XML file
my_xml_string = xml_parser.dumps(my_dict, input_format='custom', root='root', pretty=True)  # Dumps the dict to a string
```


Note that the `XmlParser` class uses the following default XML attributes:

```python
'''
name_key (str, optional): custom/XML configuration parameter; the name of the primary tag. Defaults to "@name".
table_key (str, optional): custom/XML configuration parameter; the table identifier. Defaults to "th".
header_key (str, optional): custom/XML configuration parameter; the header identifier. Defaults to "header".
data_key (str, optional): custom/XML configuration parameter; the data identifier. Defaults to "rows".
header_text_key (str, optional): custom/XML configuration parameter; the table's key identifier. Defaults to "#text".
'''
```

You can override those attributes by passing them to the constructor of the `XmlParser` class as follows:

```python
from CustomXMLParser import XmlParser

config_file = 'path_to_config_file'
xml_parser = XmlParser(config_file=config_file, parser_type='custom', encoding='utf-8',
                       name_key='<desired_name_key>', table_key='<desired_table_key>', header_key='<desired_header_key>',
                       data_key='<desired_data_key>', header_text_key='<desired_header_text_key>')
xml_file = 'path_to_xml_file'
xml_dict = xml_parser.parse(xml_file)
```
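
Since every key parameter is optional, you usually only need to override the ones that differ in your files. The sketch below keeps the documented defaults in one place and merges overrides into them. `DEFAULT_KEYS` and `build_key_options` are hypothetical names introduced for this example; only the parameter names and default values come from the list above.

```python
# Documented defaults of the XmlParser key parameters.
DEFAULT_KEYS = {
    "name_key": "@name",
    "table_key": "th",
    "header_key": "header",
    "data_key": "rows",
    "header_text_key": "#text",
}


def build_key_options(**overrides):
    """Return the defaults with the given overrides applied."""
    unknown = sorted(set(overrides) - set(DEFAULT_KEYS))
    if unknown:
        # Catch typos such as 'header_keyr' before they reach the parser.
        raise TypeError(f"unknown key option(s): {', '.join(unknown)}")
    return {**DEFAULT_KEYS, **overrides}


options = build_key_options(data_key="records")
print(options)

# Hypothetical hand-off to the parser (not executed here):
# xml_parser = XmlParser(config_file='path_to_config_file',
#                        parser_type='custom', **options)
```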

## Config file

Below is an example configuration for custom parsing of XML.

```json
{
  "TREE":{
    "TABLE_A": {},
    "TABLE_B": {"TABLE_C": {"KEYS": "key1,key2"}}
  },

  "TABLE_A":
    [
      "element0_tag,element0_name",
      "element1_tag,element1_name"
    ],
  "TABLE_B":
    [
      "element0_tag*,element1_tag*,element2_tag,element2_name"
    ],
  "TABLE_C":
    [
      "element0_tag,element0_name"
    ]
}

```

### General Rules
- Capitalize all dictionary keys.
- `*` is the wildcard notation: it returns data for all available elements.
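
A config can be sanity-checked against the capitalization rule before it is handed to the parser. The sketch below is one way to do that, assuming the config is plain JSON as shown above. `check_config` is a hypothetical helper, not part of CustomXMLParser; it inspects top-level keys only, and it additionally assumes that every key other than `TREE` maps to a list of path strings, as in the example config.

```python
import json

CONFIG_TEXT = """
{
  "TREE": {"TABLE_A": {}, "TABLE_B": {"TABLE_C": {"KEYS": "key1,key2"}}},
  "TABLE_A": ["element0_tag,element0_name", "element1_tag,element1_name"],
  "TABLE_B": ["element0_tag*,element1_tag*,element2_tag,element2_name"],
  "TABLE_C": ["element0_tag,element0_name"]
}
"""


def check_config(config):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    for key, value in config.items():
        if key != key.upper():
            problems.append(f"{key!r} should be upper case")
        if key == "TREE":
            continue  # TREE holds the nesting, not paths
        is_path_list = isinstance(value, list) and all(
            isinstance(path, str) for path in value
        )
        if not is_path_list:
            problems.append(f"{key!r} should map to a list of path strings")
    return problems


config = json.loads(CONFIG_TEXT)
problems = check_config(config)
print(problems or "config looks fine")
```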

### Tree structure
The structure can be flat or nested. To return child data for a particular parent, include the child as a value of the parent. For example, parent **TABLE_B** has child **TABLE_C**. If **TABLE_C** has a child of its own, add it to **TABLE_C** in the same way.

```yaml
REQUESTING_SPECIFIC_KEYS:
  Notice that **TABLE_C** specifies a key called `KEYS` with the value `key1,key2`.
  This configuration returns only the matching keys `key1` and `key2` for *TABLE_C*.
  If the `KEYS` key is not specified, all keys are returned by default.
  The name `KEYS` must be unique, meaning it must not appear in the XML file. If it does,
  you can change the default key name through class attributes.

```
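
To make the nesting concrete, the sketch below walks a `TREE` section and lists each table with its parent and its optional `KEYS` filter. `walk_tree` is a hypothetical helper written for this illustration; it only mirrors the rules described above and says nothing about how the library traverses the tree internally.

```python
def walk_tree(tree, parent=None, keys_name="KEYS"):
    """Yield (parent, table, wanted_keys) for every table in a TREE section."""
    for table, spec in tree.items():
        if table == keys_name:
            continue  # KEYS is a filter on the enclosing table, not a child table
        wanted = spec.get(keys_name)
        # None means "no filter": all keys are returned.
        keys = [key.strip() for key in wanted.split(",")] if wanted else None
        yield parent, table, keys
        # Children are nested as values of their parent.
        yield from walk_tree(spec, parent=table, keys_name=keys_name)


tree = {"TABLE_A": {}, "TABLE_B": {"TABLE_C": {"KEYS": "key1,key2"}}}
rows = list(walk_tree(tree))
for parent, table, keys in rows:
    print(f"parent={parent} table={table} keys={keys}")
```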

### Data structure
Let's make some assumptions about elements to make this example easy to follow.
- For **TABLE_A**, assume element0_tag and element1_tag map to `table`, element0_name to `info`, and element1_name to `metadata`.
- For **TABLE_B**, assume element0_tag maps to `container`, element1_tag to `node`, and element2_tag to `table`, and element2_name to `info`.
- For **TABLE_C**, assume element0_tag maps to `table` and element0_name to `images`.

In the above config example, we are interested in returning data for **TABLE_A**, **TABLE_B**, and **TABLE_C**.
Each key requires a path or a list of paths (xpath) to retrieve data from the XML file. For example:
- **TABLE_A** has two paths, ["table,info", "table,metadata"]; data under the `info` and `metadata` tables will be returned and stored in *TABLE_A*.
- **TABLE_B** has a single path, ["container*,node*,table,info"]; data under the `info` table for all nodes and all containers will be returned and stored in *TABLE_B*.
- **TABLE_C** has a single path, ["table,images"]; data under the `images` table for all parent nodes and containers will be returned and stored in *TABLE_C*.
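
The path notation can be sketched with the standard library. The reading of the syntax used here is an assumption based on the examples above: a token ending in `*` matches every child element with that tag, and any other token is followed by a name that must match the element's `name` attribute (the `@name` default). `parse_path` and `select` are hypothetical helpers, and the sample XML is invented for illustration; neither reflects the library's internals.

```python
import xml.etree.ElementTree as ET


def parse_path(path):
    """Split a path into (tag, name) steps; name is None for wildcard steps."""
    tokens = [token.strip() for token in path.split(",")]
    steps, i = [], 0
    while i < len(tokens):
        if tokens[i].endswith("*"):
            steps.append((tokens[i][:-1], None))  # wildcard: all elements
            i += 1
        else:
            if i + 1 >= len(tokens):
                raise ValueError(f"tag {tokens[i]!r} is missing its name")
            steps.append((tokens[i], tokens[i + 1]))  # tag,name pair
            i += 2
    return steps


def select(root, path):
    """Return the elements reached by following the path from root."""
    current = [root]
    for tag, name in parse_path(path):
        current = [
            child
            for elem in current
            for child in elem.findall(tag)  # direct children only
            if name is None or child.get("name") == name
        ]
    return current


XML = """
<root>
  <table name="info"><row>top</row></table>
  <container name="c1">
    <node name="n1"><table name="info"><row>a</row></table></node>
    <node name="n2"><table name="info"><row>b</row></table></node>
  </container>
  <container name="c2">
    <node name="n3"><table name="info"><row>c</row></table></node>
  </container>
</root>
""".strip()

root = ET.fromstring(XML)
steps = parse_path("container*,node*,table,info")
top = [t.findtext("row") for t in select(root, "table,info")]
nested = [t.findtext("row") for t in select(root, "container*,node*,table,info")]
print(steps, top, nested)
```

Under this reading, the path without wildcards reaches only the top-level `info` table, while the wildcard path collects the `info` table from every node of every container.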

```yaml
NOTICE:
  The full path isn't required for **TABLE_C**: the *GCF* (greatest common factor), i.e. the path portion shared
  by the child **TABLE_C** and the parent **TABLE_B**, is only required in the parent table. Since **TABLE_C** is a child of **TABLE_B**,
  it falls under the same path, but **TABLE_C** breaks away at "table,images", which is why that is the only path specified for it.
  In other words, since **TABLE_C** is a child of **TABLE_B**, all *TABLE_B* rules carry over to *TABLE_C*.
```

----------------------------------------
Author: Hamdan, Muhammad (@mhamdan - ©)

            
