xmlrecords


Namexmlrecords JSON
Version 0.3.1 PyPI version JSON
download
home_page
SummaryThis package can convert an XML files to a list of records
upload_time2023-06-15 12:17:36
maintainer
docs_urlNone
authorYaroslav Kopotilov
requires_python>=3.7
licenseApache License, Version 2.0
keywords xml parsing
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # XML Records

`xmlrecords` is a user-friendly wrapper of `lxml` package for extraction of tabular data from XML files.

>>> This data provider sends all his data in... XML. You know nothing about XML, except that it looks kind of weird and you would *definitely* never use it for tabular data. How could you just transform all this XML nightmare into a sensible tabular format, like a DataFrame? Don't worry: you are in the right place!


# Installation

```shell script
pip install xmlrecords
```

The package requires `python 3.7+` and one external dependency `lxml`.

# Usage

## Basic example

Usually, you only need to specify path to table rows; optionally, you can specify paths to any extra data you'd like to add to your table:

```python
# XML object
xml_bytes = b"""\
<?xml version="1.0" encoding="utf-8"?>
<Catalog>
    <Library>
        <Name>Virtual Shore</Name>
    </Library>
    <Shelf>
        <Timestamp>2020-02-02T05:12:22</Timestamp>
        <Book>
            <Title>Sunny Night</Title>
            <Author alive="no" name="Mysterious Mark"/>
            <Year>2017</Year>
            <Price>112.34</Price>
        </Book>
        <Book>
            <Title>Babel-17</Title>
            <Author alive="yes" name="Samuel R. Delany"/>
            <Year>1963</Year>
            <Price>10</Price>
        </Book>
    </Shelf>
</Catalog>
"""

# Transform XML to records (= a list of key-value pairs)
import xmlrecords
records = xmlrecords.parse(
    xml=xml_bytes, 
    records_path=['Shelf', 'Book'],  # The rows are XML nodes with the repeating tag <Book>
    meta_paths=[['Library', 'Name'], ['Shelf', 'Timestamp']],  # Add additional "meta" nodes
)
for r in records:
    print(r)

# Output:
# {'Name': 'Virtual Shore', 'Timestamp': '2020-02-02T05:12:22', 'Title': 'Sunny Night', 'alive': 'no', 'name': 'Mysterious Mark', 'Year': '2017', 'Price': '112.34'}
# {'Name': 'Virtual Shore', 'Timestamp': '2020-02-02T05:12:22', 'Title': 'Babel-17', 'alive': 'yes', 'name': 'Samuel R. Delany', 'Year': '1963', 'Price': '10'}

# Validate record keys
xmlrecords.validate(
    records, 
    expected_keys=['Name', 'Timestamp', 'Title', 'alive', 'name', 'Year', 'Price'],
)
``` 

## With Pandas

You can easily transform records to a pandas DataFrame:

```python
import pandas as pd
df = pd.DataFrame(records)
```

## With SQL

You can use records directly with INSERT statements if your SQL database is [PEP 249 compliant](https://www.python.org/dev/peps/pep-0249/). Most SQL databases are.

SQLite is an exception. There, you'll have to transform records (= a list of dictionaries) into a list of lists:

```python
import sqlite3
with sqlite3.connect('maindev.db') as conn:
    c = conn.cursor()
    c.execute("""\
        CREATE TABLE BOOKS (
           LIBRARY_NAME TEXT,
           SHELF_TIMESTAMP TEXT,
           TITLE TEXT,
           AUTHOR_ALIVE TEXT,
           AUTHOR_NAME TEXT,
           YEAR INT,
           PRICE FLOAT,
           PRIMARY KEY (TITLE, AUTHOR_NAME)
        )
        """
    )
    c.executemany(
        """INSERT INTO BOOKS VALUES (?,?,?,?,?,?,?)""",
        [list(x.values()) for x in records],
    )
    conn.commit()
```


# FAQ

1. **Why not `xmltodict`?** `xmltodict` can convert arbitrary XML to a python dict. However, it is 2-3 times slower than `xmlrecords` and does not support some features specific for tablular data.

2. **Why not `xml` or `lxml`**? `xmlrecords` uses `lxml` under the hood. Using `xml` or `lxml` directly is a viable option too - in case this package doesn't cover your particular use case.



            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "xmlrecords",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "",
    "keywords": "xml parsing",
    "author": "Yaroslav Kopotilov",
    "author_email": "datascience@tuta.io",
    "download_url": "https://files.pythonhosted.org/packages/a8/3b/a4c758b0f3e10e358989223bdb14b84b83b99f242c8de93c802dbccead3e/xmlrecords-0.3.1.tar.gz",
    "platform": null,
    "description": "# XML Records\n\n`xmlrecords` is a user-friendly wrapper of `lxml` package for extraction of tabular data from XML files.\n\n>>> This data provider sends all his data in... XML. You know nothing about XML, except that it looks kind of weird and you would *definitely* never use it for tabular data. How could you just transform all this XML nightmare into a sensible tabular format, like a DataFrame? Don't worry: you are in the right place!\n\n\n# Installation\n\n```shell script\npip install xmlrecords\n```\n\nThe package requires `python 3.7+` and one external dependency `lxml`.\n\n# Usage\n\n## Basic example\n\nUsually, you only need to specify path to table rows; optionally, you can specify paths to any extra data you'd like to add to your table:\n\n```python\n# XML object\nxml_bytes = b\"\"\"\\\n<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<Catalog>\n    <Library>\n        <Name>Virtual Shore</Name>\n    </Library>\n    <Shelf>\n        <Timestamp>2020-02-02T05:12:22</Timestamp>\n        <Book>\n            <Title>Sunny Night</Title>\n            <Author alive=\"no\" name=\"Mysterious Mark\"/>\n            <Year>2017</Year>\n            <Price>112.34</Price>\n        </Book>\n        <Book>\n            <Title>Babel-17</Title>\n            <Author alive=\"yes\" name=\"Samuel R. Delany\"/>\n            <Year>1963</Year>\n            <Price>10</Price>\n        </Book>\n    </Shelf>\n</Catalog>\n\"\"\"\n\n# Transform XML to records (= a list of key-value pairs)\nimport xmlrecords\nrecords = xmlrecords.parse(\n    xml=xml_bytes, \n    records_path=['Shelf', 'Book'],  # The rows are XML nodes with the repeating tag <Book>\n    meta_paths=[['Library', 'Name'], ['Shelf', 'Timestamp']],  # Add additional \"meta\" nodes\n)\nfor r in records:\n    print(r)\n\n# Output:\n# {'Name': 'Virtual Shore', 'Timestamp': '2020-02-02T05:12:22', 'Title': 'Sunny Night', 'alive': 'no', 'name': 'Mysterious Mark', 'Year': '2017', 'Price': '112.34'}\n# {'Name': 'Virtual Shore', 'Timestamp': '2020-02-02T05:12:22', 'Title': 'Babel-17', 'alive': 'yes', 'name': 'Samuel R. Delany', 'Year': '1963', 'Price': '10'}\n\n# Validate record keys\nxmlrecords.validate(\n    records, \n    expected_keys=['Name', 'Timestamp', 'Title', 'alive', 'name', 'Year', 'Price'],\n)\n``` \n\n## With Pandas\n\nYou can easily transform records to a pandas DataFrame:\n\n```python\nimport pandas as pd\ndf = pd.DataFrame(records)\n```\n\n## With SQL\n\nYou can use records directly with INSERT statements if your SQL database is [PEP 249 compliant](https://www.python.org/dev/peps/pep-0249/). Most SQL databases are.\n\nSQLite is an exception. There, you'll have to transform records (= a list of dictionaries) into a list of lists:\n\n```python\nimport sqlite3\nwith sqlite3.connect('maindev.db') as conn:\n    c = conn.cursor()\n    c.execute(\"\"\"\\\n        CREATE TABLE BOOKS (\n           LIBRARY_NAME TEXT,\n           SHELF_TIMESTAMP TEXT,\n           TITLE TEXT,\n           AUTHOR_ALIVE TEXT,\n           AUTHOR_NAME TEXT,\n           YEAR INT,\n           PRICE FLOAT,\n           PRIMARY KEY (TITLE, AUTHOR_NAME)\n        )\n        \"\"\"\n    )\n    c.executemany(\n        \"\"\"INSERT INTO BOOKS VALUES (?,?,?,?,?,?,?)\"\"\",\n        [list(x.values()) for x in records],\n    )\n    conn.commit()\n```\n\n\n# FAQ\n\n1. **Why not `xmltodict`?** `xmltodict` can convert arbitrary XML to a python dict. However, it is 2-3 times slower than `xmlrecords` and does not support some features specific for tablular data.\n\n2. **Why not `xml` or `lxml`**? `xmlrecords` uses `lxml` under the hood. Using `xml` or `lxml` directly is a viable option too - in case this package doesn't cover your particular use case.\n\n\n",
    "bugtrack_url": null,
    "license": "Apache License, Version 2.0",
    "summary": "This package can convert an XML files to a list of records",
    "version": "0.3.1",
    "project_urls": null,
    "split_keywords": [
        "xml",
        "parsing"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1a44d3a79c7b5b6992567706e9ebaefb274525a9ab0784bc25ee20b40984c235",
                "md5": "8a29925384248bad03d13b9e3e21d69f",
                "sha256": "c5d48ff035bd076b11105372fc2aa1a00a12aa39ea7306ca73fd9a0478baaddc"
            },
            "downloads": -1,
            "filename": "xmlrecords-0.3.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "8a29925384248bad03d13b9e3e21d69f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 11871,
            "upload_time": "2023-06-15T12:17:34",
            "upload_time_iso_8601": "2023-06-15T12:17:34.453874Z",
            "url": "https://files.pythonhosted.org/packages/1a/44/d3a79c7b5b6992567706e9ebaefb274525a9ab0784bc25ee20b40984c235/xmlrecords-0.3.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a83ba4c758b0f3e10e358989223bdb14b84b83b99f242c8de93c802dbccead3e",
                "md5": "a16ed2b609c24d057a0ca1b91fd9293b",
                "sha256": "dc00f49c811e3b87d663fedea79aa95213afac48283443e4f93e7b8378910ffc"
            },
            "downloads": -1,
            "filename": "xmlrecords-0.3.1.tar.gz",
            "has_sig": false,
            "md5_digest": "a16ed2b609c24d057a0ca1b91fd9293b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 13508,
            "upload_time": "2023-06-15T12:17:36",
            "upload_time_iso_8601": "2023-06-15T12:17:36.204059Z",
            "url": "https://files.pythonhosted.org/packages/a8/3b/a4c758b0f3e10e358989223bdb14b84b83b99f242c8de93c802dbccead3e/xmlrecords-0.3.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-06-15 12:17:36",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "xmlrecords"
}
        
Elapsed time: 0.07712s