# XML Records
`xmlrecords` is a user-friendly wrapper of `lxml` package for extraction of tabular data from XML files.
>>> This data provider sends all his data in... XML. You know nothing about XML, except that it looks kind of weird and you would *definitely* never use it for tabular data. How could you just transform all this XML nightmare into a sensible tabular format, like a DataFrame? Don't worry: you are in the right place!
# Installation
```shell script
pip install xmlrecords
```
The package requires `python 3.7+` and one external dependency `lxml`.
# Usage
## Basic example
Usually, you only need to specify path to table rows; optionally, you can specify paths to any extra data you'd like to add to your table:
```python
# XML object
xml_bytes = b"""\
<?xml version="1.0" encoding="utf-8"?>
<Catalog>
<Library>
<Name>Virtual Shore</Name>
</Library>
<Shelf>
<Timestamp>2020-02-02T05:12:22</Timestamp>
<Book>
<Title>Sunny Night</Title>
<Author alive="no" name="Mysterious Mark"/>
<Year>2017</Year>
<Price>112.34</Price>
</Book>
<Book>
<Title>Babel-17</Title>
<Author alive="yes" name="Samuel R. Delany"/>
<Year>1963</Year>
<Price>10</Price>
</Book>
</Shelf>
</Catalog>
"""
# Transform XML to records (= a list of key-value pairs)
import xmlrecords
records = xmlrecords.parse(
xml=xml_bytes,
records_path=['Shelf', 'Book'], # The rows are XML nodes with the repeating tag <Book>
meta_paths=[['Library', 'Name'], ['Shelf', 'Timestamp']], # Add additional "meta" nodes
)
for r in records:
print(r)
# Output:
# {'Name': 'Virtual Shore', 'Timestamp': '2020-02-02T05:12:22', 'Title': 'Sunny Night', 'alive': 'no', 'name': 'Mysterious Mark', 'Year': '2017', 'Price': '112.34'}
# {'Name': 'Virtual Shore', 'Timestamp': '2020-02-02T05:12:22', 'Title': 'Babel-17', 'alive': 'yes', 'name': 'Samuel R. Delany', 'Year': '1963', 'Price': '10'}
# Validate record keys
xmlrecords.validate(
records,
expected_keys=['Name', 'Timestamp', 'Title', 'alive', 'name', 'Year', 'Price'],
)
```
## With Pandas
You can easily transform records to a pandas DataFrame:
```python
import pandas as pd
df = pd.DataFrame(records)
```
## With SQL
You can use records directly with INSERT statements if your SQL database is [PEP 249 compliant](https://www.python.org/dev/peps/pep-0249/). Most SQL databases are.
SQLite is an exception. There, you'll have to transform records (= a list of dictionaries) into a list of lists:
```python
import sqlite3
with sqlite3.connect('maindev.db') as conn:
c = conn.cursor()
c.execute("""\
CREATE TABLE BOOKS (
LIBRARY_NAME TEXT,
SHELF_TIMESTAMP TEXT,
TITLE TEXT,
AUTHOR_ALIVE TEXT,
AUTHOR_NAME TEXT,
YEAR INT,
PRICE FLOAT,
PRIMARY KEY (TITLE, AUTHOR_NAME)
)
"""
)
c.executemany(
"""INSERT INTO BOOKS VALUES (?,?,?,?,?,?,?)""",
[list(x.values()) for x in records],
)
conn.commit()
```
# FAQ
1. **Why not `xmltodict`?** `xmltodict` can convert arbitrary XML to a python dict. However, it is 2-3 times slower than `xmlrecords` and does not support some features specific for tablular data.
2. **Why not `xml` or `lxml`**? `xmlrecords` uses `lxml` under the hood. Using `xml` or `lxml` directly is a viable option too - in case this package doesn't cover your particular use case.
Raw data
{
"_id": null,
"home_page": "",
"name": "xmlrecords",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "",
"keywords": "xml parsing",
"author": "Yaroslav Kopotilov",
"author_email": "datascience@tuta.io",
"download_url": "https://files.pythonhosted.org/packages/a8/3b/a4c758b0f3e10e358989223bdb14b84b83b99f242c8de93c802dbccead3e/xmlrecords-0.3.1.tar.gz",
"platform": null,
"description": "# XML Records\n\n`xmlrecords` is a user-friendly wrapper of `lxml` package for extraction of tabular data from XML files.\n\n>>> This data provider sends all his data in... XML. You know nothing about XML, except that it looks kind of weird and you would *definitely* never use it for tabular data. How could you just transform all this XML nightmare into a sensible tabular format, like a DataFrame? Don't worry: you are in the right place!\n\n\n# Installation\n\n```shell script\npip install xmlrecords\n```\n\nThe package requires `python 3.7+` and one external dependency `lxml`.\n\n# Usage\n\n## Basic example\n\nUsually, you only need to specify path to table rows; optionally, you can specify paths to any extra data you'd like to add to your table:\n\n```python\n# XML object\nxml_bytes = b\"\"\"\\\n<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<Catalog>\n <Library>\n <Name>Virtual Shore</Name>\n </Library>\n <Shelf>\n <Timestamp>2020-02-02T05:12:22</Timestamp>\n <Book>\n <Title>Sunny Night</Title>\n <Author alive=\"no\" name=\"Mysterious Mark\"/>\n <Year>2017</Year>\n <Price>112.34</Price>\n </Book>\n <Book>\n <Title>Babel-17</Title>\n <Author alive=\"yes\" name=\"Samuel R. Delany\"/>\n <Year>1963</Year>\n <Price>10</Price>\n </Book>\n </Shelf>\n</Catalog>\n\"\"\"\n\n# Transform XML to records (= a list of key-value pairs)\nimport xmlrecords\nrecords = xmlrecords.parse(\n xml=xml_bytes, \n records_path=['Shelf', 'Book'], # The rows are XML nodes with the repeating tag <Book>\n meta_paths=[['Library', 'Name'], ['Shelf', 'Timestamp']], # Add additional \"meta\" nodes\n)\nfor r in records:\n print(r)\n\n# Output:\n# {'Name': 'Virtual Shore', 'Timestamp': '2020-02-02T05:12:22', 'Title': 'Sunny Night', 'alive': 'no', 'name': 'Mysterious Mark', 'Year': '2017', 'Price': '112.34'}\n# {'Name': 'Virtual Shore', 'Timestamp': '2020-02-02T05:12:22', 'Title': 'Babel-17', 'alive': 'yes', 'name': 'Samuel R. Delany', 'Year': '1963', 'Price': '10'}\n\n# Validate record keys\nxmlrecords.validate(\n records, \n expected_keys=['Name', 'Timestamp', 'Title', 'alive', 'name', 'Year', 'Price'],\n)\n``` \n\n## With Pandas\n\nYou can easily transform records to a pandas DataFrame:\n\n```python\nimport pandas as pd\ndf = pd.DataFrame(records)\n```\n\n## With SQL\n\nYou can use records directly with INSERT statements if your SQL database is [PEP 249 compliant](https://www.python.org/dev/peps/pep-0249/). Most SQL databases are.\n\nSQLite is an exception. There, you'll have to transform records (= a list of dictionaries) into a list of lists:\n\n```python\nimport sqlite3\nwith sqlite3.connect('maindev.db') as conn:\n c = conn.cursor()\n c.execute(\"\"\"\\\n CREATE TABLE BOOKS (\n LIBRARY_NAME TEXT,\n SHELF_TIMESTAMP TEXT,\n TITLE TEXT,\n AUTHOR_ALIVE TEXT,\n AUTHOR_NAME TEXT,\n YEAR INT,\n PRICE FLOAT,\n PRIMARY KEY (TITLE, AUTHOR_NAME)\n )\n \"\"\"\n )\n c.executemany(\n \"\"\"INSERT INTO BOOKS VALUES (?,?,?,?,?,?,?)\"\"\",\n [list(x.values()) for x in records],\n )\n conn.commit()\n```\n\n\n# FAQ\n\n1. **Why not `xmltodict`?** `xmltodict` can convert arbitrary XML to a python dict. However, it is 2-3 times slower than `xmlrecords` and does not support some features specific for tablular data.\n\n2. **Why not `xml` or `lxml`**? `xmlrecords` uses `lxml` under the hood. Using `xml` or `lxml` directly is a viable option too - in case this package doesn't cover your particular use case.\n\n\n",
"bugtrack_url": null,
"license": "Apache License, Version 2.0",
"summary": "This package can convert an XML files to a list of records",
"version": "0.3.1",
"project_urls": null,
"split_keywords": [
"xml",
"parsing"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "1a44d3a79c7b5b6992567706e9ebaefb274525a9ab0784bc25ee20b40984c235",
"md5": "8a29925384248bad03d13b9e3e21d69f",
"sha256": "c5d48ff035bd076b11105372fc2aa1a00a12aa39ea7306ca73fd9a0478baaddc"
},
"downloads": -1,
"filename": "xmlrecords-0.3.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "8a29925384248bad03d13b9e3e21d69f",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 11871,
"upload_time": "2023-06-15T12:17:34",
"upload_time_iso_8601": "2023-06-15T12:17:34.453874Z",
"url": "https://files.pythonhosted.org/packages/1a/44/d3a79c7b5b6992567706e9ebaefb274525a9ab0784bc25ee20b40984c235/xmlrecords-0.3.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "a83ba4c758b0f3e10e358989223bdb14b84b83b99f242c8de93c802dbccead3e",
"md5": "a16ed2b609c24d057a0ca1b91fd9293b",
"sha256": "dc00f49c811e3b87d663fedea79aa95213afac48283443e4f93e7b8378910ffc"
},
"downloads": -1,
"filename": "xmlrecords-0.3.1.tar.gz",
"has_sig": false,
"md5_digest": "a16ed2b609c24d057a0ca1b91fd9293b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 13508,
"upload_time": "2023-06-15T12:17:36",
"upload_time_iso_8601": "2023-06-15T12:17:36.204059Z",
"url": "https://files.pythonhosted.org/packages/a8/3b/a4c758b0f3e10e358989223bdb14b84b83b99f242c8de93c802dbccead3e/xmlrecords-0.3.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-06-15 12:17:36",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "xmlrecords"
}