[![yankee_logo](https://raw.githubusercontent.com/parkerhancock/yankee/master/docs/_static/yankee_logo.svg)](https://patent-client.readthedocs.io)
[![Documentation](https://img.shields.io/readthedocs/yankee/stable)](https://yankee.readthedocs.io/en/stable/)
[![PyPI](https://img.shields.io/pypi/v/yankee?color=blue)](https://pypi.org/project/yankee)
[![PyPI - Python Versions](https://img.shields.io/pypi/pyversions/yankee)](https://pypi.org/project/yankee)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/yankee?color=blue)](https://pypi.org/project/yankee)
# Summary
Simple declarative data extraction and loading in Python, featuring:
- 🍰 **Ease of use:** Data extraction is defined with simple, declarative types.
- ⚙ **XML / HTML / JSON Extraction:** Extraction can be performed across a wide array of structured data formats.
- 🐼 **Pandas Integration:** Results are easily castable to [Pandas DataFrames and Series][pandas].
- 😀 **Custom Output Classes:** Results can be automatically loaded into autogenerated dataclasses or custom model types.
- 🚀 **Performance:** XML loading is backed by the fast [lxml] library. JSON parsing is backed by [UltraJSON][ujson], with [jsonpath_ng] for flexible data extraction.
[lxml]: https://lxml.de/
[ujson]: https://github.com/ultrajson/ultrajson
[jsonpath_ng]: https://github.com/h2non/jsonpath-ng
[pandas]: https://pandas.pydata.org/pandas-docs/stable/
## Quick Start
To extract data from **XML**, use this import statement, and see the example below:
```python
from yankee.xml.schema import Schema, fields as f, CSSSelector
```
To extract data from **JSON**, use this import statement, and see the example below:
```python
from yankee.json.schema import Schema, fields as f, JSONPath
```
To extract data from **HTML**, use this import statement:
```python
from yankee.html.schema import Schema, fields as f, CSSSelector
```
To extract data from **Python objects** (either objects or dictionaries), use this import statement:
```python
from yankee.base.schema import Schema, fields as f
```
<!-- RTD-IGNORE -->
## Documentation
Complete documentation is available on [Read The Docs].
[Read The Docs]: https://yankee.readthedocs.io/en/latest/
<!-- END-RTD-IGNORE -->
## Examples
### Extract data from XML
Schemas extract data from XML documents. By default, data keys are XPath expressions, but they can also be CSS selectors.
Take this:
```xml
<xmlObject>
    <name>Johnny Appleseed</name>
    <birthdate>2000-01-01</birthdate>
    <something>
        <many>
            <levels>
                <deep>123</deep>
            </levels>
        </many>
    </something>
</xmlObject>
```
Do this:
```python
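from yankee.xml.schema import Schema, fields as f, CSSSelector

class XmlExample(Schema):
    # Plain string keys are XPath expressions (the default)
    name = f.String("./name")
    # Wrap a key in CSSSelector to use a CSS selector instead
    birthday = f.Date(CSSSelector("birthdate"))
    deep_data = f.Int("./something/many/levels/deep")

# xml_doc holds the XML document shown above
XmlExample().load(xml_doc)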
```
Get this:
```python
{
    "name": "Johnny Appleseed",
    "birthday": datetime.date(2000, 1, 1),
    "deep_data": 123
}
```
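
For comparison, here is a minimal sketch of the same extraction written by hand with only the Python standard library. It is not how yankee is implemented (yankee uses lxml); the function name `extract_by_hand` is made up for this illustration. It shows the per-field lookup and type conversion that the declarative schema replaces.

```python
import datetime
import xml.etree.ElementTree as ET

xml_doc = """
<xmlObject>
    <name>Johnny Appleseed</name>
    <birthdate>2000-01-01</birthdate>
    <something>
        <many>
            <levels>
                <deep>123</deep>
            </levels>
        </many>
    </something>
</xmlObject>
"""

def extract_by_hand(doc: str) -> dict:
    """Hypothetical helper: look up each value and convert it manually."""
    root = ET.fromstring(doc)
    return {
        # Each entry pairs a path lookup with an explicit type conversion
        "name": root.findtext("./name"),
        "birthday": datetime.date.fromisoformat(root.findtext("./birthdate")),
        "deep_data": int(root.findtext("./something/many/levels/deep")),
    }

result = extract_by_hand(xml_doc)
print(result)
```

In the schema version, each of these lookup-and-convert pairs collapses into a single field declaration such as `f.Date(...)` or `f.Int(...)`.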
### Extract data from JSON
Schemas extract data from JSON documents. By default, data keys are implied from the field names, but they can also be JSONPath expressions.
Take this:
```json
{
    "name": "Johnny Appleseed",
    "birthdate": "2000-01-01",
    "something": [
        {
            "many": {
                "levels": {
                    "deep": 123
                }
            }
        }
    ]
}
```
Do this:
```python
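from yankee.json.schema import Schema, fields as f

class JsonExample(Schema):
    # With no key given, the field name ("name") is used as the key
    name = f.String()
    birthday = f.Date("birthdate")
    # Dotted path; "0" selects the first item of the "something" list
    deep_data = f.Int("something.0.many.levels.deep")

# json_doc holds the JSON document shown above
JsonExample().load(json_doc)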
```
Get this:
```python
{
    "name": "Johnny Appleseed",
    "birthday": datetime.date(2000, 1, 1),
    "deep_data": 123
}
```
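
To make the dotted key `something.0.many.levels.deep` concrete, the sketch below resolves it by hand with the standard library. The helper `get_path` is hypothetical and written only for this illustration; it is not yankee's or jsonpath_ng's path resolution, and it assumes that numeric segments index into lists.

```python
import datetime
import json

json_doc = """
{
    "name": "Johnny Appleseed",
    "birthdate": "2000-01-01",
    "something": [
        {"many": {"levels": {"deep": 123}}}
    ]
}
"""

def get_path(data, path: str):
    """Hypothetical helper: follow a dotted path through dicts and lists."""
    current = data
    for segment in path.split("."):
        if isinstance(current, list):
            # Assumption: a segment applied to a list is an integer index
            current = current[int(segment)]
        else:
            current = current[segment]
    return current

data = json.loads(json_doc)
result = {
    "name": get_path(data, "name"),
    "birthday": datetime.date.fromisoformat(get_path(data, "birthdate")),
    "deep_data": int(get_path(data, "something.0.many.levels.deep")),
}
print(result)
```

The schema version states the same three paths and target types declaratively, without the explicit traversal and conversion code.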
Raw data
{
"_id": null,
"home_page": "https://github.com/parkerhancock/gelatin_extract",
"name": "yankee",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.13,>=3.9",
"maintainer_email": null,
"keywords": "deserialization, xml, json, deserialize",
"author": "Parker Hancock",
"author_email": "633163+parkerhancock@users.noreply.github.com",
"download_url": "https://files.pythonhosted.org/packages/58/2d/0de858d7393eb462c15b8f480abae305d329795e0c2b11ae6cdf058ebdf8/yankee-0.1.46.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache Software License 2.0",
"summary": "lightweight, simple, and fast declarative XML and JSON data extraction",
"version": "0.1.46",
"project_urls": {
"Homepage": "https://github.com/parkerhancock/gelatin_extract",
"Repository": "https://github.com/parkerhancock/gelatin_extract"
},
"split_keywords": [
"deserialization",
" xml",
" json",
" deserialize"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "b61663f4bbaa035ce9a599031cc20fbd2e39d39d7a5004b605e148973b1b3f7a",
"md5": "02374bef140b5110b7f237c159f66aa5",
"sha256": "9932ce72e8fc5146ec9f429f8efb2d204be48996b13fadcfc2df3cffe4520444"
},
"downloads": -1,
"filename": "yankee-0.1.46-py3-none-any.whl",
"has_sig": false,
"md5_digest": "02374bef140b5110b7f237c159f66aa5",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.13,>=3.9",
"size": 103099,
"upload_time": "2024-05-22T15:46:00",
"upload_time_iso_8601": "2024-05-22T15:46:00.067142Z",
"url": "https://files.pythonhosted.org/packages/b6/16/63f4bbaa035ce9a599031cc20fbd2e39d39d7a5004b605e148973b1b3f7a/yankee-0.1.46-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "582d0de858d7393eb462c15b8f480abae305d329795e0c2b11ae6cdf058ebdf8",
"md5": "4ee9c7fe24cfca3b48fe8c607eb75b39",
"sha256": "49fe7255152e8a7766470962e55b002962c016c370e13c4cbfc15f09744ace58"
},
"downloads": -1,
"filename": "yankee-0.1.46.tar.gz",
"has_sig": false,
"md5_digest": "4ee9c7fe24cfca3b48fe8c607eb75b39",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.13,>=3.9",
"size": 88908,
"upload_time": "2024-05-22T15:46:07",
"upload_time_iso_8601": "2024-05-22T15:46:07.607938Z",
"url": "https://files.pythonhosted.org/packages/58/2d/0de858d7393eb462c15b8f480abae305d329795e0c2b11ae6cdf058ebdf8/yankee-0.1.46.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-05-22 15:46:07",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "parkerhancock",
"github_project": "gelatin_extract",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "yankee"
}