yankee


| Field | Value |
| --- | --- |
| Name | yankee |
| Version | 0.1.46 |
| Home page | https://github.com/parkerhancock/gelatin_extract |
| Summary | lightweight, simple, and fast declarative XML and JSON data extraction |
| Upload time | 2024-05-22 15:46:07 |
| Author | Parker Hancock |
| Requires Python | `<3.13,>=3.9` |
| License | Apache Software License 2.0 |
| Keywords | deserialization, xml, json, deserialize |

[![yankee_logo](https://raw.githubusercontent.com/parkerhancock/yankee/master/docs/_static/yankee_logo.svg)](https://patent-client.readthedocs.io)
[![Documentation](https://img.shields.io/readthedocs/yankee/stable)](https://yankee.readthedocs.io/en/stable/)


[![PyPI](https://img.shields.io/pypi/v/yankee?color=blue)](https://pypi.org/project/yankee)
[![PyPI - Python Versions](https://img.shields.io/pypi/pyversions/yankee)](https://pypi.org/project/yankee)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/yankee?color=blue)](https://pypi.org/project/yankee)


# Summary

Simple declarative data extraction and loading in Python, featuring:

- 🍰 **Ease of use:** Data extraction is defined with simple, declarative types.
- ⚙ **XML / HTML / JSON Extraction:** Extraction works across a wide array of structured data formats.
- 🐼 **Pandas Integration:** Results are easily castable to [Pandas Dataframes and Series][pandas].
- 😀 **Custom Output Classes:** Results can be automatically loaded into autogenerated dataclasses or custom model types.
- 🚀 **Performance:** XML loading is backed by the fast [lxml] library; JSON uses [UltraJSON][ujson] for fast parsing and [jsonpath_ng] for flexible data extraction.

[lxml]: https://lxml.de/
[ujson]: https://github.com/ultrajson/ultrajson
[jsonpath_ng]: https://github.com/h2non/jsonpath-ng
[pandas]: https://pandas.pydata.org/pandas-docs/stable/

## Quick Start

To extract data from **XML**, use this import statement, and see the example below:
```python
from yankee.xml.schema import Schema, fields as f, CSSSelector
```

To extract data from **JSON**, use this import statement, and see the example below:
```python
from yankee.json.schema import Schema, fields as f, JSONPath
```

To extract data from **HTML**, use this import statement:
```python
from yankee.html.schema import Schema, fields as f, CSSSelector
```

To extract data from **Python objects** (either objects or dictionaries), use this import statement:
```python
from yankee.base.schema import Schema, fields as f
```
<!-- RTD-IGNORE -->
## Documentation

Complete documentation is available on [Read The Docs].

[Read The Docs]: https://yankee.readthedocs.io/en/latest/

<!-- END-RTD-IGNORE -->
## Examples

### Extract data from XML

To extract data from XML, declare one field per value and give each a data key. By default, data keys are XPath expressions, but they can also be CSS selectors.

Take this:
```xml
    <xmlObject>
        <name>Johnny Appleseed</name>
        <birthdate>2000-01-01</birthdate>
        <something>
            <many>
                <levels>
                    <deep>123</deep>
                </levels>
            </many>
        </something>
    </xmlObject>
```

Do this:
```python
from yankee.xml.schema import Schema, fields as f, CSSSelector

class XmlExample(Schema):
    name = f.String("./name")
    birthday = f.Date(CSSSelector("birthdate"))
    deep_data = f.Int("./something/many/levels/deep")

# xml_doc is assumed to hold the XML document shown above
XmlExample().load(xml_doc)
```

Get this:
```python
{
    "name": "Johnny Appleseed",
    "birthday": datetime.date(2000, 1, 1),
    "deep_data": 123
}
```

### Extract data from JSON

To extract data from JSON, declare one field per value. By default, data keys are implied from the field names, but they can also be JSONPath expressions.

Take this:
```json
{
    "name": "Johnny Appleseed",
    "birthdate": "2000-01-01",
    "something": [
        {
            "many": {
                "levels": {
                    "deep": 123
                }
            }
        }
    ]
}
```
Do this:
```python
from yankee.json.schema import Schema, fields as f

class JsonExample(Schema):
    name = f.String()
    birthday = f.Date("birthdate")
    deep_data = f.Int("something.0.many.levels.deep")

# json_doc is assumed to hold the JSON document shown above
JsonExample().load(json_doc)
```
Get this:
```python
{
    "name": "Johnny Appleseed",
    "birthday": datetime.date(2000, 1, 1),
    "deep_data": 123
}
```



            
