yankee

- **Name:** yankee
- **Version:** 0.1.44
- **Home page:** https://github.com/parkerhancock/gelatin_extract
- **Summary:** lightweight, simple, and fast declarative XML and JSON data extraction
- **Upload time:** 2023-10-31 21:48:58
- **Author:** Parker Hancock
- **Requires Python:** >=3.9,<3.13
- **License:** Apache Software License 2.0
- **Keywords:** deserialization, xml, json, deserialize

[![yankee_logo](https://raw.githubusercontent.com/parkerhancock/yankee/master/docs/_static/yankee_logo.svg)](https://patent-client.readthedocs.io)
[![Documentation](https://img.shields.io/readthedocs/yankee/stable)](https://yankee.readthedocs.io/en/stable/)


[![PyPI](https://img.shields.io/pypi/v/yankee?color=blue)](https://pypi.org/project/yankee)
[![PyPI - Python Versions](https://img.shields.io/pypi/pyversions/yankee)](https://pypi.org/project/yankee)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/yankee?color=blue)](https://pypi.org/project/yankee)


# Summary

Simple declarative data extraction and loading in Python, featuring:

- 🍰 **Ease of use:** Data extraction is defined with simple, declarative types.
- ⚙ **XML / HTML / JSON Extraction:** Extraction works across a wide array of structured data formats.
- 🐼 **Pandas Integration:** Results are easily castable to [Pandas DataFrames and Series][pandas].
- 😀 **Custom Output Classes:** Results can be automatically loaded into autogenerated dataclasses or custom model types.
- 🚀 **Performance:** XML loading is backed by the fast [lxml] library; JSON is parsed with [UltraJSON][ujson] and queried with [jsonpath_ng] for flexible data extraction.

[lxml]: https://lxml.de/
[ujson]: https://github.com/ultrajson/ultrajson
[jsonpath_ng]: https://github.com/h2non/jsonpath-ng
[pandas]: https://pandas.pydata.org/pandas-docs/stable/
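
To make the custom output class idea concrete, here is a minimal standard-library sketch of loading an extraction result into a model type. It does not use yankee at all: `Person` is a hypothetical model, and `record` is a hand-written dictionary shaped like the results shown in the examples. How yankee itself wires up output classes is covered in its documentation.

```python
import datetime
from dataclasses import asdict, dataclass


@dataclass
class Person:
    """Hypothetical custom model; not part of yankee."""

    name: str
    birthday: datetime.date
    deep_data: int


# A plain dict shaped like an extraction result (written by hand here).
record = {
    "name": "Johnny Appleseed",
    "birthday": datetime.date(2000, 1, 1),
    "deep_data": 123,
}

# Keyword expansion works because the keys match the dataclass field names.
person = Person(**record)
print(person.name, person.birthday.year)
```

A typed model like this gives attribute access and editor completion, while `asdict(person)` converts it back to a dictionary when needed.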

## Quick Start

To extract data from **XML**, use this import statement, and see the example below:
```python
from yankee.xml.schema import Schema, fields as f, CSSSelector
```

To extract data from **JSON**, use this import statement, and see the example below:
```python
from yankee.json.schema import Schema, fields as f, JSONPath
```

To extract data from **HTML**, use this import statement:
```python
from yankee.html.schema import Schema, fields as f, CSSSelector
```

To extract data from **Python objects** (either objects or dictionaries), use this import statement:
```python
from yankee.base.schema import Schema, fields as f
```
<!-- RTD-IGNORE -->
## Documentation

Complete documentation is available on [Read The Docs].

[Read The Docs]: https://yankee.readthedocs.io/en/latest/

<!-- END-RTD-IGNORE -->
## Examples

### Extract data from XML

Extracting data from XML: by default, data keys are XPath expressions, but they can also be CSS selectors wrapped in `CSSSelector`.

Take this:
```xml
    <xmlObject>
        <name>Johnny Appleseed</name>
        <birthdate>2000-01-01</birthdate>
        <something>
            <many>
                <levels>
                    <deep>123</deep>
                </levels>
            </many>
        </something>
    </xmlObject>
```

Do this:
```python
from yankee.xml.schema import Schema, fields as f, CSSSelector

class XmlExample(Schema):
    name = f.String("./name")  # XPath expression (the default)
    birthday = f.Date(CSSSelector("birthdate"))  # CSS selector
    deep_data = f.Int("./something/many/levels/deep")

# xml_doc holds the XML document shown above
XmlExample().load(xml_doc)
```

Get this:
```python
{
    "name": "Johnny Appleseed",
    "birthday": datetime.date(2000, 1, 1),
    "deep_data": 123
}
```
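
For comparison, the same extraction can be written by hand with only the standard library. This is a sketch of roughly what the schema above expresses, not yankee's implementation: `extract_xml_example` is a hypothetical helper, and it uses `xml.etree.ElementTree` rather than lxml.

```python
import datetime
import xml.etree.ElementTree as ET

XML_DOC = """
<xmlObject>
    <name>Johnny Appleseed</name>
    <birthdate>2000-01-01</birthdate>
    <something>
        <many>
            <levels>
                <deep>123</deep>
            </levels>
        </many>
    </something>
</xmlObject>
"""


def extract_xml_example(xml_text: str) -> dict:
    """Hand-written counterpart of the XmlExample schema (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    return {
        # Each path is relative to the root element, like the XPath keys above.
        "name": root.findtext("./name"),
        # The date and integer conversions are done explicitly here.
        "birthday": datetime.date.fromisoformat(root.findtext("./birthdate")),
        "deep_data": int(root.findtext("./something/many/levels/deep")),
    }


result = extract_xml_example(XML_DOC)
print(result)
```

The declarative schema keeps the path and the target type together on one line per field, which is the part that gets repetitive when written out manually.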

### Extract data from JSON

Extracting data from JSON: by default, data keys are implied from the field names, but they can also be JSONPath expressions.

Take this:
```json
{
    "name": "Johnny Appleseed",
    "birthdate": "2000-01-01",
    "something": [
        {
            "many": {
                "levels": {
                    "deep": 123
                }
            }
        }
    ]
}
```
Do this:
```python
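from yankee.json.schema import Schema, fields as f

class JsonExample(Schema):
    name = f.String()  # key implied from the field name
    birthday = f.Date("birthdate")
    deep_data = f.Int("something.0.many.levels.deep")

# json_doc holds the JSON document shown above
JsonExample().load(json_doc)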
```
Get this:
```python
{
    "name": "Johnny Appleseed",
    "birthday": datetime.date(2000, 1, 1),
    "deep_data": 123
}
```
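
As a point of comparison, here is a standard-library sketch of the same extraction. It is not how yankee resolves keys (the library lists jsonpath_ng for that); `get_path` is a hypothetical helper that only handles simple dotted paths with numeric list indexes.

```python
import datetime
import json

JSON_DOC = """
{
    "name": "Johnny Appleseed",
    "birthdate": "2000-01-01",
    "something": [
        {"many": {"levels": {"deep": 123}}}
    ]
}
"""


def get_path(data, dotted_path: str):
    """Walk a dotted path such as "something.0.many.levels.deep" (hypothetical helper)."""
    current = data
    for part in dotted_path.split("."):
        if isinstance(current, list):
            current = current[int(part)]  # numeric segments index into lists
        else:
            current = current[part]  # other segments are dictionary keys
    return current


doc = json.loads(JSON_DOC)
json_result = {
    "name": get_path(doc, "name"),
    "birthday": datetime.date.fromisoformat(get_path(doc, "birthdate")),
    "deep_data": int(get_path(doc, "something.0.many.levels.deep")),
}
print(json_result)
```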



            
