amara3.xml


Name: amara3.xml
Version: 3.3.0
Home page: https://github.com/uogbuji/amara3-xml
Summary: Part of the Amara3 project, which offers a variety of data processing tools. This module adds MicroXML support and adaptation to classic XML.
Upload time: 2022-12-19 16:53:03
Author: Uche Ogbuji
License: License :: OSI Approved :: Apache Software License
Keywords: xml, web, data
Requirements: pytest, ply, amara3-iri
# Amara 3 XML

Python 3 tools for processing [MicroXML](http://www.w3.org/community/microxml/), a simplification of XML. Amara 3 XML implements the MicroXML data model and lets you parse both traditional XML and MicroXML into it.

The `microx` command line tool is especially useful for quick query and processing of XML.

## Install

Requires Python 3.4+. Just run:

```
pip install amara3.xml
```

## Use

Though Amara 3 is focused on MicroXML rather than full XML, the reality is that
most of the XML-like data you’ll be dealing with is full XML 1.0. This package
can parse such legacy XML and reduce it to MicroXML. In many cases the most
significant consequence is that namespace information is discarded: elements
keep only their local names, so in the example below `root.xml_name` is
`"monty"` even though the document declares a default namespace. You can get
pretty far by ignoring namespaces, but only if you are sure your application
does not depend on them.

    from amara3.uxml import xml

    MONTY_XML = """<monty xmlns="urn:spam:ignored">
      <python spam="eggs">What do you mean "bleh"</python>
      <python ministry="abuse">But I was looking for argument</python>
    </monty>"""

    builder = xml.treebuilder()
    root = builder.parse(MONTY_XML)
    print(root.xml_name) # "monty" (the namespace is stripped)
    # Take one iterator over the children so each next() advances to the following child
    children = iter(root.xml_children)
    child = next(children)
    print(child) # First child is a whitespace-only text node (a newline plus two spaces)
    child = next(children)
    print(child.xml_value) # 'What do you mean "bleh"'
    print(child.xml_attributes["spam"]) # "eggs"
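
To make the namespace caveat concrete, here is a rough analogue using only the standard library's `xml.etree.ElementTree`, not amara3. ElementTree keeps namespaces by writing qualified names as `{uri}local`; the hypothetical helpers `local_name` and `strip_namespaces` throw that part away, which is roughly what "reducing to MicroXML" means for names. amara3's own handling may differ in details; this is only an illustration.

```python
import xml.etree.ElementTree as ET

# Same document as the example above: a default namespace on the root element.
MONTY_XML = """<monty xmlns="urn:spam:ignored">
  <python spam="eggs">What do you mean "bleh"</python>
  <python ministry="abuse">But I was looking for argument</python>
</monty>"""

def local_name(name):
    """Drop the '{namespace-uri}' prefix that ElementTree puts on qualified names."""
    return name.rsplit("}", 1)[-1] if name.startswith("{") else name

def strip_namespaces(elem):
    """Rename elem, its descendants and their attributes to local names, in place."""
    for node in elem.iter():
        if not isinstance(node.tag, str):
            continue  # skip comment / processing-instruction nodes, whose tag is not a string
        node.tag = local_name(node.tag)
        for key in list(node.attrib):
            short = local_name(key)
            if short != key:
                # If two attributes share a local name, one silently overwrites the other:
                # this is exactly the information loss the caveat warns about.
                node.attrib[short] = node.attrib.pop(key)
    return elem

root = ET.fromstring(MONTY_XML)
print(root.tag)                       # {urn:spam:ignored}monty
print(strip_namespaces(root).tag)     # monty
print([child.tag for child in root])  # ['python', 'python']
```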

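The `amara3.uxml.treeutil` module provides utilities that make this kind of navigation easier, such as `select_name` (find elements by name) and `select_attribute` (find elements by attribute value).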

    from amara3.uxml import xml
    from amara3.uxml.treeutil import select_name, select_attribute

    MONTY_XML = """<monty xmlns="urn:spam:ignored">
      <python spam="eggs">What do you mean "bleh"</python>
      <python ministry="abuse">But I was looking for argument</python>
    </monty>"""

    builder = xml.treebuilder()
    root = builder.parse(MONTY_XML)
    # First element named "python"
    py1 = next(select_name(root, "python"))
    print(py1.xml_value) # 'What do you mean "bleh"'
    # First element whose "ministry" attribute is "abuse"
    py2 = next(select_attribute(root, "ministry", "abuse"))
    print(py2.xml_value) # "But I was looking for argument"
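
The README does not show how `select_name` and `select_attribute` work internally. As a sketch under that caveat, the same kind of selection can be written as generators over a standard-library `xml.etree.ElementTree` tree, so that `next()` picks the first match just as in the example above. The names `iter_by_name`, `iter_by_attribute` and `text_value` are hypothetical, this is not amara3's code, and the namespace declaration is omitted to keep the tag comparison simple.

```python
import xml.etree.ElementTree as ET

MONTY_XML = """<monty>
  <python spam="eggs">What do you mean "bleh"</python>
  <python ministry="abuse">But I was looking for argument</python>
</monty>"""

def iter_by_name(elem, name):
    """Yield elem and its descendants whose tag equals name, in document order."""
    for node in elem.iter():
        if node.tag == name:
            yield node

def iter_by_attribute(elem, attr, value):
    """Yield elem and its descendants whose attribute attr equals value."""
    for node in elem.iter():
        if node.get(attr) == value:
            yield node

def text_value(elem):
    """Concatenate all text inside elem, similar in spirit to xml_value."""
    return "".join(elem.itertext())

root = ET.fromstring(MONTY_XML)
py1 = next(iter_by_name(root, "python"))
print(text_value(py1))  # What do you mean "bleh"
py2 = next(iter_by_attribute(root, "ministry", "abuse"))
print(text_value(py2))  # But I was looking for argument
```

Generators keep the search lazy: `next()` stops walking the tree as soon as the first match is found.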

## Experimental MicroXML parser

For this parser the input truly must be MicroXML. Basics:

    >>> from amara3.uxml.parser import parse
    >>> events = parse('<hello><bold>world</bold></hello>')
    >>> for ev in events: print(ev)
    ...
    (<event.start_element: 1>, 'hello', {}, [])
    (<event.start_element: 1>, 'bold', {}, ['hello'])
    (<event.characters: 3>, 'world')
    (<event.end_element: 2>, 'bold', ['hello'])
    (<event.end_element: 2>, 'hello', [])
    >>>
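
Judging from the output above, each event is a tuple whose first item is the event kind: start events carry the element name, an attribute dict and the list of ancestor names; character events carry the text; end events carry the name and the ancestors. The sketch below consumes tuples of that shape to collect the text directly inside each element path. So that it runs without amara3, it defines a stand-in `Event` enum whose values merely mirror the printed ones; `Event` and `text_by_path` are hypothetical, and with the real parser you would compare against amara3's own event type instead.

```python
from enum import Enum

class Event(Enum):
    # Hypothetical stand-in mirroring the printed values; not amara3's own event type.
    start_element = 1
    end_element = 2
    characters = 3

def text_by_path(events):
    """Map each element path (e.g. 'hello/bold') to the text directly inside it."""
    result = {}
    stack = []  # names of the currently open elements
    for ev in events:
        kind = ev[0]
        if kind is Event.start_element:
            stack.append(ev[1])  # ev = (kind, name, attributes, ancestors)
        elif kind is Event.characters:
            path = "/".join(stack)
            result[path] = result.get(path, "") + ev[1]  # text may arrive in pieces
        elif kind is Event.end_element:
            stack.pop()
    return result

# The event sequence from the parse() example, re-created with the stand-in enum.
EVENTS = [
    (Event.start_element, 'hello', {}, []),
    (Event.start_element, 'bold', {}, ['hello']),
    (Event.characters, 'world'),
    (Event.end_element, 'bold', ['hello']),
    (Event.end_element, 'hello', []),
]
print(text_by_path(EVENTS))  # {'hello/bold': 'world'}
```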

Or… and now for something completely different… incremental parsing, where the input arrives as a series of fragments.

    >>> from amara3.uxml.parser import parsefrags
    >>> events = parsefrags(['<hello', '><bold>world</bold></hello>'])
    >>> for ev in events: print(ev)
    ...
    (<event.start_element: 1>, 'hello', {}, [])
    (<event.start_element: 1>, 'bold', {}, ['hello'])
    (<event.characters: 3>, 'world')
    (<event.end_element: 2>, 'bold', ['hello'])
    (<event.end_element: 2>, 'hello', [])
    >>>
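
`parsefrags` takes its input as a sequence of string fragments, so a natural source is a generator that reads a file or socket in chunks. The sketch below shows that pattern, but so that it runs without amara3 it feeds the chunks to the standard library's `xml.etree.ElementTree.XMLPullParser`, a different incremental parser used here only as an analogue. `read_chunks` and `pull_events` are hypothetical helpers, and whether `parsefrags` accepts an arbitrary iterable rather than a list is an assumption.

```python
import io
import xml.etree.ElementTree as ET

def read_chunks(stream, size=8):
    """Yield successive string chunks of at most `size` characters from a text stream."""
    while True:
        chunk = stream.read(size)
        if not chunk:
            return
        yield chunk

def pull_events(chunks):
    """Feed chunks to an incremental parser, yielding (event, tag) pairs as they become available."""
    parser = ET.XMLPullParser(events=("start", "end"))
    for chunk in chunks:
        parser.feed(chunk)  # a tag may be split across chunks; the parser buffers it
        for event, elem in parser.read_events():
            yield event, elem.tag
    parser.close()
    for event, elem in parser.read_events():  # anything still queued at end of input
        yield event, elem.tag

# With amara3 installed, the same generator could (assumption) be passed as
# parsefrags(read_chunks(stream)).
source = io.StringIO("<hello><bold>world</bold></hello>")
events = list(pull_events(read_chunks(source, size=6)))
print(events)  # [('start', 'hello'), ('start', 'bold'), ('end', 'bold'), ('end', 'hello')]
```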

## Implementation notes

The MicroXML parser was switched from PLY to a hand-crafted parser because:

1) There was concern about the memory consumption of the PLY lexer it required
2) PLY lacks incremental (feed-based) parsing
3) James Clark's JS parser provided inspiration: https://github.com/jclark/microxml-js/blob/master/microxml.js

----

Author: [Uche Ogbuji](http://uche.ogbuji.net) <uche@ogbuji.net>
            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/uogbuji/amara3-xml",
    "name": "amara3.xml",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "xml,web,data",
    "author": "Uche Ogbuji",
    "author_email": "uche@ogbuji.net",
    "download_url": "https://files.pythonhosted.org/packages/94/86/ca7882c01f98ed4c629a2b1e3322892fbdd50f71ad0595e0c83c1ec8d291/amara3.xml-3.3.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "License :: OSI Approved :: Apache Software License",
    "summary": "Amara3 project, which offers a variety of data processing tools. This module adds the MicroXML support, and adaptation to classic XML.",
    "version": "3.3.0",
    "split_keywords": [
        "xml",
        "web",
        "data"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "30536f1578c86e3d94c42726111ef454",
                "sha256": "0464035e4ef743d906b35000a418fa46196b6567ce09726721b5c3c20ec5c5d2"
            },
            "downloads": -1,
            "filename": "amara3.xml-3.3.0.tar.gz",
            "has_sig": false,
            "md5_digest": "30536f1578c86e3d94c42726111ef454",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 49388,
            "upload_time": "2022-12-19T16:53:03",
            "upload_time_iso_8601": "2022-12-19T16:53:03.985887Z",
            "url": "https://files.pythonhosted.org/packages/94/86/ca7882c01f98ed4c629a2b1e3322892fbdd50f71ad0595e0c83c1ec8d291/amara3.xml-3.3.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2022-12-19 16:53:03",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "uogbuji",
    "github_project": "amara3-xml",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "pytest",
            "specs": []
        },
        {
            "name": "ply",
            "specs": []
        },
        {
            "name": "amara3-iri",
            "specs": []
        }
    ],
    "lcname": "amara3.xml"
}
        