mwxml


* **Name:** mwxml
* **Version:** 0.3.4
* **Home page:** https://github.com/mediawiki-utilities/python-mwxml
* **Summary:** A set of utilities for processing MediaWiki XML dump data.
* **Upload time:** 2024-07-09 17:32:40
* **Docs URL:** https://pythonhosted.org/mwxml/
* **Author:** Aaron Halfaker
* **License:** MIT
* **Requirements:** No requirements were recorded.

# MediaWiki XML

This library contains a collection of utilities for efficiently
processing MediaWiki’s XML database dumps. It addresses two
concerns: the complexity and the performance of streaming XML parsing.
It enables memory-efficient stream processing of XML dumps with
a simple [`iterator`](https://pythonhosted.org/mwxml/iteration.html)
strategy. It also implements a distributed
processing strategy (see
[`map()`](https://pythonhosted.org/mwxml/map.html)) that enables parallel
processing of many XML dump files at the same time.
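
To illustrate what memory-efficient stream processing means in practice, here is a minimal sketch that uses only the standard library's `xml.etree.ElementTree.iterparse`. It is not how mwxml is implemented; the sample XML is a simplified stand-in for a real dump (no namespace, few fields), and `SAMPLE` and `iter_revision_ids` are hypothetical names chosen for this example.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for a MediaWiki dump (assumption: real dumps carry an XML
# namespace and many more fields; both are omitted here for brevity).
SAMPLE = b"""<mediawiki>
  <page>
    <title>Foo</title>
    <revision><id>1</id></revision>
    <revision><id>2</id></revision>
  </page>
  <page>
    <title>Bar</title>
    <revision><id>3</id></revision>
  </page>
</mediawiki>"""


def iter_revision_ids(xml_file):
    """Yield (page_title, revision_id) pairs while streaming through the file."""
    title = None
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        if elem.tag == "title":
            title = elem.text
        elif elem.tag == "revision":
            yield title, int(elem.findtext("id"))
            elem.clear()  # drop the revision's children once it has been used
        elif elem.tag == "page":
            elem.clear()  # release the finished page before reading the next one


pairs = list(iter_revision_ids(io.BytesIO(SAMPLE)))
print(pairs)
```

Because each element is cleared as soon as it has been consumed, the parser does not need to hold the whole document in memory, which is the property that matters for multi-gigabyte dumps.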

* **Installation:** ``pip install mwxml``
* **Documentation:** https://pythonhosted.org/mwxml
* **Repository:** https://github.com/mediawiki-utilities/python-mwxml
* **License:** MIT

## Example

    >>> import mwxml
    >>>
    >>> dump = mwxml.Dump.from_file(open("dump.xml"))
    >>> print(dump.site_info.name, dump.site_info.dbname)
    Wikipedia enwiki
    >>>
    >>> for page in dump:
    ...     for revision in page:
    ...         print(revision.id)
    ...
    1
    2
    3
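
The same per-dump loop can be applied to several files at once, which is the idea behind `map()`. The sketch below shows that pattern with the standard library only, so it runs without mwxml installed. It swaps in a thread pool from `concurrent.futures` and is not mwxml's own implementation; `DUMPS`, `process_dump`, and `map_dumps` are hypothetical names, and the in-memory byte strings stand in for paths to real dump files.

```python
import io
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "files" standing in for paths to real dump files.
DUMPS = {
    "dump1.xml": (
        b"<mediawiki><page><revision><id>1</id></revision>"
        b"<revision><id>2</id></revision></page></mediawiki>"
    ),
    "dump2.xml": (
        b"<mediawiki><page><revision><id>3</id></revision></page></mediawiki>"
    ),
}


def process_dump(name):
    """Return (name, revision_id) pairs for one dump, read as a stream."""
    results = []
    for _event, elem in ET.iterparse(io.BytesIO(DUMPS[name]), events=("end",)):
        if elem.tag == "revision":
            results.append((name, int(elem.findtext("id"))))
            elem.clear()  # free the revision once its id has been recorded
    return results


def map_dumps(process, names, workers=2):
    """Run `process` over every dump concurrently and flatten the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for results in pool.map(process, names):
            yield from results


rows = sorted(map_dumps(process_dump, DUMPS))
print(rows)
```

A thread pool keeps the sketch short and portable; for CPU-bound parsing of large dumps, a process pool would be the more natural choice, since threads in CPython do not run Python code in parallel.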
    
## Author
* Aaron Halfaker -- https://github.com/halfak

## See also 
* http://dumps.wikimedia.org/
* http://community.wikia.com/wiki/Help:Database_download

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/mediawiki-utilities/python-mwxml",
    "name": "mwxml",
    "maintainer": null,
    "docs_url": "https://pythonhosted.org/mwxml/",
    "requires_python": null,
    "maintainer_email": null,
    "keywords": null,
    "author": "Aaron Halfaker",
    "author_email": "aaron.halfaker@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/f4/45/06b0018fcb876174e0ef996d936c114a5375e23c7121c6eb84ddfc3c5543/mwxml-0.3.4.tar.gz",
    "platform": null,
    "description": "# MediaWiki XML\n\nThis library contains a collection of utilities for efficiently \nprocessing MediaWiki\u2019s XML database dumps. There are two \nimportant concerns that this module intends to address: \ncomplexity and performance of streaming XML parsing.  This library\nenables memory efficent stream processing of XML dumps with \na simple [`iterator`](https://pythonhosted.org/mwxml/iteration.html) \nstrategy.  This library also implements a distributed\nprocessing strategy (see \n[`map()`](https://pythonhosted.org/mwxml/map.html)) that enables parallel\nprocessing of many XML dump files at the same time. \n\n* **Installation:** ``pip install mwxml``\n* **Documentation:** https://pythonhosted.org/mwxml\n* **Repositiory:** https://github.com/mediawiki-utilities/python-mwxml\n* **License:** MIT\n\n## Example\n\n    >>> import mwxml\n    >>>\n    >>> dump = mwxml.Dump.from_file(open(\"dump.xml\"))\n    >>> print(dump.site_info.name, dump.site_info.dbname)\n    Wikipedia enwiki\n    >>>\n    >>> for page in dump:\n    ...     for revision in page:\n    ...        print(revision.id)\n    ...\n    1\n    2\n    3\n    \n## Author\n* Aaron Halfaker -- https://github.com/halfak\n\n## See also \n* http://dumps.wikimedia.org/\n* http://community.wikia.com/wiki/Help:Database_download\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A set of utilities for processing MediaWiki XML dump data.",
    "version": "0.3.4",
    "project_urls": {
        "Homepage": "https://github.com/mediawiki-utilities/python-mwxml"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2a3bdab72fc52e0b89034b2e1bb191024e5415fa9afab93afa1e997295d44c4b",
                "md5": "0dec28f32120d2772976cb465a3ebc62",
                "sha256": "f109225a47f629a1ddf73826c462efb9dc6fa6df7c546c6ac452424cc6034d52"
            },
            "downloads": -1,
            "filename": "mwxml-0.3.4-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "0dec28f32120d2772976cb465a3ebc62",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": null,
            "size": 27456,
            "upload_time": "2024-07-09T17:32:38",
            "upload_time_iso_8601": "2024-07-09T17:32:38.858481Z",
            "url": "https://files.pythonhosted.org/packages/2a/3b/dab72fc52e0b89034b2e1bb191024e5415fa9afab93afa1e997295d44c4b/mwxml-0.3.4-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f44506b0018fcb876174e0ef996d936c114a5375e23c7121c6eb84ddfc3c5543",
                "md5": "93b2430b466dca644003f79612a3d5c3",
                "sha256": "7a37f745f770704a7419efbde9d391b874b9071dbc192b3b1f81c3d4b52775ee"
            },
            "downloads": -1,
            "filename": "mwxml-0.3.4.tar.gz",
            "has_sig": false,
            "md5_digest": "93b2430b466dca644003f79612a3d5c3",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 18246,
            "upload_time": "2024-07-09T17:32:40",
            "upload_time_iso_8601": "2024-07-09T17:32:40.547882Z",
            "url": "https://files.pythonhosted.org/packages/f4/45/06b0018fcb876174e0ef996d936c114a5375e23c7121c6eb84ddfc3c5543/mwxml-0.3.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-07-09 17:32:40",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "mediawiki-utilities",
    "github_project": "python-mwxml",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "tox": true,
    "lcname": "mwxml"
}
        