# MediaWiki XML
This library contains a collection of utilities for efficiently
processing MediaWiki’s XML database dumps. It addresses two
concerns: the complexity and the performance of streaming XML parsing.
The library enables memory-efficient stream processing of XML dumps with
a simple [`iterator`](https://pythonhosted.org/mwxml/iteration.html)
strategy. It also implements a distributed
processing strategy (see
[`map()`](https://pythonhosted.org/mwxml/map.html)) that enables parallel
processing of many XML dump files at the same time.

* **Installation:** ``pip install mwxml``
* **Documentation:** https://pythonhosted.org/mwxml
* **Repository:** https://github.com/mediawiki-utilities/python-mwxml
* **License:** MIT
## Example

    >>> import mwxml
    >>>
    >>> dump = mwxml.Dump.from_file(open("dump.xml"))
    >>> print(dump.site_info.name, dump.site_info.dbname)
    Wikipedia enwiki
    >>>
    >>> for page in dump:
    ...     for revision in page:
    ...         print(revision.id)
    ...
    1
    2
    3

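
The memory-efficient streaming that the iterator strategy provides can be illustrated with the standard library alone. The following is a minimal sketch, not mwxml's actual implementation: it uses `xml.etree.ElementTree.iterparse` on a tiny made-up dump, and `SAMPLE_DUMP`, `local_name`, and `iter_revision_ids` are hypothetical names introduced only for this illustration.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for a MediaWiki XML dump (assumption: real dumps are far
# larger and also declare an XML namespace on every element).
SAMPLE_DUMP = b"""<mediawiki>
  <page>
    <title>Foo</title>
    <revision><id>1</id></revision>
    <revision><id>2</id></revision>
  </page>
  <page>
    <title>Bar</title>
    <revision><id>3</id></revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Drop a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def iter_revision_ids(xml_file):
    """Yield (page_title, revision_id) pairs while reading the file incrementally."""
    title = None
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            # Only direct children are inspected, so a nested contributor id
            # would not be mistaken for the revision id.
            rev_id = next(int(c.text) for c in elem if local_name(c.tag) == "id")
            yield title, rev_id
            elem.clear()  # drop the revision's children and text once consumed
        elif name == "page":
            elem.clear()  # drop the finished page's children and text


for page_title, rev_id in iter_revision_ids(io.BytesIO(SAMPLE_DUMP)):
    print(page_title, rev_id)
```

Because the generator yields each revision as soon as its closing tag is read and then clears it, memory use stays roughly proportional to one page rather than to the whole file. Emptied page elements still accumulate on the root in this sketch; a production parser would prune those as well.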
## Author
* Aaron Halfaker -- https://github.com/halfak
## See also
* http://dumps.wikimedia.org/
* http://community.wikia.com/wiki/Help:Database_download
Raw data
{
"_id": null,
"home_page": "https://github.com/mediawiki-utilities/python-mwxml",
"name": "mwxml",
"maintainer": null,
"docs_url": "https://pythonhosted.org/mwxml/",
"requires_python": null,
"maintainer_email": null,
"keywords": null,
"author": "Aaron Halfaker",
"author_email": "aaron.halfaker@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/f4/45/06b0018fcb876174e0ef996d936c114a5375e23c7121c6eb84ddfc3c5543/mwxml-0.3.4.tar.gz",
"platform": null,
"description": "# MediaWiki XML\n\nThis library contains a collection of utilities for efficiently \nprocessing MediaWiki\u2019s XML database dumps. There are two \nimportant concerns that this module intends to address: \ncomplexity and performance of streaming XML parsing. This library\nenables memory efficent stream processing of XML dumps with \na simple [`iterator`](https://pythonhosted.org/mwxml/iteration.html) \nstrategy. This library also implements a distributed\nprocessing strategy (see \n[`map()`](https://pythonhosted.org/mwxml/map.html)) that enables parallel\nprocessing of many XML dump files at the same time. \n\n* **Installation:** ``pip install mwxml``\n* **Documentation:** https://pythonhosted.org/mwxml\n* **Repositiory:** https://github.com/mediawiki-utilities/python-mwxml\n* **License:** MIT\n\n## Example\n\n >>> import mwxml\n >>>\n >>> dump = mwxml.Dump.from_file(open(\"dump.xml\"))\n >>> print(dump.site_info.name, dump.site_info.dbname)\n Wikipedia enwiki\n >>>\n >>> for page in dump:\n ... for revision in page:\n ... print(revision.id)\n ...\n 1\n 2\n 3\n \n## Author\n* Aaron Halfaker -- https://github.com/halfak\n\n## See also \n* http://dumps.wikimedia.org/\n* http://community.wikia.com/wiki/Help:Database_download\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "A set of utilities for processing MediaWiki XML dump data.",
"version": "0.3.4",
"project_urls": {
"Homepage": "https://github.com/mediawiki-utilities/python-mwxml"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "2a3bdab72fc52e0b89034b2e1bb191024e5415fa9afab93afa1e997295d44c4b",
"md5": "0dec28f32120d2772976cb465a3ebc62",
"sha256": "f109225a47f629a1ddf73826c462efb9dc6fa6df7c546c6ac452424cc6034d52"
},
"downloads": -1,
"filename": "mwxml-0.3.4-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "0dec28f32120d2772976cb465a3ebc62",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": null,
"size": 27456,
"upload_time": "2024-07-09T17:32:38",
"upload_time_iso_8601": "2024-07-09T17:32:38.858481Z",
"url": "https://files.pythonhosted.org/packages/2a/3b/dab72fc52e0b89034b2e1bb191024e5415fa9afab93afa1e997295d44c4b/mwxml-0.3.4-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f44506b0018fcb876174e0ef996d936c114a5375e23c7121c6eb84ddfc3c5543",
"md5": "93b2430b466dca644003f79612a3d5c3",
"sha256": "7a37f745f770704a7419efbde9d391b874b9071dbc192b3b1f81c3d4b52775ee"
},
"downloads": -1,
"filename": "mwxml-0.3.4.tar.gz",
"has_sig": false,
"md5_digest": "93b2430b466dca644003f79612a3d5c3",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 18246,
"upload_time": "2024-07-09T17:32:40",
"upload_time_iso_8601": "2024-07-09T17:32:40.547882Z",
"url": "https://files.pythonhosted.org/packages/f4/45/06b0018fcb876174e0ef996d936c114a5375e23c7121c6eb84ddfc3c5543/mwxml-0.3.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-07-09 17:32:40",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "mediawiki-utilities",
"github_project": "python-mwxml",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"tox": true,
"lcname": "mwxml"
}