# MediaWiki XML

This library contains a collection of utilities for efficiently
processing MediaWiki’s XML database dumps. It aims to address two
concerns: the complexity and the performance of streaming XML parsing.
It enables memory-efficient stream processing of XML dumps with
a simple [`iterator`](https://pythonhosted.org/mwxml/iteration.html)
strategy. It also implements a distributed
processing strategy (see
[`map()`](https://pythonhosted.org/mwxml/map.html)) that enables parallel
processing of many XML dump files at the same time.

* **Installation:** ``pip install mwxml``
* **Documentation:** https://pythonhosted.org/mwxml
* **Repository:** https://github.com/mediawiki-utilities/python-mwxml
* **License:** MIT
## Example

    >>> import mwxml
    >>>
    >>> dump = mwxml.Dump.from_file(open("dump.xml"))
    >>> print(dump.site_info.name, dump.site_info.dbname)
    Wikipedia enwiki
    >>>
    >>> for page in dump:
    ...     for revision in page:
    ...         print(revision.id)
    ...
    1
    2
    3

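
To make the two ideas from the introduction concrete (streaming a dump without holding it in memory, and handling several dumps at once), here is a rough standard-library sketch. It is not how mwxml is implemented and it does not use the mwxml API: `local_name`, `iter_revision_ids`, `map_dumps`, and the `SAMPLE` data are hypothetical names invented for illustration, and a thread pool is used only to keep the example short.

```python
import io
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Hypothetical miniature dump, invented for this sketch.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Foo</title>
    <id>10</id>
    <revision><id>1</id><contributor><id>99</id></contributor></revision>
    <revision><id>2</id></revision>
  </page>
  <page>
    <title>Bar</title>
    <id>11</id>
    <revision><id>3</id></revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Strip an XML namespace: '{http://...}page' becomes 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_revision_ids(source):
    """Yield (page_title, revision_id) pairs, handling one <page> at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title = None  # assumes <title> precedes the revisions, as in the sample
        for child in elem:
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "revision":
                # Only direct children are checked, so <contributor><id> is skipped.
                for field in child:
                    if local_name(field.tag) == "id":
                        yield title, int(field.text)
                        break
        # Release the finished page's subtree; only an empty <page> stub
        # stays attached to the root element.
        elem.clear()


def map_dumps(process, sources, threads=2):
    """Apply process(source) to every source on a worker pool, in input order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for results in pool.map(lambda s: list(process(s)), sources):
            yield from results


if __name__ == "__main__":
    for title, rev_id in map_dumps(iter_revision_ids,
                                   [io.BytesIO(SAMPLE), io.BytesIO(SAMPLE)]):
        print(title, rev_id)
```

For CPU-bound parsing of real dumps, a process pool would be the more usual choice than threads; see the library's `map()` documentation for the interface it actually provides.
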
## Author
* Aaron Halfaker -- https://github.com/halfak
## See also
* http://dumps.wikimedia.org/
* http://community.wikia.com/wiki/Help:Database_download
Raw data
{
"_id": null,
"home_page": "https://github.com/mediawiki-utilities/python-mwxml",
"name": "mwxml",
"maintainer": null,
"docs_url": "https://pythonhosted.org/mwxml/",
"requires_python": null,
"maintainer_email": null,
"keywords": null,
"author": "Aaron Halfaker",
"author_email": "aaron.halfaker@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/a8/d3/78e0b7d2ac9a8e5e4af1157e30d3ae575edc0cdb618706ea4eb8649cc099/mwxml-0.3.6.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A set of utilities for processing MediaWiki XML dump data.",
"version": "0.3.6",
"project_urls": {
"Homepage": "https://github.com/mediawiki-utilities/python-mwxml"
},
"split_keywords": [],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "3be8cf48d7e707faf85e5bcaaa08d02d5ee9dd4b64c45a6e4c11c5d71f798d86",
"md5": "66f7acd05b591b47c82f0228a0397d57",
"sha256": "f5e0cde46c7d4b0d1d921f8f0aa14d691b6eaa6532b901dacc6c7407be26c70a"
},
"downloads": -1,
"filename": "mwxml-0.3.6-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "66f7acd05b591b47c82f0228a0397d57",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": null,
"size": 33198,
"upload_time": "2025-02-13T22:28:47",
"upload_time_iso_8601": "2025-02-13T22:28:47.101867Z",
"url": "https://files.pythonhosted.org/packages/3b/e8/cf48d7e707faf85e5bcaaa08d02d5ee9dd4b64c45a6e4c11c5d71f798d86/mwxml-0.3.6-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "a8d378e0b7d2ac9a8e5e4af1157e30d3ae575edc0cdb618706ea4eb8649cc099",
"md5": "8de2c5fccf366a4eaa990a01b54aa37e",
"sha256": "5a53181d302152ad03ec513ff186a89e0c7e1fcc50c78330452a0872caa63935"
},
"downloads": -1,
"filename": "mwxml-0.3.6.tar.gz",
"has_sig": false,
"md5_digest": "8de2c5fccf366a4eaa990a01b54aa37e",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 18536,
"upload_time": "2025-02-13T22:28:48",
"upload_time_iso_8601": "2025-02-13T22:28:48.307630Z",
"url": "https://files.pythonhosted.org/packages/a8/d3/78e0b7d2ac9a8e5e4af1157e30d3ae575edc0cdb618706ea4eb8649cc099/mwxml-0.3.6.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-02-13 22:28:48",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "mediawiki-utilities",
"github_project": "python-mwxml",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "mwtypes",
"specs": [
[
">=",
"0.4.0"
]
]
},
{
"name": "mwcli",
"specs": [
[
">=",
"0.0.2"
]
]
},
{
"name": "para",
"specs": [
[
">=",
"0.0.1"
]
]
},
{
"name": "jsonschema",
"specs": [
[
">=",
"2.5.1"
]
]
}
],
"tox": true,
"lcname": "mwxml"
}