mwxml


* **Name:** mwxml
* **Version:** 0.3.6
* **Home page:** https://github.com/mediawiki-utilities/python-mwxml
* **Summary:** A set of utilities for processing MediaWiki XML dump data.
* **Upload time:** 2025-02-13 22:28:48
* **Maintainer:** None
* **Docs URL:** https://pythonhosted.org/mwxml/
* **Author:** Aaron Halfaker
* **Requires Python:** None
* **License:** MIT
* **Requirements:** mwtypes, mwcli, para, jsonschema
* **Travis-CI:** No Travis.
* **Coveralls test coverage:** No coveralls.

# MediaWiki XML

This library contains a collection of utilities for efficiently
processing MediaWiki’s XML database dumps. It addresses two
concerns: the complexity and the performance of streaming XML parsing.
It enables memory-efficient stream processing of XML dumps with
a simple [`iterator`](https://pythonhosted.org/mwxml/iteration.html)
strategy. It also implements a distributed
processing strategy (see
[`map()`](https://pythonhosted.org/mwxml/map.html)) that enables parallel
processing of many XML dump files at the same time.

* **Installation:** ``pip install mwxml``
* **Documentation:** https://pythonhosted.org/mwxml
* **Repository:** https://github.com/mediawiki-utilities/python-mwxml
* **License:** MIT
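
To make the idea of processing many dump files at once more concrete, here is a minimal standard-library sketch. It is not how `mwxml`'s `map()` is implemented and it does not use the `mwxml` API. The names `DUMPS`, `process_dump`, and `map_dumps` are hypothetical, the "files" are tiny in-memory strings, and threads stand in for whatever worker model the library actually uses.

```python
from concurrent.futures import ThreadPoolExecutor
import xml.etree.ElementTree as ET

# Hypothetical in-memory "dump files", keyed by a made-up path.
DUMPS = {
    "a.xml": "<mediawiki><page><revision><id>1</id></revision></page></mediawiki>",
    "b.xml": "<mediawiki><page><revision><id>2</id></revision>"
             "<revision><id>3</id></revision></page></mediawiki>",
}


def process_dump(path):
    """Return (path, revision id) pairs for one dump."""
    root = ET.fromstring(DUMPS[path])
    return [(path, int(rev.findtext("id"))) for rev in root.iter("revision")]


def map_dumps(process, paths, workers=2):
    """Run `process` on each path concurrently and yield results as one stream."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for results in pool.map(process, paths):
            yield from results


all_revisions = sorted(map_dumps(process_dump, DUMPS))
```

The caller sees a single stream of results regardless of how many files were processed, which is the property the paragraph above describes.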

## Example

    >>> import mwxml
    >>>
    >>> dump = mwxml.Dump.from_file(open("dump.xml"))
    >>> print(dump.site_info.name, dump.site_info.dbname)
    Wikipedia enwiki
    >>>
    >>> for page in dump:
    ...     for revision in page:
    ...         print(revision.id)
    ...
    1
    2
    3
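
The loop above never needs the whole dump in memory. As a rough illustration of why streaming keeps memory bounded, the sketch below uses only the standard library's `xml.etree.ElementTree.iterparse`. It is an assumption-laden stand-in, not `mwxml`'s implementation: `SAMPLE` and `iter_revision_ids` are hypothetical, and the sample omits the XML namespace that real MediaWiki dumps carry on every tag.

```python
import io
import xml.etree.ElementTree as ET

# A tiny stand-in for a dump file; the element names mimic MediaWiki's
# export format, but the content is made up and has no XML namespace.
SAMPLE = """<mediawiki>
  <page>
    <title>Foo</title>
    <revision><id>1</id></revision>
    <revision><id>2</id></revision>
  </page>
  <page>
    <title>Bar</title>
    <revision><id>3</id></revision>
  </page>
</mediawiki>"""


def iter_revision_ids(file_obj):
    """Yield (page title, revision id) pairs while reading incrementally."""
    title = None
    for _event, elem in ET.iterparse(file_obj, events=("end",)):
        if elem.tag == "title":
            title = elem.text
        elif elem.tag == "revision":
            yield title, int(elem.findtext("id"))
        elif elem.tag == "page":
            # Drop the finished page's subtree so memory stays bounded.
            elem.clear()
            title = None


revision_ids = list(iter_revision_ids(io.BytesIO(SAMPLE.encode("utf-8"))))
```

Clearing each `<page>` element once it has been fully read is what prevents the parsed tree from growing with the size of the file.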

## Author
* Aaron Halfaker -- https://github.com/halfak

## See also 
* http://dumps.wikimedia.org/
* http://community.wikia.com/wiki/Help:Database_download



            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/mediawiki-utilities/python-mwxml",
    "name": "mwxml",
    "maintainer": null,
    "docs_url": "https://pythonhosted.org/mwxml/",
    "requires_python": null,
    "maintainer_email": null,
    "keywords": null,
    "author": "Aaron Halfaker",
    "author_email": "aaron.halfaker@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/a8/d3/78e0b7d2ac9a8e5e4af1157e30d3ae575edc0cdb618706ea4eb8649cc099/mwxml-0.3.6.tar.gz",
    "platform": null,
    "description": "# MediaWiki XML\n\nThis library contains a collection of utilities for efficiently \nprocessing MediaWiki\u2019s XML database dumps. There are two \nimportant concerns that this module intends to address: \ncomplexity and performance of streaming XML parsing.  This library\nenables memory efficent stream processing of XML dumps with \na simple [`iterator`](https://pythonhosted.org/mwxml/iteration.html) \nstrategy.  This library also implements a distributed\nprocessing strategy (see \n[`map()`](https://pythonhosted.org/mwxml/map.html)) that enables parallel\nprocessing of many XML dump files at the same time. \n\n* **Installation:** ``pip install mwxml``\n* **Documentation:** https://pythonhosted.org/mwxml\n* **Repositiory:** https://github.com/mediawiki-utilities/python-mwxml\n* **License:** MIT\n\n## Example\n\n    >>> import mwxml\n    >>>\n    >>> dump = mwxml.Dump.from_file(open(\"dump.xml\"))\n    >>> print(dump.site_info.name, dump.site_info.dbname)\n    Wikipedia enwiki\n    >>>\n    >>> for page in dump:\n    ...     for revision in page:\n    ...        print(revision.id)\n    ...\n    1\n    2\n    3\n\n## Author\n* Aaron Halfaker -- https://github.com/halfak\n\n## See also \n* http://dumps.wikimedia.org/\n* http://community.wikia.com/wiki/Help:Database_download\n\n\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A set of utilities for processing MediaWiki XML dump data.",
    "version": "0.3.6",
    "project_urls": {
        "Homepage": "https://github.com/mediawiki-utilities/python-mwxml"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "3be8cf48d7e707faf85e5bcaaa08d02d5ee9dd4b64c45a6e4c11c5d71f798d86",
                "md5": "66f7acd05b591b47c82f0228a0397d57",
                "sha256": "f5e0cde46c7d4b0d1d921f8f0aa14d691b6eaa6532b901dacc6c7407be26c70a"
            },
            "downloads": -1,
            "filename": "mwxml-0.3.6-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "66f7acd05b591b47c82f0228a0397d57",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": null,
            "size": 33198,
            "upload_time": "2025-02-13T22:28:47",
            "upload_time_iso_8601": "2025-02-13T22:28:47.101867Z",
            "url": "https://files.pythonhosted.org/packages/3b/e8/cf48d7e707faf85e5bcaaa08d02d5ee9dd4b64c45a6e4c11c5d71f798d86/mwxml-0.3.6-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "a8d378e0b7d2ac9a8e5e4af1157e30d3ae575edc0cdb618706ea4eb8649cc099",
                "md5": "8de2c5fccf366a4eaa990a01b54aa37e",
                "sha256": "5a53181d302152ad03ec513ff186a89e0c7e1fcc50c78330452a0872caa63935"
            },
            "downloads": -1,
            "filename": "mwxml-0.3.6.tar.gz",
            "has_sig": false,
            "md5_digest": "8de2c5fccf366a4eaa990a01b54aa37e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 18536,
            "upload_time": "2025-02-13T22:28:48",
            "upload_time_iso_8601": "2025-02-13T22:28:48.307630Z",
            "url": "https://files.pythonhosted.org/packages/a8/d3/78e0b7d2ac9a8e5e4af1157e30d3ae575edc0cdb618706ea4eb8649cc099/mwxml-0.3.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-13 22:28:48",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "mediawiki-utilities",
    "github_project": "python-mwxml",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "mwtypes",
            "specs": [
                [
                    ">=",
                    "0.4.0"
                ]
            ]
        },
        {
            "name": "mwcli",
            "specs": [
                [
                    ">=",
                    "0.0.2"
                ]
            ]
        },
        {
            "name": "para",
            "specs": [
                [
                    ">=",
                    "0.0.1"
                ]
            ]
        },
        {
            "name": "jsonschema",
            "specs": [
                [
                    ">=",
                    "2.5.1"
                ]
            ]
        }
    ],
    "tox": true,
    "lcname": "mwxml"
}
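
The `requirements` entries in the record above store each constraint as an operator and version pair. As a small sketch of how that structure could be turned into pip-style specifier strings, the snippet below copies the four entries into a literal and joins them. The helper name `to_specifier` is hypothetical and is not part of any packaging tool.

```python
import json

# The "requirements" list from the record above, copied verbatim.
REQUIREMENTS_JSON = """
[
  {"name": "mwtypes", "specs": [[">=", "0.4.0"]]},
  {"name": "mwcli", "specs": [[">=", "0.0.2"]]},
  {"name": "para", "specs": [[">=", "0.0.1"]]},
  {"name": "jsonschema", "specs": [[">=", "2.5.1"]]}
]
"""


def to_specifier(requirement):
    """Join a name with its (operator, version) pairs, e.g. 'pkg>=1.0'."""
    constraints = ",".join(op + version for op, version in requirement["specs"])
    return requirement["name"] + constraints


specifiers = [to_specifier(r) for r in json.loads(REQUIREMENTS_JSON)]
```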
        