# Wikipedia article quality classification
This library provides a set of utilities for automatically detecting the
assessment class of Wikipedia articles. For more information, see the full
documentation at https://articlequality.readthedocs.io.
**Compatible with Python 3.x only.** Sorry.
* **Install:** ``pip install articlequality``
* **Models:** https://github.com/wikimedia/articlequality/tree/master/models
* **Documentation:** https://articlequality.readthedocs.io
## Basic usage
```
>>> import articlequality
>>> from revscoring import Model
>>>
>>> with open("models/enwiki.nettrom_wp10.gradient_boosting.model", "rb") as f:
...     scorer_model = Model.load(f)
...
>>> text = "I am the text of a page. I have a <ref>word</ref>"
>>> articlequality.score(scorer_model, text)
{'prediction': 'stub',
 'probability': {'stub': 0.27156163795807853,
                 'b': 0.14707452309674252,
                 'fa': 0.16844898943510833,
                 'c': 0.057668704007171959,
                 'ga': 0.21617801281707663,
                 'start': 0.13906813268582238}}
```
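The dictionary returned by `articlequality.score` can be post-processed with plain Python. The sketch below is not part of the `articlequality` API: it ranks the classes by probability and collapses the distribution into a single weighted number. The ordinal weights (stub=0 up to fa=5) and the helper names `rank_classes` and `weighted_quality` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Assumed ordinal ordering of the English Wikipedia assessment classes,
# from lowest (stub) to highest (fa). This mapping is illustrative only.
CLASS_WEIGHTS = {"stub": 0, "start": 1, "c": 2, "b": 3, "ga": 4, "fa": 5}


def rank_classes(score: dict) -> List[Tuple[str, float]]:
    """Return (class, probability) pairs sorted from most to least likely."""
    probs = score["probability"]
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)


def weighted_quality(score: dict, weights: Dict[str, int] = CLASS_WEIGHTS) -> float:
    """Collapse the class distribution into one number: sum of P(class) * weight."""
    return sum(p * weights[cls] for cls, p in score["probability"].items())


# The output shown in the basic usage example above.
example = {
    "prediction": "stub",
    "probability": {
        "stub": 0.27156163795807853,
        "b": 0.14707452309674252,
        "fa": 0.16844898943510833,
        "c": 0.057668704007171959,
        "ga": 0.21617801281707663,
        "start": 0.13906813268582238,
    },
}

print(rank_classes(example)[0])  # the most likely class and its probability
print(round(weighted_quality(example), 2))  # 2.4
```

Treating the classes as evenly spaced steps is a simplification, but it gives a single value that can be compared across revisions, whereas `prediction` only reports the top class.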
## Install
### Requirements
* Python 3.5, 3.6 or 3.7
* All the system requirements of [revscoring](https://github.com/wikimedia/revscoring)
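The interpreter requirement can be checked before installing. The snippet below is a minimal sketch, not something the package ships; `is_supported` and `SUPPORTED_VERSIONS` are hypothetical names, and the version set is copied from the list above.

```python
import sys
from typing import Optional, Sequence

# The interpreter versions listed under "Requirements", as (major, minor) pairs.
SUPPORTED_VERSIONS = {(3, 5), (3, 6), (3, 7)}


def is_supported(version: Optional[Sequence[int]] = None) -> bool:
    """Return True if the (major, minor) part of version is a listed release.

    When version is None, the running interpreter's version is checked.
    """
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) in SUPPORTED_VERSIONS


if not is_supported():
    # str.format rather than an f-string so the check itself parses on 3.5.
    print("Python {}.{} is not one of the versions listed in the requirements."
          .format(sys.version_info[0], sys.version_info[1]))
```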
### Installation steps
1. Clone this repository.
2. Install the package and its dependencies with `python setup.py install`.
3. Verify the installation by running `make enwiki_models`, which builds the English Wikipedia article quality model, or `make wikidatawiki_models`, which builds the Wikidata item quality model.
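Building a model exercises the whole pipeline. For a quicker first check, a minimal sketch using the standard library's `importlib.util.find_spec` can confirm that Python can locate the installed packages. The helper name `missing_modules` is hypothetical, and finding a module does not guarantee that the `make` targets will succeed.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The package itself plus its scoring dependency from the Requirements list.
    missing = missing_modules(["articlequality", "revscoring"])
    print("missing:", missing or "none")
```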
### Retraining the models
To retrain a model, run `make -B MODEL`, e.g. `make -B wikidatawiki_models`. This re-downloads the labels, re-extracts the features from the revisions, and then retrains and rescores the model.
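The `-B` flag (`--always-make` in GNU make) makes every target count as out of date. Without it, `make` rebuilds a target only if it is missing or older than one of its prerequisites. The function below is a simplified model of that rule for a single target; it ignores phony targets, pattern rules, and recursion through the dependency graph, and it is not code from this repository.

```python
from typing import Optional, Sequence


def needs_rebuild(target_mtime: Optional[float],
                  prereq_mtimes: Sequence[float],
                  always_make: bool = False) -> bool:
    """Simplified model of make's rebuild decision for one target.

    target_mtime is None when the target file does not exist yet.
    always_make mirrors the -B / --always-make flag.
    """
    if always_make or target_mtime is None:
        return True
    # Rebuild when any prerequisite is newer than the target.
    return any(m > target_mtime for m in prereq_mtimes)
```

Under `-B`, the label download and feature extraction targets are therefore rerun along with the model; without it, only targets whose prerequisites are newer than they are get rebuilt.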
To skip re-downloading the training labels and re-extracting the features, `touch` the files in the `datasets/` directory and run the `make` command without the `-B` flag.
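To do the `touch` step from Python, for example inside a notebook, the sketch below updates the modification time of every file under `datasets/` with `pathlib.Path.touch`. The helper name `touch_all` is hypothetical, and the directory is assumed to be the repository's `datasets/` folder, relative to the repository root.

```python
import pathlib
from typing import List


def touch_all(directory: str = "datasets") -> List[pathlib.Path]:
    """Update the modification time of every regular file under directory.

    Equivalent in spirit to running `touch` on each file: existing files
    get the current time as their mtime, and their contents are unchanged.
    """
    touched = []
    for path in sorted(pathlib.Path(directory).rglob("*")):
        if path.is_file():
            path.touch(exist_ok=True)
            touched.append(path)
    return touched
```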
### Running tests
For example, to run the Wikidata feature list tests:
```
pytest -vv tests/feature_lists/test_wikidatawiki.py
```
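The same tests can be started from Python with `pytest.main`, which takes the command-line arguments as a list and returns pytest's exit code. The helper `feature_list_test_args` is hypothetical, and it assumes that other wikis' test modules follow the `tests/feature_lists/test_<wiki>.py` naming of the Wikidata example.

```python
from typing import List


def feature_list_test_args(wiki: str, verbose: bool = True) -> List[str]:
    """Build pytest arguments for one wiki's feature list test module.

    Assumes the tests/feature_lists/test_<wiki>.py naming pattern.
    """
    args = ["tests/feature_lists/test_{}.py".format(wiki)]
    if verbose:
        args.insert(0, "-vv")
    return args
```

From the repository root, `pytest.main(feature_list_test_args("wikidatawiki"))` then corresponds to the shell command shown above.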
## Authors
* Aaron Halfaker -- https://github.com/halfak
* Morten Warncke-Wang -- https://github.com/nettrom
Raw data
{
"_id": null,
"home_page": "https://github.com/wikimedia/articlequality",
"name": "articlequality",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "",
"author": "Aaron Halfaker / Morten Warncke-Wang",
"author_email": "ahalfaker@wikimedia.org",
"download_url": "https://files.pythonhosted.org/packages/bb/34/f0817607bff0e4b1f6da7c328cd06db69f6adbc5aec06be0b53ed06f0ca3/articlequality-0.4.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A library for performing automatic detection of assessment classes of Wikipedia articles.",
"version": "0.4.4",
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"md5": "1a6328818e9f111602d8c26f176f7d40",
"sha256": "eae688b3bf7d1c0b2a7b72e7c1bb92f18e40b604efb3ff138c091f68ed4e3b2d"
},
"downloads": -1,
"filename": "articlequality-0.4.4-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "1a6328818e9f111602d8c26f176f7d40",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": null,
"size": 56132,
"upload_time": "2022-12-21T15:00:35",
"upload_time_iso_8601": "2022-12-21T15:00:35.829010Z",
"url": "https://files.pythonhosted.org/packages/c4/71/a732ea3f6296f8906956eaed94aeff6485890a49070528cc2f3088860946/articlequality-0.4.4-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"md5": "e2d569caca034ea693310672b4f40ee4",
"sha256": "c2a5b504890e5e41db17e44cdc5b473da73dbaa094b004013af9b4d771717262"
},
"downloads": -1,
"filename": "articlequality-0.4.4.tar.gz",
"has_sig": false,
"md5_digest": "e2d569caca034ea693310672b4f40ee4",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 36976,
"upload_time": "2022-12-21T15:00:37",
"upload_time_iso_8601": "2022-12-21T15:00:37.740449Z",
"url": "https://files.pythonhosted.org/packages/bb/34/f0817607bff0e4b1f6da7c328cd06db69f6adbc5aec06be0b53ed06f0ca3/articlequality-0.4.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2022-12-21 15:00:37",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "wikimedia",
"github_project": "articlequality",
"travis_ci": true,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "mwapi",
"specs": []
},
{
"name": "mwbase",
"specs": [
[
"<",
"0.1.999"
],
[
">=",
"0.1.0"
]
]
},
{
"name": "mwreverts",
"specs": []
},
{
"name": "mwtypes",
"specs": []
},
{
"name": "mwxml",
"specs": [
[
">=",
"0.3.3"
]
]
},
{
"name": "revscoring",
"specs": [
[
"<",
"2.11.999"
],
[
">=",
"2.11.0"
]
]
}
],
"tox": true,
"lcname": "articlequality"
}
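The `requirements` entries encode version ranges as `[operator, bound]` pairs. For `revscoring`, the pins `>=2.11.0` and `<2.11.999` together effectively restrict it to the 2.11 series. The checker below is a deliberately simplified sketch for such pairs: it only handles purely numeric dotted versions and does not implement PEP 440 (pre-releases, zero padding), so `packaging.specifiers.SpecifierSet` is the better choice for real checks. The names `satisfies` and `revscoring_specs` are hypothetical.

```python
import operator
from typing import List, Tuple

# Comparison operators that can appear in the "specs" lists above.
_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}


def _as_tuple(version: str) -> Tuple[int, ...]:
    """Turn a numeric dotted version such as '2.11.0' into (2, 11, 0)."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version: str, specs: List[List[str]]) -> bool:
    """Return True if version meets every [operator, bound] pair in specs."""
    v = _as_tuple(version)
    return all(_OPS[op](v, _as_tuple(bound)) for op, bound in specs)


revscoring_specs = [["<", "2.11.999"], [">=", "2.11.0"]]
print(satisfies("2.11.3", revscoring_specs))  # True
print(satisfies("2.10.0", revscoring_specs))  # False
```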