[![Build Status](https://travis-ci.org/wikimedia/revscoring.svg?branch=master)](https://travis-ci.org/wikimedia/revscoring)
[![Test coverage](https://codecov.io/gh/wikimedia/revscoring/branch/master/graph/badge.svg)](https://codecov.io/gh/wikimedia/revscoring)
[![GitHub license](https://img.shields.io/github/license/wikimedia/revscoring.svg)](./LICENSE)
[![PyPI version](https://badge.fury.io/py/revscoring.svg)](https://badge.fury.io/py/revscoring)
# Revision Scoring
A generic, machine learning-based revision scoring system designed to help automate critical wiki-work — for example, vandalism detection and removal. This library powers [ORES](https://ores.wikimedia.org).
## Example
Using a `scorer_model` to score a revision:
```python
import mwapi
from revscoring import Model
from revscoring.extractors.api.extractor import Extractor
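
# Load a trained model from disk.
with open("models/enwiki.damaging.linear_svc.model") as f:
    scorer_model = Model.load(f)

# Build an extractor that fetches revision data through the MediaWiki API.
extractor = Extractor(mwapi.Session(host="https://en.wikipedia.org",
                                    user_agent="revscoring demo"))

# Extract the feature values the model needs for revision 123456789.
feature_values = list(extractor.extract(123456789, scorer_model.features))

print(scorer_model.score(feature_values))
# Example output:
# {'prediction': True, 'probability': {False: 0.4694409344514984, True: 0.5305590655485017}}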
```
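
The score is a plain dictionary, so downstream code can act on it directly. The sketch below is an illustration rather than part of the library: it assumes a score shaped like the example output above, and `flag_for_review` and its `threshold` parameter are hypothetical names.

```python
# Hypothetical helper, not part of revscoring: decide whether a revision
# should be queued for human review based on a score dictionary.
def flag_for_review(score, threshold=0.5):
    """Return True when the probability of the True class meets the threshold."""
    return score["probability"][True] >= threshold


# Score copied from the example output above.
score = {
    "prediction": True,
    "probability": {False: 0.4694409344514984, True: 0.5305590655485017},
}

print(flag_for_review(score))                 # True: 0.53 >= 0.5
print(flag_for_review(score, threshold=0.8))  # False: 0.53 < 0.8
```

Thresholding the probability rather than reading `prediction` directly lets you trade precision against recall for your own workflow.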
# Installation
The easiest way to install is via the Python package installer
(pip).
``pip install revscoring``
You may find that some of the dependencies fail to compile (namely
`scipy`, `numpy` and `sklearn`). In that case, you'll need to install some
dependencies in your operating system.
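
Before installing system packages, it may help to see which of these dependencies are actually absent from your environment. The following is a minimal sketch using only the standard library; `find_missing` is a hypothetical helper name. It only checks whether each module can be located and does not prove that a compiled extension loads correctly.

```python
import importlib.util

# Import names of the dependencies mentioned above; "sklearn" is the import
# name of the scikit-learn distribution.
DEPENDENCIES = ("numpy", "scipy", "sklearn")


def find_missing(modules=DEPENDENCIES):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing()
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("numpy, scipy and sklearn can all be located")
```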
### Ubuntu & Debian:
* Run ``sudo apt-get install python3-dev g++ gfortran liblapack-dev libopenblas-dev enchant``
* Run ``sudo apt-get install aspell-ar aspell-bn aspell-el aspell-id aspell-is aspell-pl aspell-ro aspell-sv aspell-ta aspell-uk myspell-cs myspell-de-at myspell-de-ch myspell-de-de myspell-es myspell-et myspell-fa myspell-fr myspell-he myspell-hr myspell-hu myspell-lv myspell-nb myspell-nl myspell-pt-pt myspell-pt-br myspell-ru hunspell-bs hunspell-ca hunspell-en-au hunspell-en-us hunspell-en-gb hunspell-eu hunspell-gl hunspell-it hunspell-hi hunspell-sr hunspell-vi voikko-fi``
<!-- ### Windows:
<i>TODO</i>
-->
### MacOS:
Using Homebrew and pip, you can install `revscoring` and `enchant` as follows:
```bash
brew install aspell --with-all-languages
brew install enchant
pip install --no-binary pyenchant revscoring
```
#### Adding languages in aspell (MacOS only)
```bash
cd /tmp
wget http://ftp.gnu.org/gnu/aspell/dict/pt/aspell-pt-0.50-2.tar.bz2
bzip2 -dc aspell-pt-0.50-2.tar.bz2 | tar xvf -
cd aspell-pt-0.50-2
./configure
make
sudo make install
```
Caveats: <br>
<b><u>The differences between the `aspell` and `myspell` dictionaries can cause some of the tests to fail.</u></b>
Finally, in order to make use of language features, you'll need to download
some NLTK data. The following command will get the necessary corpora.
``python -m nltk.downloader omw sentiwordnet stopwords wordnet``
You'll also need to install [enchant](https://en.wikipedia.org/wiki/Enchant_(software))-compatible
dictionaries of the languages you'd like to use. We recommend the following:
* languages.arabic: aspell-ar
* languages.basque: hunspell-eu
* languages.bengali: aspell-bn
* languages.bosnian: hunspell-bs
* languages.catalan: myspell-ca
* languages.czech: myspell-cs
* languages.croatian: myspell-hr
* languages.dutch: myspell-nl
* languages.english: myspell-en-us myspell-en-gb myspell-en-au
* languages.estonian: myspell-et
* languages.finnish: voikko-fi
* languages.french: myspell-fr
* languages.galician: hunspell-gl
* languages.german: myspell-de-at myspell-de-ch myspell-de-de
* languages.greek: aspell-el
* languages.hebrew: myspell-he
* languages.hindi: aspell-hi
* languages.hungarian: myspell-hu
* languages.icelandic: aspell-is
* languages.indonesian: aspell-id
* languages.italian: myspell-it
* languages.latvian: myspell-lv
* languages.norwegian: myspell-nb
* languages.persian: myspell-fa
* languages.polish: aspell-pl
* languages.portuguese: myspell-pt-pt myspell-pt-br
* languages.serbian: hunspell-sr
* languages.spanish: myspell-es
* languages.swedish: aspell-sv
* languages.tamil: aspell-ta
* languages.russian: myspell-ru
* languages.ukrainian: aspell-uk
* languages.vietnamese: hunspell-vi
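
Once the dictionaries are installed, you can check from Python whether enchant can see them. This is a rough sketch, not part of revscoring: `check_dictionaries` is a hypothetical helper, and the language tags in `WANTED` are assumptions that you should adjust to the languages you use. It relies on `enchant.dict_exists` from the `pyenchant` package.

```python
# Assumed mapping from a few language names to enchant language tags;
# extend it for the languages you use.
WANTED = {"english": "en_US", "german": "de_DE", "portuguese": "pt_BR"}


def check_dictionaries(wanted=WANTED):
    """Map each language name to True/False, or return None without pyenchant."""
    try:
        import enchant  # provided by the pyenchant package
    except ImportError:
        return None
    return {name: enchant.dict_exists(tag) for name, tag in wanted.items()}


result = check_dictionaries()
if result is None:
    print("pyenchant (or the enchant C library) is not installed")
else:
    for name, available in sorted(result.items()):
        print(f"{name}: {'ok' if available else 'missing'}")
```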
# Development
To contribute, first install the dependencies:
```bash
$ pip install -r requirements.txt
```
Install necessary NLTK data:
``python -m nltk.downloader omw sentiwordnet stopwords wordnet``
## Running tests
Make sure you install test dependencies:
```bash
$ pip install -r test-requirements.txt
```
Then run:
```bash
$ pytest . -vv
```
# Reporting bugs
To report a bug, please use [Phabricator](https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=revscoring).
# Authors
* [Aaron Halfaker](http://halfaker.info)
* [Helder](https://github.com/he7d3r)
* [Adam Roses Wight](https://mediawiki.org/wiki/User:Adamw)
* [Amir Sarabadani](https://github.com/Ladsgroup)