WikiChangeWatcher 1.0.0
=======================

.. |tests_badge| image:: https://github.com/eriknyquist/wikichangewatcher/actions/workflows/tests.yml/badge.svg
.. |cov_badge| image:: https://github.com/eriknyquist/wikichangewatcher/actions/workflows/coverage.yml/badge.svg
.. |version_badge| image:: https://badgen.net/pypi/v/wikichangewatcher
.. |license_badge| image:: https://badgen.net/pypi/license/wikichangewatcher

.. image:: https://raw.githubusercontent.com/eriknyquist/wikichangewatcher/5f8e204db0af39d0a0ed00e5884a38544e11321a/images/wikiwatcher_github_banner.png

.. contents:: Table of Contents

|tests_badge| |cov_badge| |version_badge| |license_badge|

Introduction
============

Wikipedia provides an `SSE Stream <https://en.wikipedia.org/wiki/Server-sent_events>`_ of
all edits made to any page across Wikipedia, which allows you to watch all edits made to all Wikipedia
pages in real time.

``WikiChangeWatcher`` is an SSE client that watches the SSE stream of Wikipedia page edits,
with some filtering features that allow you to watch for page edit events with specific attributes
(e.g. `"anonymous" <https://en.wikipedia.org/wiki/Wikipedia:IP_edits_are_not_anonymous>`_
edits with IP addresses in specific ranges, or edits made to a specific page, or edits made by a Wikipedia
user whose username matches a specific regular expression).

This package is inspired by `Tom Scott's WikiParliament project <https://www.tomscott.com/wikiparliament/>`_.

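
To give a rough idea of what an SSE client has to do, the following sketch parses
text in the SSE wire format into JSON events, using only the Python standard library.
The ``parse_sse_events`` helper and the sample payload are made up for this illustration;
this is not how ``WikiChangeWatcher`` is implemented internally.

.. code:: python

    import json

    def parse_sse_events(raw_text):
        """Yield the decoded JSON payload of each SSE event found in raw_text."""
        data_lines = []
        for line in raw_text.splitlines():
            if line.startswith("data:"):
                # One event may spread its payload over several "data:" lines
                value = line[len("data:"):]
                if value.startswith(" "):
                    value = value[1:]
                data_lines.append(value)
            elif line == "" and data_lines:
                # A blank line terminates the current event
                yield json.loads("\n".join(data_lines))
                data_lines = []

        # Flush a final event that was not followed by a blank line
        if data_lines:
            yield json.loads("\n".join(data_lines))

    # Hand-written sample in SSE wire format (not real stream output)
    SAMPLE = (
        'event: message\n'
        'data: {"user": "ExampleUser", "title_url": "https://en.wikipedia.org/wiki/CPython"}\n'
        '\n'
        'event: message\n'
        'data: {"user": "203.0.113.7", "title_url": "https://en.wikipedia.org/wiki/Server-sent_events"}\n'
        '\n'
    )

    events = list(parse_sse_events(SAMPLE))
    for event in events:
        print("{user} edited {title_url}".format(**event))
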
Install
=======

Install using ``pip``.

::

    pip install wikichangewatcher

Examples
========

Some example scripts illustrating how to use ``WikiChangeWatcher`` are presented in
the following sections.

Monitoring "anonymous" page edits made from any IPv4 or IPv6 address
--------------------------------------------------------------------

The following example code watches for edits made by any IPv4 or IPv6 address.

.. code:: python

    # Example script showing how to use WikiChangeWatcher to watch for "anonymous" edits to any
    # wikipedia page from any IPv4 or IPv6 address

    import time
    from wikichangewatcher import WikiChangeWatcher, IpV4Filter, IpV6Filter

    # Callback function to run whenever an event matching any IPv4 or IPv6 address is seen
    def match_handler(json_data):
        """
        json_data is a JSON-encoded event from the WikiMedia "recent changes" event stream,
        as described here: https://www.mediawiki.org/wiki/Manual:RCFeed
        """
        print("{user} edited {title_url}".format(**json_data))

    # Watch for anonymous edits from any IPv4 or IPv6 address
    wc = WikiChangeWatcher((IpV4Filter() | IpV6Filter()).on_match(match_handler))
    wc.run()

    # Watch for page edits forever until KeyboardInterrupt
    try:
        while wc.is_running():
            time.sleep(0.1)
    except KeyboardInterrupt:
        wc.stop()

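
As a point of reference, the following standalone sketch shows one way to tell an
"anonymous" editor from a signed-in one, assuming the ``user`` field of an event holds
the editor's IP address for anonymous edits. The ``is_ip_username`` helper is hypothetical
and uses the standard ``ipaddress`` module; it is not part of ``wikichangewatcher``.

.. code:: python

    import ipaddress

    def is_ip_username(username):
        """Return True if username parses as an IPv4 or IPv6 address."""
        try:
            ipaddress.ip_address(username)
        except ValueError:
            return False
        return True

    # Made-up usernames for illustration
    for name in ["203.0.113.7", "2001:db8::1", "ExampleUser", "Bot123"]:
        print(name, is_ip_username(name))
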
Monitoring "anonymous" page edits made from specific IP address ranges
----------------------------------------------------------------------

The following example code watches for edits made from one specific IPv4 address range
and one specific IPv6 address range.

.. code:: python

    # Example script showing how to use WikiChangeWatcher to watch for "anonymous" edits to any
    # wikipedia page from specific IP address ranges

    import time
    from wikichangewatcher import WikiChangeWatcher, IpV4Filter, IpV6Filter

    # Callback function to run whenever an event matching one of our IP address patterns is seen
    def match_handler(json_data):
        """
        json_data is a JSON-encoded event from the WikiMedia "recent changes" event stream,
        as described here: https://www.mediawiki.org/wiki/Manual:RCFeed
        """
        print("{user} edited {title_url}".format(**json_data))

    # Watch for anonymous edits from some specific IP address ranges
    wc = WikiChangeWatcher(IpV4Filter("192.60.38.225-230").on_match(match_handler),
                           IpV6Filter("2601:205:4882:810:5D1D:BC41:61BB:0-ffff").on_match(match_handler))

    # Wildcard '*' character can be used in place of an IPv4 or IPv6 address field, to ignore that field entirely.
    # IPv6 filter with some fields ignored: IpV6Filter("*:*:*:810:5D1D:BC41:*:0-ffff")
    # IPv4 filter with some fields ignored: IpV4Filter("192.*.*.225-230")

    wc.run()

    # Watch for page edits forever until KeyboardInterrupt
    try:
        while True:
            time.sleep(0.1)
    except KeyboardInterrupt:
        wc.stop()

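
To make the pattern syntax more concrete, here is a minimal standalone sketch of how an
IPv4 pattern with single values, ``low-high`` ranges and ``*`` wildcards could be checked
against an address. It assumes that ranges are inclusive at both ends. The ``field_matches``
and ``ipv4_matches`` helpers are hypothetical and are not taken from the ``wikichangewatcher``
source code.

.. code:: python

    def field_matches(pattern_field, value):
        """Match one dot-separated field: '*', a single number, or 'low-high'."""
        if pattern_field == "*":
            return True
        if "-" in pattern_field:
            low, high = pattern_field.split("-")
            # Assumption: both ends of the range are inclusive
            return int(low) <= value <= int(high)
        return int(pattern_field) == value

    def ipv4_matches(pattern, address):
        """Return True if the dotted IPv4 address fits the pattern."""
        pattern_fields = pattern.split(".")
        address_fields = address.split(".")
        if len(pattern_fields) != 4 or len(address_fields) != 4:
            return False
        return all(field_matches(p, int(a)) for p, a in zip(pattern_fields, address_fields))

    print(ipv4_matches("192.60.38.225-230", "192.60.38.227"))
    print(ipv4_matches("192.60.38.225-230", "192.60.38.231"))
    print(ipv4_matches("192.*.*.225-230", "192.1.2.230"))
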
Monitoring page edits made by usernames that match a regular expression
-----------------------------------------------------------------------

The following example code watches for edits made by signed-in users with usernames
that contain one or more strings matching a regular expression.

.. code:: python

    # Example script showing how to use WikiChangeWatcher to watch for NON-"anonymous" edits to any
    # wikipedia page, by usernames that contain a string matching a provided regular expression

    import time
    from wikichangewatcher import WikiChangeWatcher, UsernameRegexSearchFilter

    # Callback function to run whenever an edit by a user with a username containing our regex is seen
    def match_handler(json_data):
        """
        json_data is a JSON-encoded event from the WikiMedia "recent changes" event stream,
        as described here: https://www.mediawiki.org/wiki/Manual:RCFeed
        """
        print("{user} edited {title_url}".format(**json_data))

    # Watch for edits made by users with "bot" in their username
    wc = WikiChangeWatcher(UsernameRegexSearchFilter(r"[Bb]ot|BOT").on_match(match_handler))

    wc.run()

    # Watch for page edits forever until KeyboardInterrupt
    try:
        while True:
            time.sleep(0.1)
    except KeyboardInterrupt:
        wc.stop()

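
Because the regular expression only has to match somewhere inside the username, the exact
pattern matters. The standalone sketch below compares three ways of writing a "bot" pattern
with the standard ``re`` module. The usernames and the ``matching_patterns`` helper are
invented for this illustration.

.. code:: python

    import re

    PATTERNS = {
        "mixed": re.compile(r"[Bb]ot|BOT"),            # "bot", "Bot" or "BOT" only
        "any_case": re.compile(r"[Bb][Oo][Tt]"),       # every upper/lower case combination
        "ignorecase": re.compile(r"bot", re.IGNORECASE),  # same set as "any_case"
    }

    def matching_patterns(username):
        """Return the labels of all patterns found somewhere in username."""
        return [label for label, rgx in PATTERNS.items() if rgx.search(username)]

    # Made-up usernames; note that a substring search also accepts names like "bOtanist"
    for name in ["CleanupBot", "ROBOT9000", "bOtanist", "Alice"]:
        print(name, matching_patterns(name))
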
Monitoring page edit events based on regular expression match on arbitrary JSON fields
--------------------------------------------------------------------------------------

The following example code watches for any page edit events where the specified JSON
field contains one or more matches of a regular expression (available
JSON fields and their descriptions can be found `here <https://www.mediawiki.org/wiki/Manual:RCFeed>`_).

.. code:: python

    # Example script showing how to use WikiChangeWatcher to filter page edit events
    # by a regular expression match in an arbitrary named field from the JSON event
    # provided by the SSE stream of wikipedia page edits

    import time
    from wikichangewatcher import WikiChangeWatcher, FieldRegexSearchFilter

    # Callback function to run whenever an edit is made to a page that has a regex match in the page URL
    def match_handler(json_data):
        """
        json_data is a JSON-encoded event from the WikiMedia "recent changes" event stream,
        as described here: https://www.mediawiki.org/wiki/Manual:RCFeed
        """
        print("{user} edited {title_url}".format(**json_data))

    # Watch for edits made to any page that has the word "publish" in the page URL
    # ("title_url" field in the JSON object)
    wc = WikiChangeWatcher(FieldRegexSearchFilter("title_url", r"[Pp]ublish").on_match(match_handler))

    wc.run()

    # Watch for page edits forever until KeyboardInterrupt
    try:
        while True:
            time.sleep(0.1)
    except KeyboardInterrupt:
        wc.stop()

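
The "contains one or more matches" behaviour corresponds to a regular expression *search*
rather than a match anchored at the start of the field. The following sketch illustrates
that idea on a hand-written event dictionary. The ``field_search`` helper is hypothetical,
and returning ``False`` for a missing field is an assumption made for this example.

.. code:: python

    import re

    def field_search(event, field_name, regex):
        """Return True if event[field_name] exists and contains a match of regex."""
        value = event.get(field_name)
        if value is None:
            # Assumption for this sketch: an event without the field never matches
            return False
        return re.search(regex, str(value)) is not None

    # Hand-written sample event (not real stream output)
    event = {
        "user": "ExampleUser",
        "title_url": "https://en.wikipedia.org/wiki/Self-publishing",
    }

    print(field_search(event, "title_url", r"[Pp]ublish"))
    print(field_search(event, "no_such_field", r"[Pp]ublish"))

    # An anchored match would miss this URL, because "publish" is not at the start
    print(re.match(r"[Pp]ublish", event["title_url"]) is not None)
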
Combining multiple filter classes with the ``FilterCollection`` class
---------------------------------------------------------------------

The following example watches for anonymous page edits (made from any IPv4 address)
to a specific page URL.

.. code:: python

    # Example script showing how to use WikiChangeWatcher to watch for "anonymous" edits to
    # a specific wikipedia page

    import time
    from wikichangewatcher import WikiChangeWatcher, FilterCollection, IpV4Filter, PageUrlFilter

    # Callback function to run whenever an event matching our filters is seen
    def match_handler(json_data):
        """
        json_data is a JSON-encoded event from the WikiMedia "recent changes" event stream,
        as described here: https://www.mediawiki.org/wiki/Manual:RCFeed
        """
        print("{user} edited {title_url}".format(**json_data))

    # Default match type is MatchType.ALL
    filters = FilterCollection(
        # Filter for any edits to a specific wikipedia page URL
        PageUrlFilter("https://es.wikipedia.org/wiki/Reclus_(La_Rioja)"),

        # Filter for any IPv4 address (any anonymous edit made from an IPv4 address)
        IpV4Filter("*.*.*.*"),
    ).on_match(match_handler)

    wc = WikiChangeWatcher(filters)

    wc.run()

    # Watch for page edits forever until KeyboardInterrupt
    try:
        while True:
            time.sleep(0.1)
    except KeyboardInterrupt:
        wc.stop()

Combining/nesting multiple ``FilterCollection`` classes
-------------------------------------------------------

The following example watches for page edits to several specific page URLs made by
users with the word "bot" in their username.

.. code:: python

    # Example script showing how to use WikiChangeWatcher to watch for edits to specific
    # wikipedia page URLs by users with the word "bot" in their name

    import time
    from wikichangewatcher import WikiChangeWatcher, FilterCollection, UsernameRegexSearchFilter, PageUrlFilter, MatchType

    # Callback function to run whenever an event matching our filters is seen
    def match_handler(json_data):
        """
        json_data is a JSON-encoded event from the WikiMedia "recent changes" event stream,
        as described here: https://www.mediawiki.org/wiki/Manual:RCFeed
        """
        print("{user} edited {title_url}".format(**json_data))

    # Make a filter collection that matches any one of several wikipedia pages
    page_urls = FilterCollection(
        # Filters for any edits to multiple specific wikipedia page URLs
        PageUrlFilter("https://en.wikipedia.org/wiki/Python_(programming_language)"),
        PageUrlFilter("https://en.wikipedia.org/wiki/CPython"),
        PageUrlFilter("https://en.wikipedia.org/wiki/Server-sent_events"),
    ).set_match_type(MatchType.ANY)

    # Make a filter collection that matches one of the page URLs, *and* a specific username regex
    main_filter = FilterCollection(
        page_urls,
        UsernameRegexSearchFilter(r"[Bb][Oo][Tt]")
    ).set_match_type(MatchType.ALL).on_match(match_handler)

    wc = WikiChangeWatcher(main_filter)

    wc.run()

    # Watch for page edits forever until KeyboardInterrupt
    try:
        while True:
            time.sleep(0.1)
    except KeyboardInterrupt:
        wc.stop()

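
The ALL / ANY nesting used above can be pictured with plain Python predicates. The sketch
below is a simplified model built on the built-in ``all()`` and ``any()`` functions, not the
``FilterCollection`` implementation. All helper names and sample events are invented for
this illustration.

.. code:: python

    def match_any(*predicates):
        """Combine predicates so that at least one of them must accept the event."""
        return lambda event: any(p(event) for p in predicates)

    def match_all(*predicates):
        """Combine predicates so that every one of them must accept the event."""
        return lambda event: all(p(event) for p in predicates)

    def page_is(url):
        return lambda event: event.get("title_url") == url

    def username_contains_bot(event):
        return "bot" in event.get("user", "").lower()

    CPYTHON = "https://en.wikipedia.org/wiki/CPython"
    SSE = "https://en.wikipedia.org/wiki/Server-sent_events"

    # Inner group: any one of the pages. Outer group: a page match *and* a "bot" username
    page_urls = match_any(page_is(CPYTHON), page_is(SSE))
    main_filter = match_all(page_urls, username_contains_bot)

    bot_edit = {"user": "CleanupBot", "title_url": CPYTHON}
    human_edit = {"user": "Alice", "title_url": CPYTHON}
    other_page = {"user": "CleanupBot", "title_url": "https://en.wikipedia.org/wiki/Go"}

    print(main_filter(bot_edit), main_filter(human_edit), main_filter(other_page))

    # Requiring ALL of several mutually exclusive conditions can never match
    never = match_all(page_is(CPYTHON), page_is(SSE))
    print(never(bot_edit))
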
Using bitwise AND/OR operators to create ``FilterCollection`` classes
---------------------------------------------------------------------

Instead of creating ``FilterCollection`` classes directly, you can use the bitwise AND (``&``)
and bitwise OR (``|``) operators to combine filter objects.

For example, this code uses the bitwise OR operator to create a filter that matches any
IPv4 address, *or* any IPv6 address:

.. code:: python

    from wikichangewatcher import IpV4Filter, IpV6Filter

    # Callback function to run whenever an event matching our filters is seen
    def match_handler(json_data):
        print("{user} edited {title_url}".format(**json_data))

    filter_collection = (IpV4Filter() | IpV6Filter()).on_match(match_handler)

And this code creates an equivalent filter, but uses the ``FilterCollection`` class
directly instead:

.. code:: python

    from wikichangewatcher import IpV4Filter, IpV6Filter, FilterCollection, MatchType

    # Callback function to run whenever an event matching our filters is seen
    def match_handler(json_data):
        print("{user} edited {title_url}".format(**json_data))

    filter_collection = FilterCollection(
        IpV4Filter(), IpV6Filter()
    ).set_match_type(MatchType.ANY).on_match(match_handler)

Finally, here is a slightly more complex example, which uses both bitwise AND / OR
operators together to create a filter that matches any IPv4 or IPv6 address, *and* a specific
page URL:

.. code:: python

    from wikichangewatcher import IpV4Filter, IpV6Filter, PageUrlFilter

    PAGE_URL = "https://en.wikipedia.org/wiki/Hayaguchi_Station"

    # Callback function to run whenever an event matching our filters is seen
    def match_handler(json_data):
        print("{user} edited {title_url}".format(**json_data))

    filter_collection = ((IpV4Filter() | IpV6Filter()) & PageUrlFilter(PAGE_URL)).on_match(match_handler)

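
The parentheses around the OR expression matter, because Python's ``&`` operator binds more
tightly than ``|``. The toy ``Predicate`` class below overloads both operators to show the
difference. It is a simplified stand-in written for this illustration (including the made-up
``kind`` key in the sample event), not the ``wikichangewatcher`` filter implementation.

.. code:: python

    class Predicate:
        """Wraps a function of one event and supports combining with | and &."""

        def __init__(self, func):
            self.func = func

        def __call__(self, event):
            return self.func(event)

        def __or__(self, other):
            return Predicate(lambda event: self(event) or other(event))

        def __and__(self, other):
            return Predicate(lambda event: self(event) and other(event))

    PAGE_URL = "https://en.wikipedia.org/wiki/Hayaguchi_Station"

    ipv4 = Predicate(lambda e: e["kind"] == "ipv4")
    ipv6 = Predicate(lambda e: e["kind"] == "ipv6")
    page = Predicate(lambda e: e["title_url"] == PAGE_URL)

    grouped = (ipv4 | ipv6) & page   # (IPv4 or IPv6) and the page
    ungrouped = ipv4 | ipv6 & page   # parsed as: IPv4 or (IPv6 and the page)

    # An IPv4 edit to a *different* page
    event = {"kind": "ipv4", "title_url": "https://en.wikipedia.org/wiki/CPython"}
    print(grouped(event), ungrouped(event))
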
Monitoring "anonymous" edits made from IP address ranges owned by US government depts./agencies
-----------------------------------------------------------------------------------------------

The following example watches for anonymous page edits to *any* Wikipedia page,
from IP address ranges that were found to be publicly listed as owned by various
US government departments and agencies (mostly California, some federal).

If you want to look up IP address ranges owned by your local government, or by companies,
it's pretty easy: I just went to ``https://ip-netblocks.whoisxmlapi.com/`` and searched for
"california department of" as the company name.

.. code:: python

    # Example script showing how to use WikiChangeWatcher to watch for "anonymous" edits to any
    # wikipedia page from IP address ranges that are publicly listed as being owned by various US government departments

    import time
    from wikichangewatcher import WikiChangeWatcher, FilterCollection, IpV4Filter, IpV6Filter, MatchType

    # Callback function to run whenever an event matching one of our IP address ranges is seen
    def match_handler(json_data):
        """
        json_data is a JSON-encoded event from the WikiMedia "recent changes" event stream,
        as described here: https://www.mediawiki.org/wiki/Manual:RCFeed
        """
        print("{user} edited {title_url}".format(**json_data))

    # An edit comes from a single address, so match ANY one of the ranges
    # (MatchType.ALL would require one address to be in every range, which never happens)
    filter_collection = FilterCollection(
        IpV4Filter("136.200.0-255.0-255"),   # IP4 range assigned to CA dept. of water resources
        IpV4Filter("151.143.0-255.0-255"),   # IP4 range assigned to CA dept. of technology
        IpV4Filter("160.88.0-255.0-255"),    # IP4 range assigned to CA dept. of insurance
        IpV4Filter("192.56.110.0-255"),      # IP4 range #1 assigned to CA dept. of corrections
        IpV4Filter("153.48.0-255.0-255"),    # IP4 range #2 assigned to CA dept. of corrections
        IpV4Filter("149.136.0-255.0-255"),   # IP4 range assigned to CA dept. of transportation
        IpV6Filter("2602:814:5000-5fff:0-ffff:0-ffff:0-ffff:0-ffff:0-ffff"),  # IP6 range assigned to CA dept. of transportation
        IpV4Filter("192.251.92.0-255"),      # IP4 range assigned to CA dept. of general services
        IpV4Filter("159.145.0-255.0-255"),   # IP4 range assigned to CA dept. of consumer affairs
        IpV4Filter("167.10.0-255.0-255"),    # IP4 range assigned to CA dept. of justice
        IpV4Filter("192.58.200-203.0-255"),  # IP4 range assigned to Bureau of Justice Statistics in WA
        IpV6Filter("2607:f330:0-ffff:0-ffff:0-ffff:0-ffff:0-ffff:0-ffff")  # IP6 range assigned to the US dept. of justice in WA
    ).set_match_type(MatchType.ANY).on_match(match_handler)

    wc = WikiChangeWatcher(filter_collection)
    wc.run()

    # Watch for page edits forever until KeyboardInterrupt
    try:
        while True:
            time.sleep(0.1)
    except KeyboardInterrupt:
        wc.stop()

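
Netblock lookups usually report ranges in CIDR notation. As a cross-check, the sketch below
expresses three of the ranges from the example as CIDR blocks and tests addresses against
them with the standard ``ipaddress`` module. The CIDR equivalents were worked out by hand
from the field ranges above (for example, ``192.58.200-203.0-255`` is ``192.58.200.0/22``),
and the ``in_any_network`` helper is hypothetical.

.. code:: python

    import ipaddress

    NETWORKS = [
        ipaddress.ip_network("136.200.0.0/16"),   # same as 136.200.0-255.0-255
        ipaddress.ip_network("192.58.200.0/22"),  # same as 192.58.200-203.0-255
        ipaddress.ip_network("2607:f330::/32"),   # same as 2607:f330:0-ffff:(...):0-ffff
    ]

    def in_any_network(address):
        """Return True if address belongs to at least one of NETWORKS."""
        ip = ipaddress.ip_address(address)
        # Comparing an IPv4 address against an IPv6 network (or vice versa) is simply False
        return any(ip in network for network in NETWORKS)

    for address in ["136.200.12.34", "192.58.203.255", "192.58.204.1", "2607:f330:1::5"]:
        print(address, in_any_network(address))
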
Calculating a running average of page-edits-per-minute for all of Wikipedia
---------------------------------------------------------------------------

The following example watches for any edit to any Wikipedia page, and updates a
running average of the rate of page edits per minute, which is printed to stdout
once every 5 seconds.

.. code:: python

    # Example script showing how to use WikiChangeWatcher to calculate a running average
    # of the number of page edits per minute across all of wikipedia

    import time
    import statistics
    import queue

    from wikichangewatcher import WikiChangeWatcher

    # Max. number of samples in the averaging window
    MAX_WINDOW_LEN = 6

    # Interval between new samples for the averaging window, in seconds
    INTERVAL_SECS = 5

    class EditRateCounter():
        """
        Tracks total number of page edits per minute across all of wikipedia,
        using a simple averaging window
        """
        def __init__(self, interval_secs=INTERVAL_SECS):
            self._edit_count = 0
            self._start_time = None
            self._interval_secs = interval_secs
            self._queue = queue.Queue()
            self._window = []

        # Callback function to run whenever an edit event is seen
        def edit_handler(self, json_data):
            """
            json_data is a JSON-encoded event from the WikiMedia "recent changes" event stream,
            as described here: https://www.mediawiki.org/wiki/Manual:RCFeed
            """
            self._edit_count += 1

        # Add an edit rate sample to the averaging window, and return the new average
        def _add_to_window(self, edits_per_min):
            self._window.append(edits_per_min)
            if len(self._window) > MAX_WINDOW_LEN:
                self._window.pop(0)

            return statistics.mean(self._window)

        def run(self):
            if self._start_time is None:
                self._start_time = time.time()

            if (time.time() - self._start_time) >= self._interval_secs:
                # interval is up, calculate new rate and put it on the queue
                edits_per_min = float(self._edit_count) * (60.0 / self._interval_secs)
                self._queue.put((self._add_to_window(edits_per_min), self._edit_count))
                self._edit_count = 0
                self._start_time = time.time()

        def get_rate(self):
            ret = None

            try:
                ret = self._queue.get(block=False)
            except queue.Empty:
                pass

            return ret

    # Create rate counter class to monitor page edit rate over time
    ratecounter = EditRateCounter()

    # Create a watcher with no filters; we want to see every single edit
    wc = WikiChangeWatcher().on_edit(ratecounter.edit_handler)
    wc.run()

    # Watch for page edits forever until KeyboardInterrupt
    try:
        while True:
            ratecounter.run()
            new_rate = ratecounter.get_rate()
            if new_rate:
                rate, since_last = new_rate
                print(f"{rate:.2f} avg. page edits per min. ({since_last} in the last {INTERVAL_SECS} secs)")

            # Sleep briefly so that this loop does not spin at full CPU usage
            time.sleep(0.1)
    except KeyboardInterrupt:
        wc.stop()

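
The arithmetic behind the averaging window can be checked in isolation. The sketch below
applies the same formula (edit count multiplied by ``60 / interval``, then the mean of the
most recent samples) to a list of made-up per-interval edit counts. It swaps the list and
``pop(0)`` used above for a ``collections.deque`` with ``maxlen``, which drops the oldest
sample automatically. The ``running_averages`` helper is hypothetical.

.. code:: python

    import statistics
    from collections import deque

    def running_averages(counts, interval_secs=5, max_window_len=6):
        """Return the running average edits-per-minute after each interval."""
        window = deque(maxlen=max_window_len)
        averages = []
        for count in counts:
            # Scale the number of edits seen in one interval up to a per-minute rate
            window.append(count * (60.0 / interval_secs))
            averages.append(statistics.mean(window))
        return averages

    # 10, 20 and 30 edits in three consecutive 5 second intervals
    print(running_averages([10, 20, 30]))

With 5 second intervals each count is multiplied by 12, so the samples are 120, 240 and 360
edits per minute, and the running averages are 120, 180 and 240.
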
``wikiwatch`` CLI tool
======================

A CLI program called ``wikiwatch`` is provided, which uses the ``wikichangewatcher``
package to provide some monitoring capabilities at the command line:

::

    usage: wikiwatch [-h] [-a ADDRESS] [-u USERNAME_REGEX] [-f FIELD_NAME VALUE_RGX]
                     [-s FORMAT_STRING] [--version]

    Real-time monitoring of global Wikipedia page edits, with flexible filtering
    features.

    options:
      -h, --help            show this help message and exit
      -a ADDRESS, --address ADDRESS
                            Adds an IPv4 or Ipv6 address range to look for. Any
                            anonymous edits made by IPv4 addresses in this range
                            will be displayed. Each dot-separated field (for IPv4
                            addresses) or colon-separated field (for IPv6 addresses)
                            may be optionally replaced with with an asterisk (which
                            acts as a wildcard, matching any value), or a range of
                            values. For example, the address range "*.22.33.0-55"
                            would match all IPv4 addresses in the range 0.22.33.0
                            through 255.22.33.50. This option can be used multiple
                            times to add multiple IP address filters.
      -u USERNAME_REGEX, --username-regex USERNAME_REGEX
                            Adds a username regex to look for. Any edits made by
                            logged-in users with a username that matches this
                            regular expression will be displayed. This option can be
                            used multiple times to add multiple username filters.
      -f FIELD_NAME VALUE_RGX, --field FIELD_NAME VALUE_RGX
                            Adds a regex to look for in a specific named field in
                            the JSON event provided by the wikimedia recent changes
                            stream (described here
                            https://www.mediawiki.org/wiki/Manual:RCFeed). Any edit
                            events which have a value matching the VALUE_RGX regular
                            expression stored in the FIELD_NAME field will be
                            displayed. This option can be used multiple times to add
                            multiple named field filters.
      -s FORMAT_STRING, --format-string FORMAT_STRING
                            Define a custom format string to control how filtered
                            results are displayed. Format tokens may be used to
                            display data from any named field in the JSON event
                            described at
                            https://www.mediawiki.org/wiki/Manual:RCFeed. Format
                            tokens must be in the form "{field_name}", where
                            "field_name" is the name of any field from the JSON
                            event. This option can only be used once (Default:
                            "{user} edited {title_url}").
      --version             Show version and exit.

    NOTE: if run without arguments, then all anonymous edits (any IPv4 or IPv6
    address) will be shown.

    EXAMPLES:

    Show only edits made by one of two specific IP addresses:

        wikiwatch -a 89.44.33.22 -a 2001:0db8:85a3:0000:0000:8a2e:0370:7334

    Show only edits made by IPv4 addresses in the range 88.44.0-33.0-22:

        wikiwatch -a 88.44.0-33.0-22

    Show only edits made by IPv4 addresses in the range 232.22.0-255.0-255:

        wikiwatch -a 232.22.*.*

    Show only edits made by usernames that contain the word "Bot" or "bot":

        wikiwatch -f user "[Bb]ot"

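
The ``{field_name}`` tokens described for the ``-s`` option use the same syntax as Python's
``str.format``, which the example scripts in this document also use. The following sketch
shows how such a format string could be filled in from a hand-written event. The ``render``
helper, the ``MissingAsPlaceholder`` class and the placeholder text for unknown fields are
assumptions made for this illustration, and do not describe how ``wikiwatch`` handles
missing fields.

.. code:: python

    DEFAULT_FORMAT = "{user} edited {title_url}"

    class MissingAsPlaceholder(dict):
        """A dict that returns a visible placeholder for unknown field names."""

        def __missing__(self, key):
            return "<no " + key + ">"

    def render(format_string, event):
        """Fill the {field_name} tokens of format_string with values from event."""
        return format_string.format_map(MissingAsPlaceholder(event))

    # Hand-written sample event (not real stream output)
    event = {
        "user": "203.0.113.7",
        "title_url": "https://en.wikipedia.org/wiki/CPython",
        "comment": "fix typo",
    }

    print(render(DEFAULT_FORMAT, event))
    print(render("{title_url} ({comment})", event))
    print(render("{user}: {no_such_field}", event))
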
Contributions
=============

Contributions are welcome; please open a pull request at `<https://github.com/eriknyquist/wikichangewatcher/pulls>`_.
You will need to install packages required for development by doing ``pip install -r dev_requirements.txt``.

Please ensure that all existing tests pass, new test(s) are added if required, and the code coverage
check passes.

* Run tests with ``python setup.py test``.
* Run tests and generate code coverage report with ``python code_coverage.py``
  (this script will report an error if coverage is below 90%)

If you have any questions about / need help with contributions or tests, please
contact Erik at eknyquist@gmail.com.

Raw data
{
"_id": null,
"home_page": "http://github.com/eriknyquist/wikichangewatcher",
"name": "wikichangewatcher",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": "",
"keywords": "",
"author": "Erik Nyquist",
"author_email": "eknyquist@gmail.com",
"download_url": "",
"platform": null,
"description": "WikiChangeWatcher 1.0.0\r\n=======================\r\n\r\n.. |tests_badge| image:: https://github.com/eriknyquist/wikichangewatcher/actions/workflows/tests.yml/badge.svg\r\n.. |cov_badge| image:: https://github.com/eriknyquist/wikichangewatcher/actions/workflows/coverage.yml/badge.svg\r\n.. |version_badge| image:: https://badgen.net/pypi/v/wikichangewatcher\r\n.. |license_badge| image:: https://badgen.net/pypi/license/wikichangewatcher\r\n\r\n.. image:: https://raw.githubusercontent.com/eriknyquist/wikichangewatcher/5f8e204db0af39d0a0ed00e5884a38544e11321a/images/wikiwatcher_github_banner.png\r\n\r\n.. contents:: Table of Contents\r\n\r\n|tests_badge| |cov_badge| |version_badge| |license_badge|\r\n\r\nIntroduction\r\n============\r\n\r\nWikipedia provides an `SSE Stream <https://en.wikipedia.org/wiki/Server-sent_events>`_ of\r\nall edits made to any page across Wikipedia, which allows you to watch all edits made to all wikipedia\r\npages in real time.\r\n\r\n``WikiChangeWatcher`` is an SSE client that watches the SSE stream of wikipedia page edits,\r\nwith some filtering features that allow you to watch for page edit events with specific attributes\r\n(e.g. 
`\"anonymous\" <https://en.wikipedia.org/wiki/Wikipedia:IP_edits_are_not_anonymous>`_\r\nedits with IP addresses in specific ranges, or edits made to a specific page, or edits made by a wikipedia\r\nuser whose username matches a specific regular expression).\r\n\r\nThis package is inspired by `Tom Scott's WikiParliament project <https://www.tomscott.com/wikiparliament/>`_.\r\n\r\nInstall\r\n=======\r\n\r\nInstall using ``pip``.\r\n\r\n::\r\n\r\n pip install wikichangewatcher\r\n\r\nExamples\r\n========\r\n\r\nSome example scripts illustrating how to use ``WikiChangeWatcher`` are presented in\r\nthe following sections.\r\n\r\n\r\nMonitoring \"anonymous\" page edits made from any IPv4 or IPv6 address\r\n--------------------------------------------------------------------\r\n\r\nThe following example code watches for edits made by any IPv4 or IPv6 address.\r\n\r\n.. code:: python\r\n\r\n # Example script showing how to use WikiChangeWatcher to watch for \"anonymous\" edits to any\r\n # wikipedia page from any IPv4 or IPv6 address\r\n\r\n import time\r\n from wikichangewatcher import WikiChangeWatcher, IpV4Filter, IpV6Filter\r\n\r\n # Callback function to run whenever an event matching our IPv4 address pattern is seen\r\n def match_handler(json_data):\r\n \"\"\"\r\n json_data is a JSON-encoded event from the WikiMedia \"recent changes\" event stream,\r\n as described here: https://www.mediawiki.org/wiki/Manual:RCFeed\r\n \"\"\"\r\n print(\"{user} edited {title_url}\".format(**json_data))\r\n\r\n # Watch for anonymous edits from any IPv4 or IPv6 address\r\n wc = WikiChangeWatcher((IpV4Filter() | IpV6Filter()).on_match(match_handler))\r\n wc.run()\r\n\r\n # Watch for page edits forever until KeyboardInterrupt\r\n try:\r\n while wc.is_running():\r\n time.sleep(0.1)\r\n except KeyboardInterrupt:\r\n wc.stop()\r\n\r\n\r\nMonitoring \"anonymous\" page edits made from specific IP address 
ranges\r\n----------------------------------------------------------------------\r\n\r\nThe following example code watches for edits made by 3 specific IPv4 address ranges.\r\n\r\n.. code:: python\r\n\r\n # Example script showing how to use WikiChangeWatcher to watch for \"anonymous\" edits to any\r\n # wikipedia page from specific IP address ranges\r\n\r\n import time\r\n from wikichangewatcher import WikiChangeWatcher, IpV4Filter, IpV6Filter\r\n\r\n # Callback function to run whenever an event matching our IPv4 address pattern is seen\r\n def match_handler(json_data):\r\n \"\"\"\r\n json_data is a JSON-encoded event from the WikiMedia \"recent changes\" event stream,\r\n as described here: https://www.mediawiki.org/wiki/Manual:RCFeed\r\n \"\"\"\r\n print(\"{user} edited {title_url}\".format(**json_data))\r\n\r\n # Watch for anonymous edits from some specific IP address ranges\r\n wc = WikiChangeWatcher(IpV4Filter(\"192.60.38.225-230\").on_match(match_handler),\r\n IpV6Filter(\"2601:205:4882:810:5D1D:BC41:61BB:0-ffff\").on_match(match_handler))\r\n\r\n # Wildcard '*' character can be used in place of a IPv4 or IP46 address field, to ignore that field entirely.\r\n # IPV6 filter with some fields ignored: IpV6Filter(\"*:*:*:810:5D1D:BC41:*:0-ffff\")\r\n # IPV6 filter with some fields ignored: IpV4Filter(\"192.*.*.225-230\")\r\n\r\n wc.run()\r\n\r\n # Watch for page edits forever until KeyboardInterrupt\r\n try:\r\n while True:\r\n time.sleep(0.1)\r\n except KeyboardInterrupt:\r\n wc.stop()\r\n\r\nMonitoring page edits made by usernames that match a regular expression\r\n-----------------------------------------------------------------------\r\n\r\nThe following example code watches for edits made by signed-in users with usernames\r\nthat contain one or more strings matching a regular expression.\r\n\r\n.. 
code:: python\r\n\r\n # Example script showing how to use WikiChangeWatcher to watch for NON-\"anonymous\" edits to any\r\n # wikipedia page, by usernames that contain a string matching a provided regular expression\r\n\r\n import time\r\n from wikichangewatcher import WikiChangeWatcher, UsernameRegexSearchFilter\r\n\r\n # Callback function to run whenever an edit by a user with a username containing our regex is seen\r\n def match_handler(json_data):\r\n \"\"\"\r\n json_data is a JSON-encoded event from the WikiMedia \"recent changes\" event stream,\r\n as described here: https://www.mediawiki.org/wiki/Manual:RCFeed\r\n \"\"\"\r\n print(\"{user} edited {title_url}\".format(**json_data))\r\n\r\n # Watch for edits made by users with \"bot\" in their username\r\n wc = WikiChangeWatcher(UsernameRegexSearchFilter(r\"[Bb]ot|BOT\").on_match(match_handler))\r\n\r\n wc.run()\r\n\r\n # Watch for page edits forever until KeyboardInterrupt\r\n try:\r\n while True:\r\n time.sleep(0.1)\r\n except KeyboardInterrupt:\r\n wc.stop()\r\n\r\nMonitoring page edit events based on regular expression match on arbitary JSON fields\r\n-------------------------------------------------------------------------------------\r\n\r\nThe following example code watches for any page edit events where the specified JSON\r\nfield matches contains one or more matches of a regular expression (available\r\nJSON fields and their descriptions can be found `here <https://www.mediawiki.org/wiki/Manual:RCFeed>`_).\r\n\r\n.. 
code:: python\r\n\r\n # Example script showing how to use WikiChangeWatcher to filter page edit events\r\n # by a regular expression match in an arbitrary named field from the JSON event\r\n # provided by the SSE stream of wikipedia page edits\r\n\r\n import time\r\n from wikichangewatcher import WikiChangeWatcher, FieldRegexSearchFilter\r\n\r\n # Callback function to run whenever an edit is made to a page that has a regex match in the page URL\r\n def match_handler(json_data):\r\n \"\"\"\r\n json_data is a JSON-encoded event from the WikiMedia \"recent changes\" event stream,\r\n as described here: https://www.mediawiki.org/wiki/Manual:RCFeed\r\n \"\"\"\r\n print(\"{user} edited {title_url}\".format(**json_data))\r\n\r\n # Watch for edits made to any page that has the word \"publish\" in the page URL\r\n # (\"title_url\" field in the JSON object)\r\n wc = WikiChangeWatcher(FieldRegexSearchFilter(\"title_url\", r\"[Pp]ublish\").on_match(match_handler))\r\n\r\n wc.run()\r\n\r\n # Watch for page edits forever until KeyboardInterrupt\r\n try:\r\n while True:\r\n time.sleep(0.1)\r\n except KeyboardInterrupt:\r\n wc.stop()\r\n\r\n\r\nCombining multiple filter classes with the ``FilterCollection`` class\r\n---------------------------------------------------------------------\r\n\r\nThe following example watches for anonymous page edits to a specific page URL.\r\n\r\n.. 
code:: python\r\n\r\n # Example script showing how to use WikiChangeWatcher to watch for \"anonymous\" edits to\r\n # a specific wikipedia page\r\n\r\n import time\r\n from wikichangewatcher import WikiChangeWatcher, FilterCollection, IpV4Filter, PageUrlFilter\r\n\r\n # Callback function to run whenever an event matching our filters is seen\r\n def match_handler(json_data):\r\n \"\"\"\r\n json_data is a JSON-encoded event from the WikiMedia \"recent changes\" event stream,\r\n as described here: https://www.mediawiki.org/wiki/Manual:RCFeed\r\n \"\"\"\r\n print(\"{user} edited {title_url}\".format(**json_data))\r\n\r\n # Default match type is is MatchType.ALL\r\n filters = FilterCollection(\r\n # Filter for any edits to a specific wikipedia page URL\r\n PageUrlFilter(\"https://es.wikipedia.org/wiki/Reclus_(La_Rioja)\"),\r\n\r\n # Filter for any IP address (any anonymous edit)\r\n IpV4Filter(\"*.*.*.*\"),\r\n ).on_match(match_handler)\r\n\r\n\r\n wc = WikiChangeWatcher(filters)\r\n\r\n wc.run()\r\n\r\n # Watch for page edits forever until KeyboardInterrupt\r\n try:\r\n while True:\r\n time.sleep(0.1)\r\n except KeyboardInterrupt:\r\n wc.stop()\r\n\r\nCombining/nesting multiple ``FilterCollection`` classes\r\n-------------------------------------------------------\r\n\r\nThe following example watches for page edits to several specific page URLs made by\r\nuser with the word \"bot\" in their username.\r\n\r\n.. 
code:: python\r\n\r\n # Example script showing how to use WikiChangeWatcher to watch for edit to specific\r\n # wikipedia page URLs by users with the word \"bot\" in their name\r\n\r\n import time\r\n from wikichangewatcher import WikiChangeWatcher, FilterCollection, UsernameRegexSearchFilter, PageUrlFilter, MatchType\r\n\r\n # Callback function to run whenever an event matching our filters is seen\r\n def match_handler(json_data):\r\n \"\"\"\r\n json_data is a JSON-encoded event from the WikiMedia \"recent changes\" event stream,\r\n as described here: https://www.mediawiki.org/wiki/Manual:RCFeed\r\n \"\"\"\r\n print(\"{user} edited {title_url}\".format(**json_data))\r\n\r\n # Make a filter collection that matches any one of several wikipedia pages\r\n page_urls = FilterCollection(\r\n # Filters for any edits to multiple specific wikipedia page URLs\r\n PageUrlFilter(\"https://en.wikipedia.org/wiki/Python_(programming_language)\"),\r\n PageUrlFilter(\"https://en.wikipedia.org/wiki/CPython\"),\r\n PageUrlFilter(\"https://en.wikipedia.org/wiki/Server-sent_events\"),\r\n ).set_match_type(MatchType.ANY)\r\n\r\n # Make a filter collection that matches one of the page URLs, *and* a specific username regex\r\n main_filter = FilterCollection(\r\n page_urls,\r\n UsernameRegexSearchFilter(r\"[Bb][Oo][Tt]\")\r\n ).set_match_type(MatchType.ALL).on_match(match_handler)\r\n\r\n wc = WikiChangeWatcher(main_filter)\r\n\r\n wc.run()\r\n\r\n # Watch for page edits forever until KeyboardInterrupt\r\n try:\r\n while True:\r\n time.sleep(0.1)\r\n except KeyboardInterrupt:\r\n wc.stop()\r\n\r\nUsing bitwise AND/OR operators to create ``FilterCollection`` classes\r\n---------------------------------------------------------------------\r\n\r\nInstead of creating FilterCollection classes directly, you can instead use bitwise AND ``&``\r\nand bitwise OR ``|`` to combine filter objects.\r\n\r\nFor example, this code uses the bitwise OR operator to create a filter that matches any\r\nIPv4 
address, *or* any IPv6 address:\r\n\r\n.. code:: python\r\n\r\n from wikichangewatcher import IpV4Filter, IpV6Filter\r\n\r\n # Callback function to run whenever an event matching our filters is seen\r\n def match_handler(json_data):\r\n print(\"{user} edited {title_url}\".format(**json_data))\r\n\r\n filter_collection = (IpV4Filter() | IpV6Filter()).on_match(match_handler)\r\n\r\nAnd this code creates an equivalent filter, but uses the ``FilterCollection`` class\r\ndirectly instead:\r\n\r\n.. code:: python\r\n\r\n from wikichangewatcher import IpV4Filter, IpV6Filter, FilterCollection, MatchType\r\n\r\n # Callback function to run whenever an event matching our filters is seen\r\n def match_handler(json_data):\r\n print(\"{user} edited {title_url}\".format(**json_data))\r\n\r\n filter_collection = FilterCollection(\r\n IpV4Filter(), IpV6Filter()\r\n ).set_match_type(MatchType.ANY).on_match(match_handler)\r\n\r\nFinally, here is a slightly more complex example, which uses both bitwise AND / OR\r\noperators together to create a filter that matches any IPv4 or IPv6 address, *and* a specific\r\npage URL:\r\n\r\n.. 
code:: python\r\n\r\n    from wikichangewatcher import IpV4Filter, IpV6Filter, PageUrlFilter\r\n\r\n    PAGE_URL = \"https://en.wikipedia.org/wiki/Hayaguchi_Station\"\r\n\r\n    # Callback function to run whenever an event matching our filters is seen\r\n    def match_handler(json_data):\r\n        print(\"{user} edited {title_url}\".format(**json_data))\r\n\r\n    filter_collection = ((IpV4Filter() | IpV6Filter()) & PageUrlFilter(PAGE_URL)).on_match(match_handler)\r\n\r\nMonitoring \"anonymous\" edits made from IP address ranges owned by US government depts./agencies\r\n-----------------------------------------------------------------------------------------------\r\n\r\nThe following example watches for anonymous page edits to *any* wikipedia page,\r\nfrom IP address ranges that are publicly listed as owned by various\r\nUS government departments and agencies (mostly California, some federal).\r\n\r\nIf you want to look up IP address ranges owned by your local government, or by companies, it's pretty easy:\r\nI just went to ``https://ip-netblocks.whoisxmlapi.com/`` and searched for \"california department of\"\r\nas the company name.\r\n\r\n.. 
code:: python\r\n\r\n    # Example script showing how to use WikiChangeWatcher to watch for \"anonymous\" edits to any\r\n    # wikipedia page from IP address ranges that are publicly listed as being owned by various US government departments\r\n\r\n    import time\r\n    from wikichangewatcher import WikiChangeWatcher, FilterCollection, IpV4Filter, IpV6Filter, MatchType\r\n\r\n    # Callback function to run whenever an event matching one of our IP address ranges is seen\r\n    def match_handler(json_data):\r\n        \"\"\"\r\n        json_data is a dict decoded from a JSON event in the WikiMedia \"recent changes\" event stream,\r\n        as described here: https://www.mediawiki.org/wiki/Manual:RCFeed\r\n        \"\"\"\r\n        print(\"{user} edited {title_url}\".format(**json_data))\r\n\r\n\r\n    # MatchType.ANY: an edit only needs to come from one of the listed ranges\r\n    filter_collection = FilterCollection(\r\n        IpV4Filter(\"136.200.0-255.0-255\"),  # IPv4 range assigned to CA dept. of water resources\r\n        IpV4Filter(\"151.143.0-255.0-255\"),  # IPv4 range assigned to CA dept. of technology\r\n        IpV4Filter(\"160.88.0-255.0-255\"),  # IPv4 range assigned to CA dept. of insurance\r\n        IpV4Filter(\"192.56.110.0-255\"),  # IPv4 range #1 assigned to CA dept. of corrections\r\n        IpV4Filter(\"153.48.0-255.0-255\"),  # IPv4 range #2 assigned to CA dept. of corrections\r\n        IpV4Filter(\"149.136.0-255.0-255\"),  # IPv4 range assigned to CA dept. of transportation\r\n        IpV6Filter(\"2602:814:5000-5fff:0-ffff:0-ffff:0-ffff:0-ffff:0-ffff\"),  # IPv6 range assigned to CA dept. of transportation\r\n        IpV4Filter(\"192.251.92.0-255\"),  # IPv4 range assigned to CA dept. of general services\r\n        IpV4Filter(\"159.145.0-255.0-255\"),  # IPv4 range assigned to CA dept. of consumer affairs\r\n        IpV4Filter(\"167.10.0-255.0-255\"),  # IPv4 range assigned to CA dept. of justice\r\n        IpV4Filter(\"192.58.200-203.0-255\"),  # IPv4 range assigned to Bureau of Justice Statistics in WA\r\n        IpV6Filter(\"2607:f330:0-ffff:0-ffff:0-ffff:0-ffff:0-ffff:0-ffff\")  # IPv6 range assigned to the US dept. 
of justice in WA\r\n    ).set_match_type(MatchType.ANY).on_match(match_handler)\r\n\r\n    wc = WikiChangeWatcher(filter_collection)\r\n    wc.run()\r\n\r\n    # Watch for page edits forever until KeyboardInterrupt\r\n    try:\r\n        while True:\r\n            time.sleep(0.1)\r\n    except KeyboardInterrupt:\r\n        wc.stop()\r\n\r\nCalculating a running average of page-edits-per-minute for all of wikipedia\r\n---------------------------------------------------------------------------\r\n\r\nThe following example watches for any edit to any wikipedia page, and updates a\r\nrunning average of the rate of page edits per minute, which is printed to stdout\r\nonce every 5 seconds. The average covers up to the 6 most recent 5-second samples.\r\n\r\n.. code:: python\r\n\r\n    # Example script showing how to use WikiChangeWatcher to calculate a running average of\r\n    # page edits per minute across all of wikipedia\r\n\r\n    import time\r\n    import statistics\r\n    import queue\r\n\r\n    from wikichangewatcher import WikiChangeWatcher\r\n\r\n\r\n    # Max. number of samples in the averaging window\r\n    MAX_WINDOW_LEN = 6\r\n\r\n    # Interval between new samples for the averaging window, in seconds\r\n    INTERVAL_SECS = 5\r\n\r\n\r\n    class EditRateCounter():\r\n        \"\"\"\r\n        Tracks total number of page edits per minute across all of wikipedia,\r\n        using a simple averaging window\r\n        \"\"\"\r\n        def __init__(self, interval_secs=INTERVAL_SECS):\r\n            self._edit_count = 0\r\n            self._start_time = None\r\n            self._interval_secs = interval_secs\r\n            self._queue = queue.Queue()\r\n            self._window = []\r\n\r\n        # Callback function to run whenever an edit event is seen\r\n        def edit_handler(self, json_data):\r\n            \"\"\"\r\n            json_data is a dict decoded from a JSON event in the WikiMedia \"recent changes\" event stream,\r\n            as described here: https://www.mediawiki.org/wiki/Manual:RCFeed\r\n            \"\"\"\r\n            self._edit_count += 1\r\n\r\n        # Add an edit rate sample to the averaging window, and return the new average\r\n        def _add_to_window(self, edits_per_min):\r\n            self._window.append(edits_per_min)\r\n            if 
len(self._window) > MAX_WINDOW_LEN:\r\n                self._window.pop(0)\r\n\r\n            return statistics.mean(self._window)\r\n\r\n        def run(self):\r\n            if self._start_time is None:\r\n                self._start_time = time.time()\r\n\r\n            if (time.time() - self._start_time) >= self._interval_secs:\r\n                # interval is up, calculate new rate and put it on the queue\r\n                edits_per_min = float(self._edit_count) * (60.0 / self._interval_secs)\r\n                self._queue.put((self._add_to_window(edits_per_min), self._edit_count))\r\n                self._edit_count = 0\r\n                self._start_time = time.time()\r\n\r\n        def get_rate(self):\r\n            ret = None\r\n\r\n            try:\r\n                ret = self._queue.get(block=False)\r\n            except queue.Empty:\r\n                pass\r\n\r\n            return ret\r\n\r\n    # Create rate counter class to monitor page edit rate over time\r\n    ratecounter = EditRateCounter()\r\n\r\n    # Create a watcher with no filters-- we want to see every single edit\r\n    wc = WikiChangeWatcher().on_edit(ratecounter.edit_handler)\r\n\r\n    wc.run()\r\n\r\n    # Watch for page edits forever until KeyboardInterrupt\r\n    try:\r\n        while True:\r\n            ratecounter.run()\r\n            new_rate = ratecounter.get_rate()\r\n            if new_rate:\r\n                rate, since_last = new_rate\r\n                print(f\"{rate:.2f} avg. page edits per min. ({since_last} in the last {INTERVAL_SECS} secs)\")\r\n            # Sleep briefly so the loop does not busy-wait between interval checks\r\n            time.sleep(0.1)\r\n    except KeyboardInterrupt:\r\n        wc.stop()\r\n\r\n\r\n``wikiwatch`` CLI tool\r\n======================\r\n\r\nA CLI program called ``wikiwatch`` is provided, which uses the ``wikichangewatcher``\r\npackage to provide some monitoring capabilities at the command line:\r\n\r\n::\r\n\r\n usage: wikiwatch [-h] [-a ADDRESS] [-u USERNAME_REGEX] [-f FIELD_NAME VALUE_RGX]\r\n [-s FORMAT_STRING] [--version]\r\n\r\n Real-time monitoring of global Wikipedia page edits, with flexible filtering\r\n features.\r\n\r\n options:\r\n -h, --help show this help message and exit\r\n -a ADDRESS, --address ADDRESS\r\n Adds an IPv4 or Ipv6 address range to look for. Any\r\n anonymous edits made by IPv4 addresses in this range\r\n will be displayed. 
Each dot-separated field (for IPv4\r\n addresses) or colon-separated field (for IPv6 addresses)\r\n may be optionally replaced with an asterisk (which\r\n acts as a wildcard, matching any value), or a range of\r\n values. For example, the address range \"*.22.33.0-55\"\r\n would match all IPv4 addresses in the range 0.22.33.0\r\n through 255.22.33.55. This option can be used multiple\r\n times to add multiple IP address filters.\r\n -u USERNAME_REGEX, --username-regex USERNAME_REGEX\r\n Adds a username regex to look for. Any edits made by\r\n logged-in users with a username that matches this\r\n regular expression will be displayed. This option can be\r\n used multiple times to add multiple username filters.\r\n -f FIELD_NAME VALUE_RGX, --field FIELD_NAME VALUE_RGX\r\n Adds a regex to look for in a specific named field in\r\n the JSON event provided by the wikimedia recent changes\r\n stream (described here\r\n https://www.mediawiki.org/wiki/Manual:RCFeed). Any edit\r\n events which have a value matching the VALUE_RGX regular\r\n expression stored in the FIELD_NAME field will be\r\n displayed. This option can be used multiple times to add\r\n multiple named field filters.\r\n -s FORMAT_STRING, --format-string FORMAT_STRING\r\n Define a custom format string to control how filtered\r\n results are displayed. Format tokens may be used to\r\n display data from any named field in the JSON event\r\n described at\r\n https://www.mediawiki.org/wiki/Manual:RCFeed. Format\r\n tokens must be in the form \"{field_name}\", where\r\n \"field_name\" is the name of any field from the JSON\r\n event. 
This option can only be used once (Default:\r\n \"{user} edited {title_url}\").\r\n --version Show version and exit.\r\n\r\n NOTE: if run without arguments, then all anonymous edits (any IPv4 or IPv6\r\n address) will be shown.\r\n\r\n EXAMPLES:\r\n\r\n Show only edits made by one of two specific IP addresses:\r\n\r\n wikiwatch -a 89.44.33.22 -a 2001:0db8:85a3:0000:0000:8a2e:0370:7334\r\n\r\n Show only edits made by IPv4 addresses in the range 88.44.0-33.0-22:\r\n\r\n wikiwatch -a 88.44.0-33.0-22\r\n\r\n Show only edits made by IPv4 addresses in the range 232.22.0-255.0-255:\r\n\r\n wikiwatch -a 232.22.*.*\r\n\r\n Show only edits made by usernames that contain the word \"Bot\" or \"bot\":\r\n\r\n wikiwatch -f user \"[Bb]ot\"\r\n\r\nContributions\r\n=============\r\n\r\nContributions are welcome; please open a pull request at `<https://github.com/eriknyquist/wikichangewatcher/pulls>`_.\r\nYou will need to install the packages required for development by running ``pip install -r dev_requirements.txt``.\r\n\r\nPlease ensure that all existing tests pass, new test(s) are added if required, and the code coverage\r\ncheck passes.\r\n\r\n* Run tests with ``python setup.py test``.\r\n* Run tests and generate a code coverage report with ``python code_coverage.py``\r\n  (this script will report an error if coverage is below 90%).\r\n\r\nIf you have any questions about / need help with contributions or tests, please\r\ncontact Erik at eknyquist@gmail.com.\r\n",
"bugtrack_url": null,
"license": "Apache 2.0",
"summary": "Real-time monitoring/filtering of global Wikipedia page edits",
"version": "1.0.0",
"project_urls": {
"Homepage": "http://github.com/eriknyquist/wikichangewatcher"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ddb83a01f651a8520961460ee5e5e3b30d81b4f8ea8c6a2e552a53d75a2ff2a3",
"md5": "3e146a0621e50f770281043f6f5f9166",
"sha256": "3e991c5bcb21da87de00a9527549ca1a543491fb76b5d7fd31df8b57c53527e0"
},
"downloads": -1,
"filename": "wikichangewatcher-1.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "3e146a0621e50f770281043f6f5f9166",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 17836,
"upload_time": "2023-12-12T07:12:49",
"upload_time_iso_8601": "2023-12-12T07:12:49.503253Z",
"url": "https://files.pythonhosted.org/packages/dd/b8/3a01f651a8520961460ee5e5e3b30d81b4f8ea8c6a2e552a53d75a2ff2a3/wikichangewatcher-1.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-12-12 07:12:49",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "eriknyquist",
"github_project": "wikichangewatcher",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "wikichangewatcher"
}