web-watchr


| Field | Value |
|-------|-------|
| Name | web-watchr |
| Version | 0.3.0 |
| Summary | Monitors a website for changes in a text element and publishes an alert (e.g., on telegram) |
| Upload time | 2024-10-21 19:02:28 |
| Requires Python | >=3.10 |
| License | MIT |
| Keywords | bot, scraping, telegram |

![GitHub License](https://img.shields.io/github/license/Emrys-Merlin/web_watchr)
![Python Version from PEP 621 TOML](https://img.shields.io/python/required-version-toml?tomlFilePath=https%3A%2F%2Fraw.githubusercontent.com%2FEmrys-Merlin%2Fweb_watchr%2Fmain%2Fpyproject.toml)

| 3.10 | 3.11 | 3.12 | 3.13 |
|------|------|------|------|
|![tests](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FEmrys-Merlin%2Fec2e4e339a048ca0f0b996517d282a4a%2Fraw%2Fweb_watchr_3.10-junit-tests.json)|![tests](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FEmrys-Merlin%2Fec2e4e339a048ca0f0b996517d282a4a%2Fraw%2Fweb_watchr_3.11-junit-tests.json)|![tests](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FEmrys-Merlin%2Fec2e4e339a048ca0f0b996517d282a4a%2Fraw%2Fweb_watchr_3.12-junit-tests.json)|![tests](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FEmrys-Merlin%2Fec2e4e339a048ca0f0b996517d282a4a%2Fraw%2Fweb_watchr_3.13-junit-tests.json)|
|![Endpoint Badge](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FEmrys-Merlin%2Fec2e4e339a048ca0f0b996517d282a4a%2Fraw%2F272263ce795e0ca0adf06fa6d8aa2fe496a778dd%2Fweb_watchr_3.10-cobertura-coverage.json)|![Endpoint Badge](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FEmrys-Merlin%2Fec2e4e339a048ca0f0b996517d282a4a%2Fraw%2F272263ce795e0ca0adf06fa6d8aa2fe496a778dd%2Fweb_watchr_3.11-cobertura-coverage.json)|![Endpoint Badge](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FEmrys-Merlin%2Fec2e4e339a048ca0f0b996517d282a4a%2Fraw%2F272263ce795e0ca0adf06fa6d8aa2fe496a778dd%2Fweb_watchr_3.12-cobertura-coverage.json)|![Endpoint Badge](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FEmrys-Merlin%2Fec2e4e339a048ca0f0b996517d282a4a%2Fraw%2F272263ce795e0ca0adf06fa6d8aa2fe496a778dd%2Fweb_watchr_3.13-cobertura-coverage.json)|

# WebWatchr

This Python package is a framework around [Playwright](https://Playwright.dev/python) that monitors a website and sends an alert when the monitored text changes. The setup is modular. To specify what to monitor, you define a `Callable[[Playwright], str]` that is responsible for extracting the text you are interested in. Currently, the only available alerting channel is a [Telegram](https://telegram.org) bot; more alerting channels are planned.

> [!IMPORTANT]
> Before you start scraping any website, please make sure that you are allowed to. Besides meeting your legal obligations, please consider reaching out to the website owner and respect `robots.txt` files.

## Installation

The package is available on PyPI. You can install it via:
```
pip install web_watchr
```
If you prefer the latest changes, you can also install it directly from the repository via:
```
pip install git+https://github.com/Emrys-Merlin/web_watchr
```

## Usage

After the installation, the intended way to invoke the framework is by writing a small runner script (which you can find [here](examples/simple_dummy_example.py)):
```Python
from playwright.sync_api import Playwright
from web_watchr import Watchr
from web_watchr.compare import DummyComparer

watchr = Watchr(
    comparer=DummyComparer(),
)


@watchr.set_poller
def poll(playwright: Playwright) -> str:
    browser = playwright.chromium.launch(headless=True)
    context = browser.new_context()
    page = context.new_page()
    page.goto("https://www.example.com/")
    text = page.get_by_role("heading").inner_text()
    context.close()
    browser.close()

    return text


if __name__ == "__main__":
    watchr()
```

The runner consists of three parts:

1. A new `Watchr` object is initialized. For illustration purposes, a `DummyComparer` instance is passed to it, which will indicate that the monitored text has changed no matter the input.
2. We implement the `poll` function and decorate it with `@watchr.set_poller`. The poll function contains all the website-specific logic to extract the text of interest. Most of this function can be automatically generated using [`playwright codegen`](https://playwright.dev/python/docs/codegen#running-codegen).
3. We invoke `watchr`, which will poll the website once.
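
To make the division of labour concrete, here is a minimal, library-independent sketch of what a single run conceptually does: poll, compare, alert. The names `run_once`, `always_changed`, and `print_alert` are hypothetical and not part of `web_watchr`; the real `Watchr` class may well work differently internally.

```python
from typing import Callable


def always_changed(text: str) -> bool:
    """Hypothetical stand-in for a dummy comparer: reports a change for any input."""
    return True


def print_alert(text: str) -> None:
    """Hypothetical stand-in for an alerter that just prints to stdout."""
    print(text)


def run_once(
    poll: Callable[[], str],
    has_changed: Callable[[str], bool],
    alert: Callable[[str], None],
) -> bool:
    """Poll once and alert only if the comparer reports a change."""
    text = poll()
    if has_changed(text):
        alert(text)
        return True
    return False


if __name__ == "__main__":
    # The lambda replaces a real Playwright-based poller for illustration.
    run_once(lambda: "Example Domain", always_changed, print_alert)
```

Keeping the three roles as separate callables is what lets each one be swapped independently, for example a different comparer or a different alerting channel.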

By default, `watchr` simply prints the text to stdout. If you want to receive alerts on your phone via Telegram, modify the runner [script](examples/simple_telegram_alerting.py) slightly:
```Python
import os

from playwright.sync_api import Playwright
from web_watchr import Watchr
from web_watchr.alert import TelegramAlerter

watchr = Watchr(
    alerter=TelegramAlerter(
        token=os.getenv("TELEGRAM_TOKEN"),
        chat_id=os.getenv("TELEGRAM_CHAT_ID"),
    )
)


@watchr.set_poller
def poll(playwright: Playwright) -> str:
    browser = playwright.chromium.launch(headless=True)
    context = browser.new_context()
    page = context.new_page()
    page.goto("https://www.example.com/")
    text = page.get_by_role("heading").inner_text()
    context.close()
    browser.close()

    return text


if __name__ == "__main__":
    watchr()
```

There are two key changes compared to the initial script:

1. We removed the `DummyComparer`. By default, `Watchr` uses an `FSComparer`, which stores the previous state in a file. The default location is `~/.local/share/web_watchr/cache`, which can be changed. As a result, the runner does not need to run continuously but can be invoked periodically (e.g., via `cron`).
2. We instantiated a `TelegramAlerter` that reads a `token` and a `chat_id` from environment variables. These are the values your bot needs in order to send messages, and they should be treated as secrets. If you are unsure how to create a bot, please have a look [here](https://core.telegram.org/bots/tutorial#obtain-your-bot-token). To find out your `chat_id`, you can use the approach mentioned [here](https://stackoverflow.com/a/32572159/9685500).
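
The idea behind a file-backed comparer can be sketched with the standard library alone. This is an illustration under assumptions, not the `FSComparer` implementation: the function name `has_changed` and the single-file cache layout are hypothetical.

```python
import tempfile
from pathlib import Path


def has_changed(text: str, cache_file: Path) -> bool:
    """Compare `text` with the last stored value and persist the new one.

    Hypothetical helper: the first run (no cache file yet) counts as a change.
    """
    previous = cache_file.read_text(encoding="utf-8") if cache_file.exists() else None
    if text == previous:
        return False
    # Create the cache directory on first use, then store the new state.
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(text, encoding="utf-8")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp) / "cache" / "example.txt"
        print(has_changed("Example Domain", cache))  # first run: no stored state yet
        print(has_changed("Example Domain", cache))  # same text as the stored state
        print(has_changed("New heading", cache))     # text differs from the stored state
```

Because the state lives on disk rather than in memory, each invocation is independent of the previous process, which is what makes periodic scheduling possible.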

> [!CAUTION]
> Keep your bot token secret. In particular, never add it to version control. Otherwise, malicious actors can use it for their own purposes.
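
`os.getenv` returns `None` when a variable is unset, so a missing secret may only surface later as a less obvious error. A small guard like the one sketched below fails early instead. The helper `require_env` is hypothetical and not part of `web_watchr`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


if __name__ == "__main__":
    try:
        require_env("TELEGRAM_TOKEN")
        require_env("TELEGRAM_CHAT_ID")
        print("Telegram credentials found")
    except RuntimeError as err:
        # Report which variable is missing without printing any secret value.
        print(err)
```

In the runner above, the two `os.getenv(...)` calls could then be replaced by `require_env("TELEGRAM_TOKEN")` and `require_env("TELEGRAM_CHAT_ID")`.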

Running the script will now send updates to your phone via Telegram!

## Documentation

So far, almost all of the documentation is restricted to this readme. However, you can have a look at the [API Reference](https://emrys-merlin.github.io/web_watchr/api).

## Contribution

If you like what you see and would like to extend it, you can do so by
- filing an issue with a feature request (no promises on my part though) and
- forking the repo and opening a pull request.

I'm always happy to chat, so you can also simply reach out and we can talk.

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "web-watchr",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "bot, scraping, telegram",
    "author": null,
    "author_email": "Tim Adler <tim+github@emrys-merlin.de>",
    "download_url": "https://files.pythonhosted.org/packages/6b/68/1f771c5110495717c5ac883d4e595afd3eed3ce4e69804a0af8bf4b6852f/web_watchr-0.3.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Monitors a website for changes in a text element and publishes an alert (e.g., on telegram)",
    "version": "0.3.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/Emrys-Merlin/web_watchr/issues",
        "Documentation": "https://Emrys-Merlin.github.io/web_watchr/",
        "Homepage": "https://github.com/Emrys-Merlin/web_watchr",
        "Repository": "https://github.com/Emrys-Merlin/web_watchr.git"
    },
    "split_keywords": [
        "bot",
        " scraping",
        " telegram"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0cc2c57a62493d644323ebe639ea15ca51d0d32aef829a76b1aa9a439b085ec2",
                "md5": "d21462dd1eb162233da1fc0366f31260",
                "sha256": "3c1b58eb245562ccab309112e93fcb3c3b89b859001cb232cb65fbeca1611747"
            },
            "downloads": -1,
            "filename": "web_watchr-0.3.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "d21462dd1eb162233da1fc0366f31260",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 9751,
            "upload_time": "2024-10-21T19:02:27",
            "upload_time_iso_8601": "2024-10-21T19:02:27.293602Z",
            "url": "https://files.pythonhosted.org/packages/0c/c2/c57a62493d644323ebe639ea15ca51d0d32aef829a76b1aa9a439b085ec2/web_watchr-0.3.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6b681f771c5110495717c5ac883d4e595afd3eed3ce4e69804a0af8bf4b6852f",
                "md5": "8c8bb79bfd42862da92613414fdefeaa",
                "sha256": "e30758073b34840dbf6355c07f04f6c2d7266b7746b8d0b1106baca7f8b365b8"
            },
            "downloads": -1,
            "filename": "web_watchr-0.3.0.tar.gz",
            "has_sig": false,
            "md5_digest": "8c8bb79bfd42862da92613414fdefeaa",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 52139,
            "upload_time": "2024-10-21T19:02:28",
            "upload_time_iso_8601": "2024-10-21T19:02:28.942814Z",
            "url": "https://files.pythonhosted.org/packages/6b/68/1f771c5110495717c5ac883d4e595afd3eed3ce4e69804a0af8bf4b6852f/web_watchr-0.3.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-21 19:02:28",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Emrys-Merlin",
    "github_project": "web_watchr",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "web-watchr"
}
        