* **Name:** Pushl
* **Version:** 0.4.0
* **Summary:** A conduit for pushing changes in a feed to the rest of the IndieWeb
* **Author:** fluffy
* **Home page:** https://plaidweb.site/
* **Repository:** https://github.com/PlaidWeb/Pushl
* **License:** MIT
* **Requires Python:** <4.0.0,>=3.10.0
* **Uploaded:** 2025-01-01 18:33:28

# Pushl

A simple tool that parses content feeds and sends out appropriate push notifications (WebSub, webmention, etc.) when they change.

See http://publ.beesbuzz.biz/blog/113-Some-thoughts-on-WebMention for the motivation.

## Features

* Supports any feed supported by [feedparser](https://github.com/kurtmckee/feedparser)
    and [mf2py](https://github.com/microformats/mf2py) (RSS, Atom, HTML pages containing
    `h-entry`, etc.)
* Will send WebSub notifications for feeds which declare a WebSub hub
* Will send WebMention notifications for entries discovered on those feeds or specified directly
* Can perform autodiscovery of additional feeds on entry pages
* Can do a full backfill on Atom feeds configured with [RFC 5005](https://tools.ietf.org/html/rfc5005)
* When configured to use a cache directory, can detect entry deletions and updates to implement the webmention update and delete protocols (as well as saving some time and bandwidth)


## Site setup

If you want to support WebSub, have your feed implement [the WebSub protocol](https://indieweb.org/WebSub). The short version is that you should have a `<link rel="hub" href="http://path/to/hub" />` in your feed's top-level element.

There are a number of WebSub hubs available; I use [Superfeedr](http://pubsubhubbub.superfeedr.com).
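In an Atom feed, the declaration might look like this sketch (the feed and hub URLs are illustrative):

```xml
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <link rel="self" href="https://example.com/feed.xml"/>
  <!-- WebSub hub declaration; the hub URL is just an example -->
  <link rel="hub" href="https://pubsubhubbub.superfeedr.com/"/>
  <!-- entries follow -->
</feed>
```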

For [WebMentions](https://indieweb.org/Webmention), configure your site templates with the various microformats; by default, Pushl will use the following tags as the top-level entry container, in descending order of priority:

* Anything with a `class` of `h-entry`
* An `<article>` tag
* Anything with a `class` of `entry`

For more information on how to configure your site templates, see the [microformats h-entry specification](http://microformats.org/wiki/h-entry).

### mf2 feed notes

If you're using an mf2 feed (i.e. an HTML-formatted page with `h-entry` declarations), only entries with a `u-url` property will be used for sending webmentions; further, Pushl will retrieve the page from that URL to ensure it has the full content. (This is to work around certain setups where the `h-feed` only shows summary text.)

Also, there is technically no requirement for an HTML page to declare an `h-feed`; all entities marked up with `h-entry` will be consumed.
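Putting the above together, here is a minimal sketch of markup Pushl can consume (names and URLs are illustrative): an `h-entry` container with the `u-url` permalink that mf2 feeds require.

```html
<!-- illustrative h-entry; the h-entry class marks the entry container -->
<article class="h-entry">
  <h1 class="p-name">An example post</h1>
  <!-- u-url is needed for the entry to be used for webmentions -->
  <a class="u-url" href="https://example.com/2025/example-post">permalink</a>
  <div class="e-content">
    <p>Hello, <a href="https://example.com/elsewhere">world</a>!</p>
  </div>
</article>
```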

## Installation

You can install it using `pip`, e.g.:

```bash
pip3 install pushl
```

However, I recommend installing it in a virtual environment, e.g.:

```bash
virtualenv3 $HOME/pushl
$HOME/pushl/bin/pip3 install pushl
```

and then putting a symlink to `$HOME/pushl/bin/pushl` in a directory in your `$PATH`, e.g.

```bash
ln -s $HOME/pushl/bin/pushl $HOME/bin/pushl
```

## Usage

### Basic

```bash
pushl -c $HOME/var/pushl-cache http://example.com/feed.xml
```

While you can run it without the `-c` argument, its use is highly recommended: it makes subsequent runs less spammy, and it lets Pushl detect changes and deletions.

### Sending pings from individual entries

If you just want to send webmentions from an entry page without processing an entire feed, the `-e/--entry` flag indicates that the following URLs are pages or entries, rather than feeds; e.g.

```bash
pushl -e http://example.com/some/page
```

will simply send the webmentions for that page.

### Additional feed discovery

The `-r/--recurse` flag will discover any additional feeds that are declared on entries and process them as well. This is useful if you have per-category feeds that you would also like to send WebSub notifications on. For example, [my site](http://beesbuzz.biz) has per-category feeds which are discoverable from individual entries, so `pushl -r http://beesbuzz.biz/feed` will send WebSub notifications for all of the categories which have recent changes.

Note that using `-r` and `-e` in conjunction will also cause any feed declared on the entry page to be processed. While it is tempting to use this for feed autodiscovery, e.g.

```bash
pushl -re http://example.com/blog/
```

this will also send webmentions from the blog page itself, which is probably *not* what you want.

### Backfilling old content

If your feed implements [RFC 5005](https://tools.ietf.org/html/rfc5005), the `-a` flag will scan past entries for WebMention as well. It is recommended to only use this flag when doing an initial backfill, as it can end up taking a long time on larger sites (and possibly make endpoint operators very grumpy at you). To send updates of much older entries it's better to just use `-e` to do it on a case-by-case basis.

### Dual-protocol/multi-domain websites

If your website is accessible via multiple URLs (for example, both http and https, or multiple domain names), you generally only want WebMentions to be sent from the canonical URL. The best solution is to use `<link rel="canonical">` to declare which URL is canonical, and Pushl will use that when sending the mentions; for example:

```bash
pushl -r https://example.com/feed http://example.com/feed http://alt-domain.example.com/feed
```

As long as both `http://example.com` and `http://alt-domain.example.com` declare the `https://example.com` version as canonical, only the webmentions from `https://example.com` will be sent.
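The canonical declaration itself lives in each page's markup; a minimal sketch (the URL is illustrative):

```html
<head>
  <!-- declare the canonical URL so Pushl sends webmentions only from it -->
  <link rel="canonical" href="https://example.com/some/page">
</head>
```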

If, for some reason, you can't use `rel="canonical"`, you can use the `-s/--websub-only` flag to have Pushl send only WebSub notifications for that feed; for example:

```bash
pushl -r https://example.com/feed -s https://other.example.com/feed
```

will send both WebMention and WebSub notifications for `https://example.com`, but only WebSub for `https://other.example.com`.

## Automated updates

`pushl` can be run from a cron job, although it's a good idea to use `flock -n` to prevent multiple instances from stomping on each other. An example cron job for updating a site might look like:

```crontab
*/5 * * * * flock -n $HOME/.pushl-lock pushl -rc $HOME/.pushl-cache http://example.com/feed
```
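The `flock -n` behavior can be sketched as follows (the lock file path is illustrative): the first invocation acquires the lock, and a concurrent `-n` invocation fails immediately rather than queuing behind the holder.

```shell
LOCK=/tmp/pushl-demo.lock

flock -n "$LOCK" sleep 2 &      # first run: acquires the lock and holds it briefly
sleep 1                         # give the first run time to grab the lock

if flock -n "$LOCK" true; then  # -n: fail fast instead of waiting for the lock
    echo "lock was free"
else
    echo "lock busy; skipping this run"
fi
wait                            # reap the background holder
```

In a cron context, a skipped run simply exits and the next scheduled run tries again.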

### My setup

In my setup, I have `pushl` installed in my website's pipenv:

```bash
cd $HOME/beesbuzz.biz
pipenv install pushl
```

and created this script as `$HOME/beesbuzz.biz/pushl.sh`:

```bash
#!/bin/bash

cd "$(dirname "$0")" || exit 1
LOG=logs/pushl-$(date +%Y%m%d).log

# redirect log output
if [ "$1" == "quiet" ] ; then
    exec >> "$LOG" 2>&1
else
    # merge stderr into stdout and tee everything to the log
    exec > >(tee -a "$LOG") 2>&1
fi

# add timestamp
date

# run pushl
flock -n $HOME/var/pushl/run.lock $HOME/.local/bin/pipenv run pushl -rvvkc $HOME/var/pushl \
    https://beesbuzz.biz/feed\?push=1 \
    http://publ.beesbuzz.biz/feed\?push=1 \
    https://tumblr.beesbuzz.biz/rss \
    https://novembeat.com/feed\?push=1 \
    http://beesbuzz.biz/feed\?push=1 \
    -s http://beesbuzz.biz/feed-summary https://beesbuzz.biz/feed-summary

# while we're at it, clean out the log and pushl cache directory
find logs $HOME/var/pushl -type f -mtime +30 -print -delete
```

Then I have a cron job:

```crontab
*/15 * * * * $HOME/beesbuzz.biz/pushl.sh quiet
```

which runs it every 15 minutes.

I also have a [git deployment hook](http://publ.beesbuzz.biz/441) for my website, and its final step (after restarting `gunicorn`) is to run `pushl.sh`, in case a maximum latency of 15 minutes just isn't fast enough.

            
