Name: ek-scraper
Version: 0.2.2
Home page: https://github.com/jonasehrlich/ek-scraper
Summary: Simple scraper for kleinanzeigen.de searches with notifications for new ads.
Upload time: 2024-01-02 21:18:15
Author: Jonas Ehrlich
Requires Python: >=3.9,<4.0
License: MIT
Keywords: kleinanzeigen.de, scraper
            # ek-scraper

Simple scraper for kleinanzeigen.de searches with notifications for new ads.

## Installation

Install this package from PyPI in a separate virtual environment using [`pipx`](https://github.com/pypa/pipx).

``` sh
pipx install ek-scraper
```

## Usage

> For full usage information, run `ek-scraper --help`

Create a configuration file using

``` sh
ek-scraper create-config <path/to/config.json>
```

The generated example configuration file looks like this:

```json
{
  "filter": {
    "exclude_topads": true,
    "exclude_patterns": []
  },
  "notifications": {
    "pushover": {
        "token": "<your-app-api-token>",
        "user": "<your-user-api-token>",
        "device": []
    },
    "ntfy.sh": {
      "topic": "<your-private-topic>",
      "priority": 3
    }
  },
  "searches": [
    {
      "name": "Wohnungen in Hamburg Altona",
      "url": "https://www.kleinanzeigen.de/s-wohnung-mieten/altona/c203l9497",
      "recursive": true
    }
  ]
}
```

See [Configuration](#configuration) for details on all configuration options.

* Configure one or more searches in the `searches` section of the configuration,
  see [Searches](#searches) for more details
* Configure notifications in the `notifications` section of the configuration,
  see [Notifications](#notifications) for details on notification configuration
* (Optional) Configure filters in the `filter` section of the configuration,
  see [Filter](#filter) for more details

Run the following command to initialize the data store without sending any notifications:

``` sh
ek-scraper run --no-notifications path/to/config.json
```

Afterwards, run

```sh
ek-scraper run path/to/config.json
```

to receive notifications according to your `notifications` configuration.

## Development

Follow the steps below to set up a development environment for this project.

1. Clone this repository

   ``` sh
   git clone git@github.com:jonasehrlich/ek-scraper.git
   ```

2. Change directory into the repository

   ``` sh
   cd ek-scraper
   ```

3. Create a virtual environment using [poetry](https://python-poetry.org)

   ``` sh
   poetry install
   ```

4. (Optional) Install pre-commit environment

   ``` sh
   $ pre-commit
   [INFO] Installing environment for https://github.com/pre-commit/pre-commit-hooks.
   [INFO] Once installed this environment will be reused.
   [INFO] This may take a few minutes...
   [INFO] Installing environment for https://github.com/psf/black.
   [INFO] Once installed this environment will be reused.
   [INFO] This may take a few minutes...
   Check Yaml...........................................(no files to check)Skipped
   Fix End of Files.....................................(no files to check)Skipped
   Trim Trailing Whitespace.............................(no files to check)Skipped
   black................................................(no files to check)Skipped
   ```

## Configuration

### Searches

Searches can be configured in the `searches` array of the configuration file.
Each of the searches can be configured with the following parameters.

| Name        | Description                                                                        |
| ----------- | ---------------------------------------------------------------------------------- |
| `name`      | Name of the search, use a descriptive one (required)                               |
| `url`       | URL of the first page of your search (required)                                    |
| `recursive` | Whether to follow all pages of the search result <br/>(optional, defaults to true) |

### Filter

Filters can be configured in the `filter` section of the configuration file to exclude specific ads
from your scrape results on the client side. The following settings can be configured.

| Name | Description |
| ---- | ----------- |
| `exclude_topads` | Whether to exclude top ads from the results (optional, defaults to true) |
| `exclude_patterns` | Case-insensitive regular expression patterns used to exclude ads (optional) |
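As an illustrative sketch of how such client-side filtering works (the ad titles and patterns below are made up, and `ek-scraper`'s real implementation may match against more fields than the title), case-insensitive pattern exclusion could look like this:

```python
import re

# Hypothetical exclude patterns, as they would appear in "exclude_patterns"
exclude_patterns = ["reserviert", "defekt"]
compiled = [re.compile(p, re.IGNORECASE) for p in exclude_patterns]

# Hypothetical scraped ad titles
ads = [
    "3-Zimmer-Wohnung in Altona",
    "Wohnung RESERVIERT bis Montag",
    "Waschmaschine, leicht defekt",
]

# Keep only ads that match none of the exclude patterns;
# re.IGNORECASE makes "reserviert" also match "RESERVIERT"
kept = [ad for ad in ads if not any(p.search(ad) for p in compiled)]
print(kept)  # ['3-Zimmer-Wohnung in Altona']
```

The patterns are compiled once and applied with `re.search`, so a pattern matches anywhere in the text, not only at the start.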

### Notifications

Notifications can be configured in the `notifications` section of the configuration file.

#### Push notifications using [Pushover](https://pushover.net/)

![Screenshot of a push notification using Pushover](assets/pushover-notification.jpeg)

`ek-scraper` supports push notifications to your devices using [Pushover](https://pushover.net/).
For further information on the service check their terms and conditions.

The _Pushover_ integration sends a single notification per search if new ads are discovered.

To configure _Pushover_ notifications, first register with the service and create an application
(e.g. `ek-scraper`). Then add the `pushover` object to the `notifications` object in your
configuration file and fill in the API tokens. The `device` array selects which devices receive
the notifications.

| Name     | Description |
| -------- | ------------------------------------------------------------------------------------------- |
| `token`  | API token of the Pushover app (required) |
| `user`   | API token of the Pushover user (required) |
| `device` | List of device names to send the notifications to <br/> (optional, defaults to all devices) |

#### Push notifications using [ntfy.sh](https://ntfy.sh/)

![Screenshot of a push notification using ntfy.sh](assets/ntfy-sh-notification.jpeg)

`ek-scraper` supports push notifications to your devices using [ntfy.sh](https://ntfy.sh/).
For further information on the service check their terms and conditions.

The _ntfy.sh_ integration sends a single notification per search if new ads are discovered.

To configure _ntfy.sh_ for notifications from the scraper, define a topic and subscribe to it in the
mobile app.

> Note that topic names are public, so it's wise to choose something that cannot be guessed easily.
> This can be done by including a UUID, e.g. by running the following command in your shell:
>
> ``` sh
> echo "ek-scraper-$(uuidgen)"
> ```

To use the service in `ek-scraper`, add the `ntfy.sh` object to the `notifications` object in your
configuration file and add the topic you previously subscribed to.

| Name       | Description                                                       |
| ---------- | ----------------------------------------------------------------- |
| `topic`    | Topic to publish the notifications to                             |
| `priority` | Priority to send the notifications with (optional, defaults to 3) |
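As a sketch of what such a publish looks like on the wire, ntfy.sh accepts a plain HTTP POST to `https://ntfy.sh/<topic>` with the message as the request body and the priority in a `Priority` header. The snippet below (topic and message are made up, and this is not `ek-scraper`'s actual code) builds such a request without sending it:

```python
import urllib.request


def build_ntfy_request(topic: str, message: str, priority: int = 3) -> urllib.request.Request:
    """Build (but do not send) a POST request to the public ntfy.sh API."""
    return urllib.request.Request(
        url=f"https://ntfy.sh/{topic}",
        data=message.encode(),
        headers={"Priority": str(priority)},
        method="POST",
    )


req = build_ntfy_request(
    "ek-scraper-example-topic",
    "2 new ads for 'Wohnungen in Hamburg Altona'",
)
print(req.full_url)  # https://ntfy.sh/ek-scraper-example-topic
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) would deliver the message to every device subscribed to the topic.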

## Running `ek-scraper` regularly

> Avoid running the tool too often, or your IP address might get blocked by
> [kleinanzeigen.de](https://www.kleinanzeigen.de)

In order to run `ek-scraper` regularly on a Unix-like system, configure it as a cronjob.

To configure a cronjob, run

``` sh
crontab -e
```

Edit the crontab to add the command you want to run and the schedule to run it on. A handy tool
for checking cron schedule expressions is [crontab.guru](https://crontab.guru/).
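For example, assuming `ek-scraper` was installed via `pipx` (which places executables in `~/.local/bin` by default) and the configuration file lives at `/home/user/ek-scraper/config.json` (both paths are placeholders for your own), an entry running the scraper every 30 minutes could look like this:

```
# m h dom mon dow  command
*/30 * * * * /home/user/.local/bin/ek-scraper run /home/user/ek-scraper/config.json
```

Use an absolute path to the executable, since cron jobs typically run with a minimal `PATH`.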

For more information on configuring cronjobs use your favorite search engine.
