# ek-scraper
Simple scraper for kleinanzeigen.de searches with notifications for new ads.
## Installation
Install this package from PyPI in a separate virtual environment using [`pipx`](https://github.com/pypa/pipx).
``` sh
pipx install ek-scraper
```
## Usage
> For the full usage, run `ek-scraper --help`
Create a configuration file using
``` sh
ek-scraper create-config <path/to/config.json>
```
The example configuration file will look like this:
```json
{
  "filter": {
    "exclude_topads": true,
    "exclude_patterns": []
  },
  "notifications": {
    "pushover": {
      "token": "<your-app-api-token>",
      "user": "<your-user-api-token>",
      "device": []
    },
    "ntfy.sh": {
      "topic": "<your-private-topic>",
      "priority": 3
    }
  },
  "searches": [
    {
      "name": "Wohnungen in Hamburg Altona",
      "url": "https://www.kleinanzeigen.de/s-wohnung-mieten/altona/c203l9497",
      "recursive": true
    }
  ]
}
```
See [Configuration](#configuration) for details on all configuration options.
* Configure one or more searches in the `searches` section of the configuration,
see [Searches](#searches) for more details
* Configure notifications in the `notifications` section of the configuration,
see [Notifications](#notifications) for details on notification configuration
* (Optional) Configure filters in the `filter` section of the configuration,
see [Filter](#filter) for more details
Run the following command to initialize the data store without sending any notifications:
``` sh
ek-scraper run --no-notifications path/to/config.json
```
Afterwards, run
```sh
ek-scraper run path/to/config.json
```
to receive notifications according to your `notifications` configuration.
## Development
Follow the steps below to set up a development environment for this project.
1. Clone this repository
``` sh
git clone git@github.com:jonasehrlich/ek-scraper.git
```
2. Change directory into the repository
``` sh
cd ek-scraper
```
3. Create a virtual environment using [poetry](https://python-poetry.org)
``` sh
poetry install
```
4. (Optional) Install pre-commit environment
``` sh
$ pre-commit
[INFO] Installing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
[INFO] Installing environment for https://github.com/psf/black.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
Check Yaml...........................................(no files to check)Skipped
Fix End of Files.....................................(no files to check)Skipped
Trim Trailing Whitespace.............................(no files to check)Skipped
black................................................(no files to check)Skipped
```
## Configuration
### Searches
Searches can be configured in the `searches` array of the configuration file.
Each of the searches can be configured with the following parameters.
| Name | Description |
| ----------- | ---------------------------------------------------------------------------------- |
| `name` | Name of the search, use a descriptive one (required) |
| `url` | URL of the first page of your search (required) |
| `recursive` | Whether to follow all pages of the search result <br/>(optional, defaults to true) |
### Filter
Filters can be configured in the `filter` section of the configuration file to exclude specific ads
from your scrape results on the client side. The following settings can be configured.
| Name | Description |
| ---- | ----------- |
| `exclude_topads` | Whether to exclude top ads from the results (optional, defaults to true) |
| `exclude_patterns` | Case-insensitive regular expression patterns used to exclude ads (optional) |
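As a sketch of how such patterns behave (the pattern strings and the helper below are illustrative, not ek-scraper's actual code), case-insensitive matching means a pattern excludes an ad regardless of capitalization:

```python
import re

# Hypothetical exclusion patterns, as they might appear in "exclude_patterns"
exclude_patterns = [r"defekt", r"zu verschenken"]

def is_excluded(title: str) -> bool:
    # Client-side filtering as described above: an ad is dropped if any
    # pattern matches its title, ignoring case
    return any(re.search(pattern, title, re.IGNORECASE) for pattern in exclude_patterns)

print(is_excluded("Waschmaschine, leicht DEFEKT"))  # True
print(is_excluded("Waschmaschine, neuwertig"))      # False
```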
### Notifications
Notifications can be configured in the `notifications` section of the configuration file.
#### Push notifications using [Pushover](https://pushover.net/)
![Screenshot of a push notification using Pushover](assets/pushover-notification.jpeg)
`ek-scraper` supports push notifications to your devices using [Pushover](https://pushover.net/).
For further information on the service check their terms and conditions.
The _Pushover_ integration sends a single notification per search if new ads are discovered.
To configure _Pushover_ notifications, first register with the service and create an application
(e.g. `ek-scraper`). To use the service in `ek-scraper`, add the `pushover` object to the
`notifications` object in your configuration file and fill in the API tokens. The `device` array
selects which devices receive the notifications.
| Name | Description |
| -------- | ------------------------------------------------------------------------------------------- |
| `token` | API token of the Pushover app (required) |
| `user` | API token of the Pushover user (required) |
| `device` | List of device names to send the notifications to <br/> (optional, defaults to all devices) |
#### Push notifications using [ntfy.sh](https://ntfy.sh/)
![Screenshot of a push notification using ntfy.sh](assets/ntfy-sh-notification.jpeg)
`ek-scraper` supports push notifications to your devices using [ntfy.sh](https://ntfy.sh/).
For further information on the service check their terms and conditions.
The _ntfy.sh_ integration sends a single notification per search if new ads are discovered.
To configure _ntfy.sh_ for notifications from the scraper, define a topic and subscribe to it in the
mobile app.
> Note that topic names are public, so it's wise to choose something that cannot be guessed easily.
> This can be done by including a UUID, e.g. by running the following command in your shell:
>
> ``` sh
> echo "ek-scraper-$(uuidgen)"
> ```
To use the service in `ek-scraper`, add the `ntfy.sh` object to the `notifications` object in your
configuration file and add the topic you previously subscribed to.
| Name | Description |
| ---------- | ----------------------------------------------------------------- |
| `topic`    | Topic to publish the notifications to (required)                  |
| `priority` | Priority to send the notifications with (optional, defaults to 3) |
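For reference, ntfy.sh's public HTTP API accepts a plain POST to `https://ntfy.sh/<topic>`, with the notification priority set via the `Priority` header. The standard-library sketch below (with a made-up topic name) illustrates the service's API; it is not ek-scraper's internal code:

```python
import urllib.request

def build_ntfy_request(topic: str, message: str, priority: int = 3) -> urllib.request.Request:
    # ntfy.sh publishes a notification for a plain POST to https://ntfy.sh/<topic>;
    # the "Priority" header sets the notification priority (1-5, 3 is the default)
    return urllib.request.Request(
        url=f"https://ntfy.sh/{topic}",
        data=message.encode(),
        headers={"Priority": str(priority), "Title": "ek-scraper"},
        method="POST",
    )

req = build_ntfy_request("ek-scraper-example-topic", "1 new ad for 'Wohnungen in Hamburg Altona'")
# urllib.request.urlopen(req) would actually send the notification
```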
## Running `ek-scraper` regularly
> Avoid running the tool too frequently, to prevent your IP address from getting blocked by
> [kleinanzeigen.de](https://www.kleinanzeigen.de)

To run `ek-scraper` regularly on a Unix-like system, configure it as a cron job.
To configure a cronjob, run
``` sh
crontab -e
```
Edit the crontab to run the command on your desired schedule. A handy tool to check schedule
expressions for cron jobs is [crontab.guru](https://crontab.guru/).
For more information on configuring cronjobs use your favorite search engine.