il-supermarket-scraper


- Name: il-supermarket-scraper
- Version: 0.5.4
- Home page: https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers
- Summary: Python package that implements scraping of Israeli supermarket data
- Upload time: 2024-12-12 20:06:16
- Author: Sefi Erlich
- License: MIT
- Keywords: israel, israeli, scraper, supermarket
- Requirements: retry, mock, requests, lxml, beautifulsoup4, pymongo, pytz, holidays, cachetools, pytest-playwright
            
Israel Supermarket Scraper: Clients to download the data published by the supermarkets.
=======================================
This is a scraper for ALL the supermarket chains listed on the GOV.IL site.

Price transparency (price comparison) - https://www.gov.il/he/departments/legalInfo/cpfta_prices_regulations




[![Unit & Integration Tests](https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers/actions/workflows/test-suite.yml/badge.svg?event=push)](https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers/actions/workflows/test-suite.yml)
[![CodeQL](https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers/actions/workflows/codeql.yml/badge.svg)](https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers/actions/workflows/codeql.yml)
[![Pylint](https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers/actions/workflows/pylint.yml/badge.svg)](https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers/actions/workflows/pylint.yml)
[![Publish Docker image](https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers/actions/workflows/docker-publish.yml/badge.svg)](https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers/actions/workflows/docker-publish.yml)
[![Upload Python Package](https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers/actions/workflows/python-publish.yml/badge.svg)](https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers/actions/workflows/python-publish.yml)

## 🤗 Want to support my work?
<p align="center">
    <a href="https://buymeacoffee.com/erlichsefi" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" style="height: 60px !important;width: 217px !important;">
    </a>
</p>

Daily Automatic Testing
----
The test suite is scheduled to run daily, so you can see whether the supermarket chains have changed something in their interface that stops the package from working properly.

Status: [![Scheduled Tests](https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers/actions/workflows/test-suite.yml/badge.svg?event=schedule)](https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers/actions/workflows/test-suite.yml)

Notice:
- Bareket and Quik are flaky! Their failures do not fail the test suite, but you can still use them.
- Some of the scraped sites block access from outside of Israel.

--------

 

Got a question?
---------------

You can email me at erlichsefi@gmail.com

If you think you've found a bug:

- Search the [issue tracker](https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers/issues) to see if
  it's already been reported, and create an issue if it hasn't.
- Please consider fixing the issue yourself and opening a pull request.

What is il_supermarket_scarper?
-------------

There are a lot of projects on GitHub trying to scrape the supermarket data, but most of them are not stable or haven't been updated for a while. It's about time there was one codebase that does the work completely.

You only need to run the following code to get all the data currently shared by the supermarkets.

```python
from il_supermarket_scarper import MainScrapperRunner

scraper = MainScrapperRunner()
scraper.run()
```


Please note!
Since the supermarkets constantly upload new files to their sites, you will only get the current snapshot. To keep getting data, you will need to run this code repeatedly to pick up the newly uploaded files.
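
One way to re-run the scraper on a schedule is a small loop like the sketch below. `run_periodically` and its parameters are hypothetical names used for illustration and are not part of the package; only `MainScrapperRunner().run()` comes from the example above.

```python
import time
from typing import Callable


def run_periodically(
    job: Callable[[], None],
    interval_seconds: float,
    iterations: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `job` `iterations` times, pausing `interval_seconds` between calls.

    Returns the number of runs that finished without raising.
    """
    completed = 0
    for i in range(iterations):
        try:
            job()
            completed += 1
        except Exception as exc:  # one failed run should not stop later runs
            print(f"run {i + 1} failed: {exc}")
        if i < iterations - 1:  # no need to wait after the last run
            sleep(interval_seconds)
    return completed


# Example usage (assumes the package is installed; the hourly interval
# and the 24 iterations are arbitrary choices):
#
#   from il_supermarket_scarper import MainScrapperRunner
#   run_periodically(lambda: MainScrapperRunner().run(), 3600, 24)
```

A cron job or the Docker image described below would serve the same purpose; the loop is only the simplest option for a quick experiment.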

Quick start
-----------

il_supermarket_scarper can be installed using pip:

    python3 -m pip install il-supermarket-scraper

If you want to run the latest version of the code, you can install it from the
repo directly:

    python3 -m pip install -U git+https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers.git
    # or if you don't have 'git' installed
    python3 -m pip install -U https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers/archive/refs/heads/main.zip
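
To confirm which version ended up installed, a short check with the standard library's `importlib.metadata` can help. This is a minimal sketch: `installed_version` is a hypothetical helper, and the distribution name `il-supermarket-scraper` is taken from the package metadata.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of `distribution`, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string, or None when the package is not installed.
print(installed_version("il-supermarket-scraper"))
```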
    


Running Docker
-----------
The Docker image is designed to be re-run with the same configuration. In every iteration, the scraper collects the list of files available for download and checks whether each file already exists before fetching it, either by scanning the dump folder or by checking the mongo/status files.
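
The "scan the dump folder" variant of that check can be pictured roughly as follows. This is an illustrative sketch only, not the package's actual implementation; `files_to_fetch` and the file names in the usage comment are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def files_to_fetch(available: Iterable[str], dump_folder: Path) -> List[str]:
    """Return the names in `available` that are not yet present in `dump_folder`."""
    if dump_folder.is_dir():
        existing = {path.name for path in dump_folder.iterdir()}
    else:
        existing = set()  # nothing downloaded yet, so everything is new
    return [name for name in available if name not in existing]


# Example usage (hypothetical file names):
#
#   pending = files_to_fetch(["stores_1.xml", "stores_2.xml"], Path("./dumps"))
```

Because the result depends only on what is already on disk, repeated runs against the same mounted folder skip files fetched earlier.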


Build the image yourself:

    docker build -t erlichsefi/israeli-supermarket-scarpers --target prod .

Or pull the existing image from Docker Hub:

    docker pull erlichsefi/israeli-supermarket-scarpers:latest


Then run it with:


    # ENABLED_SCRAPERS:   see il_supermarket_scarper/scrappers_factory.py
    # ENABLED_FILE_TYPES: see il_supermarket_scarper/utils/file_types.py
    # LIMIT:              number of files you would like to download (remove for unlimited)
    # TODAY:              the date to download data from
    docker run -v "./dumps:/usr/src/app/dumps" \
               -e ENABLED_SCRAPERS="BAREKET,YAYNO_BITAN" \
               -e ENABLED_FILE_TYPES="STORE_FILE" \
               -e LIMIT=1 \
               -e TODAY="2024-10-23 14:35" \
               erlichsefi/israeli-supermarket-scarpers
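
To make the meaning of these variables concrete, the sketch below shows one plausible way to read them in Python. It is an assumption for illustration, not the code the image actually runs: `RunConfig` and `load_config` are hypothetical names, and the `TODAY` format is inferred from the example value above.

```python
import os
from dataclasses import dataclass
from datetime import datetime
from typing import List, Mapping, Optional


@dataclass
class RunConfig:
    enabled_scrapers: Optional[List[str]]    # None means "all scrapers"
    enabled_file_types: Optional[List[str]]  # None means "all file types"
    limit: Optional[int]                     # None means "unlimited"
    today: Optional[datetime]                # None means "no date override"


def _split(value: Optional[str]) -> Optional[List[str]]:
    """Split a comma-separated value into a list; empty or missing gives None."""
    if not value:
        return None
    return [item.strip() for item in value.split(",") if item.strip()]


def load_config(env: Mapping[str, str] = os.environ) -> RunConfig:
    """Build a RunConfig from environment-style key/value pairs."""
    limit = env.get("LIMIT")
    today = env.get("TODAY")
    return RunConfig(
        enabled_scrapers=_split(env.get("ENABLED_SCRAPERS")),
        enabled_file_types=_split(env.get("ENABLED_FILE_TYPES")),
        limit=int(limit) if limit else None,
        # Format assumed from the example value "2024-10-23 14:35".
        today=datetime.strptime(today, "%Y-%m-%d %H:%M") if today else None,
    )
```

Treating a missing variable as "no restriction" matches the note that `LIMIT` can be removed for unlimited downloads; whether the other variables behave the same way is an assumption.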



Contributing
------------

Help in testing, development, documentation and other tasks is
highly appreciated and useful to the project. There are tasks for
contributors of all experience levels.

If you need help getting started, don't hesitate to contact me.


Development status
------------------

IL SuperMarket Scraper is beta software. As far as I can see, development has stopped until new issues are found.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers",
    "name": "il-supermarket-scraper",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "israel, israeli, scraper, supermarket",
    "author": "Sefi Erlich",
    "author_email": "erlichsefi@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/97/3d/d909e17962d1f3b933caa01eee34b4e78c81f22da811e815d05b1f79ebae/il_supermarket_scraper-0.5.4.tar.gz",
    "platform": null,
    "description": "\nIsrael Supermarket Scraper: Clients to download the data published by the supermarkets.\n=======================================\nThis is a scraper for ALL the supermarket chains listed in the GOV.IL site.\n\n\u05e9\u05e7\u05d9\u05e4\u05d5\u05ea \u05de\u05d7\u05d9\u05e8\u05d9\u05dd (\u05d4\u05e9\u05d5\u05d5\u05d0\u05ea \u05de\u05d7\u05d9\u05e8\u05d9\u05dd) - https://www.gov.il/he/departments/legalInfo/cpfta_prices_regulations\n\n\n\n\n[![Unit & Integration Tests](https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers/actions/workflows/test-suite.yml/badge.svg?event=push)](https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers/actions/workflows/test-suite.yml)\n[![CodeQL](https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers/actions/workflows/codeql.yml/badge.svg)](https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers/actions/workflows/codeql.yml)\n[![Pylint](https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers/actions/workflows/pylint.yml/badge.svg)](https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers/actions/workflows/pylint.yml)\n[![Publish Docker image](https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers/actions/workflows/docker-publish.yml/badge.svg)](https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers/actions/workflows/docker-publish.yml)\n[![Upload Python Package](https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers/actions/workflows/python-publish.yml/badge.svg)](https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers/actions/workflows/python-publish.yml)\n\n## \ud83e\udd17 Want to support my work?\n<p align=\"center\">\n    <a href=\"https://buymeacoffee.com/erlichsefi\" target=\"_blank\"><img src=\"https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png\" alt=\"Buy Me A Coffee\" style=\"height: 60px !important;width: 217px !important;\">\n    
</a>\n</p>\n\nDaily Automatic Testing\n----\nThe test suite is scheduled to run daily, so you can see if the supermarket chains have changed something in their interface and the package will not work properly.\n\nStatus: [![Scheduled Tests](https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers/actions/workflows/test-suite.yml/badge.svg?event=schedule)](https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers/actions/workflows/test-suite.yml)\n\nNotice:\n- Berekt and Quik are flaky! They will not fail the testing framework, but you can still use them.\n- Some of the scrapers sites are blocked from being accessed from outside of Israel. \n\n--------\n\n \n\nGot a question?\n---------------\n\nYou can email me at erlichsefi@gmail.com\n\nIf you think you've found a bug:\n\n- Create issue in [issue tracker](https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers/issues) to see if\n  it's already been reported\n- Please consider solving the issue by yourself and creating a pull request.\n\nWhat is il_supermarket_scarper?\n-------------\n\nThere are a lot of projects in GitHub trying to scrape the supermarket data, but most of them are not stable or haven't been updated for a while, it's about time there will be one codebase that does the work completely. \n\nYou only need to run the following code to get all the data currently shared by the supermarkets.\n\n```python\nfrom il_supermarket_scarper import MainScrapperRunner\n\nscraper = MainScrapperRunner()\nscraper.run()\n```\n\n\nPlease notice!\nSince new files are constantly uploaded by the supermarket to their site, you will only get the current snapshot. 
In order to keep getting data, you will need to run this code more than one time to get the newly uploaded files.\n\nQuick start\n-----------\n\nil_supermarket_scarper can be installed using pip:\n\n    python3 pip install israeli-supermarket-scraper\n\nIf you want to run the latest version of the code, you can install it from the\nrepo directly:\n\n    python3 -m pip install -U git+https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers.git\n    # or if you don't have 'git' installed\n    python3 -m pip install -U https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers/main\n    \n\n\nRunning Docker\n-----------\nThe docker is designed to re-run against the same configuration, in every iteration the scraper will collect the files available to download and check if the file already exists before fetching it, either by scanning the dump folder, or checking the mongo/status files.\n\n\nBuild yourself:\n\n    docker build -t erlichsefi/israeli-supermarket-scarpers --target prod .\n\nor pull the existing image from docker hub:\n\n    docker pull erlichsefi/israeli-supermarket-scarpers:latest\n\n\nThen running it using:\n\n\n    docker run  -v \"./dumps:/usr/src/app/dumps\" \\\n                -e ENABLED_SCRAPERS=\"BAREKET,YAYNO_BITAN\" \\   # see: il_supermarket_scarper/scrappers_factory.py\n                -e ENABLED_FILE_TYPES=\"STORE_FILE\" \\          # see: il_supermarket_scarper/utils/file_types.py\n                -e LIMIT=1 \\                                  # number of files you would like to download (remove for unlimited)\n                -e TODAY=\"2024-10-23 14:35\" \\                 # the date to download data from\n                erlichsefi/israeli-supermarket-scarpers\n\n\n\nContributing\n------------\n\nHelp in testing, development, documentation and other tasks is\nhighly appreciated and useful to the project. 
There are tasks for\ncontributors of all experience levels.\n\nIf you need help getting started, don't hesitate to contact me.\n\n\nDevelopment status\n------------------\n\nIL SuperMarket Scraper is beta software, as far as i see devlopment stoped until new issues will be found.\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "python package that implement a scraping for israeli supermarket data",
    "version": "0.5.4",
    "project_urls": {
        "Homepage": "https://github.com/OpenIsraeliSupermarkets/israeli-supermarket-scarpers"
    },
    "split_keywords": [
        "israel",
        " israeli",
        " scraper",
        " supermarket"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "91175e28b72809a190e661af7485a092c23a381a9044aee772d4e3693a07b1dc",
                "md5": "9a26313944397c88e450d707c7881166",
                "sha256": "3fc6696ea132d7d1ac4759e1cd7ad0459eaa41a6bd62ce2c979cb7dc64caad1d"
            },
            "downloads": -1,
            "filename": "il_supermarket_scraper-0.5.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "9a26313944397c88e450d707c7881166",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 63377,
            "upload_time": "2024-12-12T20:06:12",
            "upload_time_iso_8601": "2024-12-12T20:06:12.143748Z",
            "url": "https://files.pythonhosted.org/packages/91/17/5e28b72809a190e661af7485a092c23a381a9044aee772d4e3693a07b1dc/il_supermarket_scraper-0.5.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "973dd909e17962d1f3b933caa01eee34b4e78c81f22da811e815d05b1f79ebae",
                "md5": "575cabab049de945bbffeb8158a837ad",
                "sha256": "a95f361d8fc0b8b2b430a792608ca9939a6a8cfb0f5d48e3b75d09d97f77deb3"
            },
            "downloads": -1,
            "filename": "il_supermarket_scraper-0.5.4.tar.gz",
            "has_sig": false,
            "md5_digest": "575cabab049de945bbffeb8158a837ad",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 42302,
            "upload_time": "2024-12-12T20:06:16",
            "upload_time_iso_8601": "2024-12-12T20:06:16.087877Z",
            "url": "https://files.pythonhosted.org/packages/97/3d/d909e17962d1f3b933caa01eee34b4e78c81f22da811e815d05b1f79ebae/il_supermarket_scraper-0.5.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-12 20:06:16",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "OpenIsraeliSupermarkets",
    "github_project": "israeli-supermarket-scarpers",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "retry",
            "specs": [
                [
                    "==",
                    "0.9.2"
                ]
            ]
        },
        {
            "name": "mock",
            "specs": [
                [
                    "==",
                    "4.0.3"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": [
                [
                    "==",
                    "2.32.2"
                ]
            ]
        },
        {
            "name": "lxml",
            "specs": [
                [
                    "==",
                    "5.2.1"
                ]
            ]
        },
        {
            "name": "beautifulsoup4",
            "specs": [
                [
                    "==",
                    "4.10.0"
                ]
            ]
        },
        {
            "name": "pymongo",
            "specs": [
                [
                    "==",
                    "4.6.3"
                ]
            ]
        },
        {
            "name": "pytz",
            "specs": [
                [
                    "==",
                    "2022.4"
                ]
            ]
        },
        {
            "name": "holidays",
            "specs": [
                [
                    "==",
                    "0.16"
                ]
            ]
        },
        {
            "name": "cachetools",
            "specs": [
                [
                    "==",
                    "5.2.0"
                ]
            ]
        },
        {
            "name": "pytest-playwright",
            "specs": [
                [
                    "==",
                    "0.4.4"
                ]
            ]
        }
    ],
    "lcname": "il-supermarket-scraper"
}
        