alpha-filter

- Name: alpha-filter
- Version: 0.9.2
- Summary: differential filter
- Upload time: 2023-05-29 21:53:50
- Requires Python: >=3.8
- Requirements: none recorded
            [![Python 3.8](https://img.shields.io/badge/python-3.8-blue.svg)](https://www.python.org/downloads/release/python-380/) _and greater_

# alpha-filter

When scraping, it is sometimes necessary to reduce the number of requests sent to the server. For example, suppose a script collects product links from pagination pages every day and then parses each product separately. If some of those products were already parsed on an earlier run, there is no reason to fetch them twice. alpha-filter filters out the URLs that have already been seen and returns only the new ones.
 
## Getting started
```sh
pip install alpha-filter
```

### Usage

```python
>>> from alphafilter import filter_ads, mark_as_processed, is_processed

>>> first_parsing_urls = ["https://www.example.com/1", "https://www.example.com/2"]
>>> new, old = filter_ads(first_parsing_urls)
>>> new  # both URLs are seen for the first time
["https://www.example.com/1", "https://www.example.com/2"]
>>> old
[]

>>> second_parsing_urls = first_parsing_urls  # the second run sees the same URLs as the first
>>> new, old = filter_ads(second_parsing_urls)
>>> new
[]
>>> old
[]

>>> third_parsing_urls = ["https://www.example.com/2", "https://www.example.com/3"]
>>> new, old = filter_ads(third_parsing_urls)
>>> new  # URL 3 has not been seen before
["https://www.example.com/3"]
>>> old  # URL 1 was stored earlier but is absent from this batch
["https://www.example.com/1"]
```
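In a real scraper, `filter_ads` sits between the pagination step and the per-product requests. The sketch below shows one way to wire it up; `collect_listing_urls` and `parse_product` are hypothetical placeholders for your own scraping code.

```python
from alphafilter import filter_ads

def collect_listing_urls():
    """Hypothetical: walk the pagination and return all product URLs."""
    return ["https://www.example.com/1", "https://www.example.com/2"]

def parse_product(url):
    """Hypothetical: fetch and parse a single product page."""
    print(f"parsing {url}")

# Daily run: only request pages that were not parsed before.
new, old = filter_ads(collect_listing_urls())

for url in new:   # first-seen URLs: worth a request
    parse_product(url)

for url in old:   # previously stored URLs that vanished from the listing
    print(f"listing disappeared: {url}")
```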
You can also mark URLs as processed yourself, for whatever purpose you need:


```python
>>> urls_for_mark = ["https://www.example.com/2", "https://www.example.com/3"]
>>> mark_as_processed(urls_for_mark)
>>> is_processed("https://www.example.com/2")
True
>>> is_processed("https://www.example.com/4")
False
```
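For example, a retry pass can consult `is_processed` to skip URLs that were already handled on an earlier run (a minimal sketch; the retry queue itself is hypothetical):

```python
from alphafilter import is_processed, mark_as_processed

retry_queue = ["https://www.example.com/2", "https://www.example.com/4"]

for url in retry_queue:
    if is_processed(url):
        continue                  # already handled on a previous run
    # ... fetch and parse the page here ...
    mark_as_processed([url])      # record it so the next run skips it
```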


alpha-filter stores the URLs in a fast SQLite database. The database file (`ads.db`) is created in the root directory.
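If you want to inspect the stored URLs, the file can be opened with Python's built-in `sqlite3` module. The sketch below lists the tables first rather than assuming any names, since the schema is internal to the package and not part of its documented API:

```python
import sqlite3

con = sqlite3.connect("ads.db")
# Discover the schema instead of guessing internal table names.
tables = con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(tables)
con.close()
```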

__Warning! This package has no protection against SQL injection; do not feed it untrusted input from an external interface.__
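The warning matters because a URL containing a quote character can break a query that was built by string formatting. The illustration below is generic, not the package's actual code; binding values with placeholders (the second form) is the usual defense:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ads (url TEXT)")

url = "https://www.example.com/1'); DROP TABLE ads; --"

# Unsafe: pasting the URL straight into the SQL text.
# con.execute(f"INSERT INTO ads (url) VALUES ('{url}')")  # injectable

# Safe: the driver binds the value, quotes and all.
con.execute("INSERT INTO ads (url) VALUES (?)", (url,))
print(con.execute("SELECT url FROM ads").fetchone())
```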



            
