crawlerdetect

Name: crawlerdetect
Version: 0.1.7
Home page: https://github.com/moskrc/CrawlerDetect
Summary: CrawlerDetect is a Python class for detecting bots/crawlers/spiders via the user agent.
Upload time: 2023-08-17 10:02:24
Author: Vitalii Shishorin
Requires Python: >=3.4, <4
License: BSD
Keywords: crawler, crawler detect, crawler detector, crawlerdetect, python crawler detect
Requirements: none recorded
## About CrawlerDetect

This is a Python wrapper for [CrawlerDetect](https://github.com/JayBizzle/Crawler-Detect), the web crawler detection library.
It helps detect bots/crawlers/spiders via the user agent and other HTTP headers, and currently recognizes more than 1,000 bots/spiders/crawlers.

### Installation
Run `pip install crawlerdetect`

### Usage

#### Variant 1
```Python
from crawlerdetect import CrawlerDetect
crawler_detect = CrawlerDetect()
crawler_detect.isCrawler('Mozilla/5.0 (compatible; Sosospider/2.0; +http://help.soso.com/webspider.htm)')
# True if a crawler user agent is detected
```

#### Variant 2
```Python
from crawlerdetect import CrawlerDetect
crawler_detect = CrawlerDetect(user_agent='Mozilla/5.0 (iPhone; CPU iPhone OS 7_1 like Mac OS X) AppleWebKit (KHTML, like Gecko) Mobile (compatible; Yahoo Ad monitoring; https://help.yahoo.com/kb/yahoo-ad-monitoring-SLN24857.html)')
crawler_detect.isCrawler()
# True if a crawler user agent is detected
```

#### Variant 3
```Python
from crawlerdetect import CrawlerDetect
crawler_detect = CrawlerDetect(headers={
    'DOCUMENT_ROOT': '/home/test/public_html', 'GATEWAY_INTERFACE': 'CGI/1.1',
    'HTTP_ACCEPT': '*/*', 'HTTP_ACCEPT_ENCODING': 'gzip, deflate',
    'HTTP_CACHE_CONTROL': 'no-cache', 'HTTP_CONNECTION': 'Keep-Alive',
    'HTTP_FROM': 'googlebot(at)googlebot.com', 'HTTP_HOST': 'www.test.com',
    'HTTP_PRAGMA': 'no-cache',
    'HTTP_USER_AGENT': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36',
    'PATH': '/bin:/usr/bin', 'QUERY_STRING': 'order=closingDate',
    'REDIRECT_STATUS': '200', 'REMOTE_ADDR': '127.0.0.1', 'REMOTE_PORT': '3360',
    'REQUEST_METHOD': 'GET', 'REQUEST_URI': '/?test=testing',
    'SCRIPT_FILENAME': '/home/test/public_html/index.php', 'SCRIPT_NAME': '/index.php',
    'SERVER_ADDR': '127.0.0.1', 'SERVER_ADMIN': 'webmaster@test.com',
    'SERVER_NAME': 'www.test.com', 'SERVER_PORT': '80', 'SERVER_PROTOCOL': 'HTTP/1.1',
    'SERVER_SIGNATURE': '', 'SERVER_SOFTWARE': 'Apache',
    'UNIQUE_ID': 'Vx6MENRxerBUSDEQgFLAAAAAS', 'PHP_SELF': '/index.php',
    'REQUEST_TIME_FLOAT': 1461619728.0705, 'REQUEST_TIME': 1461619728,
})
crawler_detect.isCrawler()
# True if a crawler is detected from the headers
```
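
The header keys above follow the CGI/WSGI naming convention (`HTTP_USER_AGENT`, `HTTP_FROM`, ...), so a WSGI environ can be passed straight through. A minimal sketch, assuming Flask (Flask is not a dependency of this package, and passing the full environ assumes the library tolerates the extra non-string WSGI keys):

```Python
from crawlerdetect import CrawlerDetect
from flask import Flask, request

app = Flask(__name__)

@app.route("/")
def index():
    # request.environ is the WSGI environ dict, whose CGI-style keys
    # match the headers shown in the example above.
    crawler_detect = CrawlerDetect(headers=request.environ)
    if crawler_detect.isCrawler():
        return "Hello, bot!"
    return "Hello, human!"
```
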
#### Output the name of the bot that matched (if any)
```Python
from crawlerdetect import CrawlerDetect
crawler_detect = CrawlerDetect()
crawler_detect.isCrawler('Mozilla/5.0 (compatible; Sosospider/2.0; +http://help.soso.com/webspider.htm)')
# True if a crawler user agent is detected
crawler_detect.getMatches()
# 'Sosospider'
```
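
Combining the two calls, a small (hypothetical) helper can label a batch of user-agent strings with the bot that matched, or `None` for ordinary browsers:

```Python
from crawlerdetect import CrawlerDetect

def classify_user_agents(user_agents):
    """Map each user-agent string to the matched bot name, or None."""
    detector = CrawlerDetect()
    labels = {}
    for ua in user_agents:
        # isCrawler() must be called first; getMatches() then reports
        # the name of the bot that the last check matched.
        labels[ua] = detector.getMatches() if detector.isCrawler(ua) else None
    return labels

print(classify_user_agents([
    'Mozilla/5.0 (compatible; Sosospider/2.0; +http://help.soso.com/webspider.htm)',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/116.0 Safari/537.36',
]))
# Expected: the Sosospider entry maps to 'Sosospider', the browser entry to None.
```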

### Contributing
If you find a bot/spider/crawler user agent that CrawlerDetect fails to detect, please submit a pull request with the regex pattern added to the array in `providers/crawlers.py` and add the failing user agent to `tests/crawlers.txt`.

Failing that, just create an issue with the user agent you have found, and we'll take it from there :)
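
For illustration only, a contribution of this shape might look like the following; the real layout of `providers/crawlers.py` may differ, and `MyNewBot` is a made-up name:

```Python
# Illustrative sketch only -- the actual contents of providers/crawlers.py
# may be organized differently. Each entry is a regex fragment that is
# matched against the user agent.
crawlers_data = [
    # ...existing patterns...
    'Sosospider',
    'MyNewBot/\\d+\\.\\d+',  # hypothetical new pattern being contributed
]
```

together with a matching user-agent line (e.g. `MyNewBot/1.0 (+http://example.com/bot)`) added to `tests/crawlers.txt`.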

### ES6 Library
To use this library with Node.js or any ES6-based application, check out [es6-crawler-detect](https://github.com/JefferyHus/es6-crawler-detect).

### .NET Library
To use this library in a .NET Standard (including .NET Core) project, check out [NetCrawlerDetect](https://github.com/gplumb/NetCrawlerDetect).

### Nette Extension
To use this library with the Nette framework, check out [NetteCrawlerDetect](https://github.com/JanGalek/Crawler-Detect).

### Ruby Gem
To use this library with Ruby on Rails or any Ruby-based application, check out the [crawler_detect](https://github.com/loadkpi/crawler_detect) gem.

_Parts of this class are based on the brilliant [MobileDetect](https://github.com/serbanghita/Mobile-Detect)_

[![Analytics](https://ga-beacon.appspot.com/UA-72430465-1/Crawler-Detect/readme?pixel)](https://github.com/JayBizzle/Crawler-Detect)

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/moskrc/CrawlerDetect",
    "name": "crawlerdetect",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.4, <4",
    "maintainer_email": "",
    "keywords": "crawler,crawler detect,crawler detector,crawlerdetect,python crawler detect",
    "author": "Vitalii Shishorin",
    "author_email": "moskrc@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/0d/d8/60a3b7f2859a209430d16eb36b9bcc584828ead4d3bc440b8a1829c134e2/crawlerdetect-0.1.7.tar.gz",
    "platform": "any",
    "description": "## About CrawlerDetect\n\nThis is a Python wrapper for [CrawlerDetect](https://github.com/JayBizzle/Crawler-Detect) - the web crawler detection library\nIt helps to detect  bots/crawlers/spiders via the user agent and other HTTP-headers. Currently able to detect > 1,000's of bots/spiders/crawlers.\n\n### Installation\nRun `pip install crawlerdetect`\n\n### Usage\n\n#### Variant 1\n```Python\nfrom crawlerdetect import CrawlerDetect\ncrawler_detect = CrawlerDetect()\ncrawler_detect.isCrawler('Mozilla/5.0 (compatible; Sosospider/2.0; +http://help.soso.com/webspider.htm)')\n# true if crawler user agent detected\n```\n\n#### Variant 2\n```Python\nfrom crawlerdetect import CrawlerDetect\ncrawler_detect = CrawlerDetect(user_agent='Mozilla/5.0 (iPhone; CPU iPhone OS 7_1 like Mac OS X) AppleWebKit (KHTML, like Gecko) Mobile (compatible; Yahoo Ad monitoring; https://help.yahoo.com/kb/yahoo-ad-monitoring-SLN24857.html)')\ncrawler_detect.isCrawler()\n# true if crawler user agent detected\n```\n\n#### Variant 3\n```Python\nfrom crawlerdetect import CrawlerDetect\ncrawler_detect = CrawlerDetect(headers={'DOCUMENT_ROOT': '/home/test/public_html', 'GATEWAY_INTERFACE': 'CGI/1.1', 'HTTP_ACCEPT': '*/*', 'HTTP_ACCEPT_ENCODING': 'gzip, deflate', 'HTTP_CACHE_CONTROL': 'no-cache', 'HTTP_CONNECTION': 'Keep-Alive', 'HTTP_FROM': 'googlebot(at)googlebot.com', 'HTTP_HOST': 'www.test.com', 'HTTP_PRAGMA': 'no-cache', 'HTTP_USER_AGENT': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36', 'PATH': '/bin:/usr/bin', 'QUERY_STRING': 'order=closingDate', 'REDIRECT_STATUS': '200', 'REMOTE_ADDR': '127.0.0.1', 'REMOTE_PORT': '3360', 'REQUEST_METHOD': 'GET', 'REQUEST_URI': '/?test=testing', 'SCRIPT_FILENAME': '/home/test/public_html/index.php', 'SCRIPT_NAME': '/index.php', 'SERVER_ADDR': '127.0.0.1', 'SERVER_ADMIN': 'webmaster@test.com', 'SERVER_NAME': 'www.test.com', 'SERVER_PORT': '80', 'SERVER_PROTOCOL': 'HTTP/1.1', 'SERVER_SIGNATURE': '', 'SERVER_SOFTWARE': 'Apache', 'UNIQUE_ID': 'Vx6MENRxerBUSDEQgFLAAAAAS', 'PHP_SELF': '/index.php', 'REQUEST_TIME_FLOAT': 1461619728.0705, 'REQUEST_TIME': 1461619728})\ncrawler_detect.isCrawler()\n# true if crawler user agent detected\n```\n#### Output the name of the bot that matched (if any)\n```Python\nfrom crawlerdetect import CrawlerDetect\ncrawler_detect = CrawlerDetect()\ncrawler_detect.isCrawler('Mozilla/5.0 (compatible; Sosospider/2.0; +http://help.soso.com/webspider.htm)')\n# true if crawler user agent detected\ncrawler_detect.getMatches()\n# Sosospider\n```\n\n### Contributing\nIf you find a bot/spider/crawler user agent that CrawlerDetect fails to detect, please submit a pull request with the regex pattern added to the array in `providers/crawlers.py` and add the failing user agent to `tests/crawlers.txt`.\n\nFailing that, just create an issue with the user agent you have found, and we'll take it from there :)\n\n### ES6 Library\nTo use this library with NodeJS or any ES6 application based, check out [es6-crawler-detect](https://github.com/JefferyHus/es6-crawler-detect).\n\n### .NET Library\nTo use this library in a .net standard (including .net core) based project, check out [NetCrawlerDetect](https://github.com/gplumb/NetCrawlerDetect).\n\n### Nette Extension\nTo use this library with the Nette framework, checkout [NetteCrawlerDetect](https://github.com/JanGalek/Crawler-Detect).\n\n### Ruby Gem\n\nTo use this library with Ruby on Rails or any Ruby-based application, check out 
[crawler_detect](https://github.com/loadkpi/crawler_detect) gem.\n\n_Parts of this class are based on the brilliant [MobileDetect](https://github.com/serbanghita/Mobile-Detect)_\n\n[![Analytics](https://ga-beacon.appspot.com/UA-72430465-1/Crawler-Detect/readme?pixel)](https://github.com/JayBizzle/Crawler-Detect)\n",
    "bugtrack_url": null,
    "license": "BSD",
    "summary": "CrawlerDetect is a Python class for detecting bots/crawlers/spiders via the user agent.",
    "version": "0.1.7",
    "project_urls": {
        "Documentation": "https://crawlerdetect.readthedocs.io",
        "Download": "https://github.com/moskrc/CrawlerDetect/tarball/0.1.7",
        "Homepage": "https://github.com/moskrc/CrawlerDetect"
    },
    "split_keywords": [
        "crawler",
        "crawler detect",
        "crawler detector",
        "crawlerdetect",
        "python crawler detect"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d64096d88f551b675caed88b652b8df1e5d77199ce81acd0cd44ef0547810785",
                "md5": "639d6e6ac5f18f46a52c3b68a8dbfefc",
                "sha256": "cd7417f87105d508e5dee99ac1aed9f46b7f70019618c8974db29f103e4c2b33"
            },
            "downloads": -1,
            "filename": "crawlerdetect-0.1.7-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "639d6e6ac5f18f46a52c3b68a8dbfefc",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": ">=3.4, <4",
            "size": 18298,
            "upload_time": "2023-08-17T10:02:22",
            "upload_time_iso_8601": "2023-08-17T10:02:22.594048Z",
            "url": "https://files.pythonhosted.org/packages/d6/40/96d88f551b675caed88b652b8df1e5d77199ce81acd0cd44ef0547810785/crawlerdetect-0.1.7-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0dd860a3b7f2859a209430d16eb36b9bcc584828ead4d3bc440b8a1829c134e2",
                "md5": "d30182dc1e7a68b231a13995f24059ba",
                "sha256": "28837b434250bc4647b8b1056ab9a5bdd7133072c029b59987a3faadadb21b04"
            },
            "downloads": -1,
            "filename": "crawlerdetect-0.1.7.tar.gz",
            "has_sig": false,
            "md5_digest": "d30182dc1e7a68b231a13995f24059ba",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.4, <4",
            "size": 20447,
            "upload_time": "2023-08-17T10:02:24",
            "upload_time_iso_8601": "2023-08-17T10:02:24.854324Z",
            "url": "https://files.pythonhosted.org/packages/0d/d8/60a3b7f2859a209430d16eb36b9bcc584828ead4d3bc440b8a1829c134e2/crawlerdetect-0.1.7.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-08-17 10:02:24",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "moskrc",
    "github_project": "CrawlerDetect",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "crawlerdetect"
}
        