robotsparse


| Field | Value |
| --- | --- |
| Name | robotsparse |
| Version | 1.0 |
| home_page | https://github.com/xyzpw/robotsparse/ |
| Summary | A python package that enhances speed and simplicity of parsing robots files. |
| upload_time | 2024-05-07 22:32:36 |
| maintainer | xyzpw |
| docs_url | None |
| author | xyzpw |
| requires_python | None |
| license | MIT |
| keywords | parsing, parser, robots, web-crawling, crawlers, crawling, sitemaps, sitemap |
| VCS | |
| bugtrack_url | |
| requirements | requests |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# robotsparse
![Pepy Total Downloads](https://img.shields.io/pepy/dt/robotsparse)<br>
A Python package that makes parsing robots files faster and simpler.

## Usage
Basic usage, such as getting robots contents:
```python
import robotsparse

# NOTE: With `find_url=True`, the URL is redirected to the default robots location.
robots = robotsparse.getRobots("https://github.com/", find_url=True)
print(list(robots)) # output: ['user-agents']
```
The `user-agents` key contains each user-agent found in the robots file, along with the information associated with it.<br>
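
To make the idea of grouping rules under a `user-agents` key concrete, here is a minimal standard-library sketch. It is not robotsparse's implementation: the `parse_robots` function and the inner keys (`allow`, `disallow`, `crawl-delay`) are assumptions chosen for illustration, and the real return value may be shaped differently.

```python
from collections import defaultdict


def parse_robots(text: str) -> dict:
    """Group Allow/Disallow/Crawl-delay rules under each user-agent.

    Hypothetical helper for illustration; not part of robotsparse.
    """
    agents = defaultdict(lambda: {"allow": [], "disallow": [], "crawl-delay": None})
    current = []      # user-agents that the following rules apply to
    in_rules = False  # True once a rule line has followed the User-agent lines
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                current, in_rules = [], False
            current.append(value)
            agents[value]  # register the agent even if it has no rules
        elif field in ("allow", "disallow") and current:
            in_rules = True
            if value:  # an empty Disallow means nothing is disallowed
                for agent in current:
                    agents[agent][field].append(value)
        elif field == "crawl-delay" and current:
            in_rules = True
            for agent in current:
                agents[agent]["crawl-delay"] = value
    return {"user-agents": dict(agents)}


sample = """\
User-agent: *
Disallow: /private/
Allow: /private/public/
Crawl-delay: 5

User-agent: ExampleBot
Disallow: /
"""

parsed = parse_robots(sample)
print(list(parsed))  # ['user-agents']
print(parsed["user-agents"]["ExampleBot"]["disallow"])  # ['/']
```

The sample text is made up, so the sketch runs without any network access.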

Alternatively, we can load the robots contents as an object, which gives quicker access to its fields:
```python
import robotsparse

# This function returns an object (a class instance).
robots = robotsparse.getRobotsObject("https://duckduckgo.com/", find_url=True)
assert isinstance(robots, object)
print(robots.allow) # Prints allowed locations
print(robots.disallow) # Prints disallowed locations
print(robots.crawl_delay) # Prints found crawl-delays
print(robots.robots) # This output is equivalent to the above example
```
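
Both examples above pass `find_url=True`, which redirects the given URL to the default robots location. A rough sketch of that kind of resolution, assuming the default location is `/robots.txt` at the root of the host, could look like the following. The `default_robots_url` helper is hypothetical and only illustrates the idea; it is not how robotsparse is known to do it.

```python
from urllib.parse import urlsplit, urlunsplit


def default_robots_url(url: str) -> str:
    """Return the assumed default robots location for the host of `url`.

    Hypothetical helper for illustration; not part of robotsparse.
    """
    parts = urlsplit(url)
    # Keep scheme and host; replace path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(default_robots_url("https://duckduckgo.com/some/page?q=1"))
# https://duckduckgo.com/robots.txt
```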

### Additional Features
When parsing robots files, it is sometimes useful to parse sitemap files as well:
```python
import robotsparse
sitemap = robotsparse.getSitemap("https://pypi.org/", find_url=True)
```
The `sitemap` variable above holds a list of entries shaped like this:
```python
[{"url": "", "lastModified": ""}]
```
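
For readers curious how a sitemap file could be turned into entries of that shape, here is a small sketch using only the standard library. It assumes a standard `urlset` sitemap in the sitemaps.org namespace; the `parse_sitemap` function is hypothetical and is not taken from robotsparse.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str) -> list[dict[str, str]]:
    """Convert sitemap XML into a list of {"url", "lastModified"} dicts.

    Hypothetical helper for illustration; not part of robotsparse.
    """
    root = ET.fromstring(xml_text)
    entries = []
    for url_el in root.findall("sm:url", SITEMAP_NS):
        loc = url_el.findtext("sm:loc", default="", namespaces=SITEMAP_NS)
        lastmod = url_el.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS)
        # A missing <lastmod> is stored as an empty string.
        entries.append({"url": loc.strip(), "lastModified": lastmod.strip()})
    return entries


sample = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-05-07</lastmod></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>
"""

entries = parse_sitemap(sample)
print(entries[0])  # {'url': 'https://example.com/', 'lastModified': '2024-05-07'}
```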

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/xyzpw/robotsparse/",
    "name": "robotsparse",
    "maintainer": "xyzpw",
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "parsing, parser, robots, web-crawling, crawlers, crawling, sitemaps, sitemap",
    "author": "xyzpw",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/9a/fc/d560faeb84d68802cc3ab4459a5353cedaadf021e9aee6ed08626936a577/robotsparse-1.0.tar.gz",
    "platform": null,
    "description": "# robotsparse\n![Pepy Total Downlods](https://img.shields.io/pepy/dt/robotsparse)<br>\nA python package that enhances speed and simplicity of parsing robots files.\n\n## Usage\nBasic usage, such as getting robots contents:\n```python\nimport robotsparse\n\n#NOTE: The `find_url` parameter will redirect the url to the default robots location.\nrobots = robotsparse.getRobots(\"https://github.com/\", find_url=True)\nprint(list(robots)) # output: ['user-agents']\n```\nThe `user-agents` key will contain each user-agent found in the robots file contents along with information associated with them.<br>\n\nAlternatively, we can assign the robots contents as an object, which allows faster accessability:\n```python\nimport robotsparse\n\n# This function returns a class.\nrobots = robotsparse.getRobotsObject(\"https://duckduckgo.com/\", find_url=True)\nassert isinstance(robots, object)\nprint(robots.allow) # Prints allowed locations\nprint(robots.disallow) # Prints disallowed locations\nprint(robots.crawl_delay) # Prints found crawl-delays\nprint(robots.robots) # This output is equivalent to the above example\n```\n\n### Additional Features\nWhen parsing robots files, it sometimes may be useful to parse sitemap files:\n```python\nimport robotsparse\nsitemap = robotsparse.getSitemap(\"https://pypi.org/\", find_url=True)\n```\nThe above code contains a variable named `sitemap` which contains information that looks like this:\n```python\n[{\"url\": \"\", \"lastModified\": \"\"}]\n```\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A python package that enhances speed and simplicity of parsing robots files.",
    "version": "1.0",
    "project_urls": {
        "Homepage": "https://github.com/xyzpw/robotsparse/"
    },
    "split_keywords": [
        "parsing",
        " parser",
        " robots",
        " web-crawling",
        " crawlers",
        " crawling",
        " sitemaps",
        " sitemap"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6d309ee2722e62100da6ac9f15fcbdb75d818aa06cdf2bc401e86a85e1e1275e",
                "md5": "a40feb6f4ea4395b979ced91cc822402",
                "sha256": "aad90a9604b8ca94f47e0a151f6352e356512c48dc52140245d7a8591996d736"
            },
            "downloads": -1,
            "filename": "robotsparse-1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "a40feb6f4ea4395b979ced91cc822402",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 5593,
            "upload_time": "2024-05-07T22:32:35",
            "upload_time_iso_8601": "2024-05-07T22:32:35.517399Z",
            "url": "https://files.pythonhosted.org/packages/6d/30/9ee2722e62100da6ac9f15fcbdb75d818aa06cdf2bc401e86a85e1e1275e/robotsparse-1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9afcd560faeb84d68802cc3ab4459a5353cedaadf021e9aee6ed08626936a577",
                "md5": "ccda89d76500ae098ca82b54d9468837",
                "sha256": "2bed0da0873c055653e39cc67bbea96fb8c9de3d1e7c5ada77003d7b86615479"
            },
            "downloads": -1,
            "filename": "robotsparse-1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "ccda89d76500ae098ca82b54d9468837",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 4917,
            "upload_time": "2024-05-07T22:32:36",
            "upload_time_iso_8601": "2024-05-07T22:32:36.982193Z",
            "url": "https://files.pythonhosted.org/packages/9a/fc/d560faeb84d68802cc3ab4459a5353cedaadf021e9aee6ed08626936a577/robotsparse-1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-05-07 22:32:36",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "xyzpw",
    "github_project": "robotsparse",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "requests",
            "specs": [
                [
                    "==",
                    "2.*"
                ]
            ]
        }
    ],
    "lcname": "robotsparse"
}
        