# robotsparse
![Pepy Total Downloads](https://img.shields.io/pepy/dt/robotsparse)<br>
A Python package that makes parsing robots.txt files fast and simple.
## Usage
Basic usage, such as getting robots contents:
```python
import robotsparse
# NOTE: The `find_url` parameter resolves the URL to the site's default robots.txt location.
robots = robotsparse.getRobots("https://github.com/", find_url=True)
print(list(robots)) # output: ['user-agents']
```
The `user-agents` key contains each user-agent found in the robots file, along with the rules associated with it.<br>
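As a rough sketch (assuming the values under `user-agents` can simply be iterated and printed; their exact structure is defined by the package), you could inspect them like this:
```python
import robotsparse

robots = robotsparse.getRobots("https://github.com/", find_url=True)

# Walk whatever is stored under "user-agents"; the exact shape of each
# entry depends on robotsparse's parser.
for agent in robots["user-agents"]:
    print(agent)
```
Alternatively, we can store the robots contents in an object, which allows more convenient access: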
```python
import robotsparse
# This function returns an object holding the parsed robots data.
robots = robotsparse.getRobotsObject("https://duckduckgo.com/", find_url=True)
assert isinstance(robots, object)
print(robots.allow) # Prints allowed locations
print(robots.disallow) # Prints disallowed locations
print(robots.crawl_delay) # Prints found crawl-delays
print(robots.robots) # This output is equivalent to the above example
```
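For instance, a minimal sketch of checking a path against the parsed rules might look like the following; it assumes `robots.disallow` is a flat iterable of disallowed path strings, which the snippet above does not guarantee:
```python
import robotsparse

robots = robotsparse.getRobotsObject("https://duckduckgo.com/", find_url=True)

# Assumption: robots.disallow iterates over disallowed path strings.
path = "/html"
if any(str(rule).strip() == path for rule in robots.disallow):
    print(f"{path} appears in the disallow rules")
else:
    print(f"{path} is not listed as disallowed")
```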
### Additional Features
When parsing robots files, it may sometimes be useful to parse sitemap files as well:
```python
import robotsparse
sitemap = robotsparse.getSitemap("https://pypi.org/", find_url=True)
```
The `sitemap` variable above holds a list of entries shaped like this:
```python
[{"url": "", "lastModified": ""}]
```
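Based on that shape, iterating the entries is straightforward; this is only a sketch, assuming real results fill in the `url` and `lastModified` values:
```python
import robotsparse

sitemap = robotsparse.getSitemap("https://pypi.org/", find_url=True)

# Each entry is expected to be a dict with "url" and "lastModified" keys.
for entry in sitemap:
    print(entry.get("url"), entry.get("lastModified"))
```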