Name | reachable |
Version | 0.9.1 |
home_page | None |
Summary | Check if a URL is reachable |
upload_time | 2025-08-02 21:04:40 |
maintainer | None |
docs_url | None |
author | Alex Mili |
requires_python | >=3.8 |
license | MIT License
Copyright (c) 2016
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE. |
keywords | |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
**Reachable** checks if a URL exists and is reachable.
# Features
- Use a `HEAD` request instead of `GET` to save bandwidth
- Follow redirects
- Handle local redirects (without full URL in `location` header)
- Record all the URLs of the redirection chain
- Check whether the redirected URL matches the TLD of the source URL
- Detect Cloudflare protection
- Avoid basic bot detectors
  - Use a random Chrome user agent
  - Wait between consecutive requests to the same host
  - Include the `Host` header
  - Can use Playwright to make the request
- Use of HTTP/2
- Detect parking domains
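
To illustrate the redirect-related features above, here is a minimal sketch of how a local redirect (a `location` header without a full URL) can be resolved and how a TLD comparison might work. It uses only the standard library. The helper names (`resolve_location`, `naive_tld`, `build_chain`) are hypothetical and are not part of Reachable's API, and the TLD check is deliberately naive (it compares only the last hostname label), so it is not necessarily how the library does it.

```python
from urllib.parse import urljoin, urlsplit


def resolve_location(current_url, location):
    """Resolve a Location header value against the URL that returned it."""
    # urljoin leaves absolute URLs untouched and resolves relative ones.
    return urljoin(current_url, location)


def naive_tld(url):
    """Return the last hostname label (naive: ignores suffixes such as co.uk)."""
    host = urlsplit(url).hostname or ""
    return host.rsplit(".", 1)[-1].lower()


def build_chain(start_url, locations):
    """Follow a list of Location values and record every URL visited."""
    chain = []
    current = start_url
    for location in locations:
        current = resolve_location(current, location)
        chain.append(current)
    return {
        "chain": chain,
        "final_url": current,
        "tld_match": naive_tld(start_url) == naive_tld(current),
    }


# Hypothetical redirect sequence: one local redirect, then an absolute one.
info = build_chain("https://example.com/a", ["/b", "https://www.example.com/c"])
print(info)
```

The returned dictionary mirrors the shape of the `redirect` field shown in the Usage section, but the field semantics here are an assumption made for the example.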
# Installation
You can install it with pip:
```bash
pip install reachable
```
If you want to use Playwright, install the optional extra:
```bash
pip install reachable[playwright]
```
Or clone this repository and run:
```bash
cd reachable/
pip install -e .
```
# Usage
## Simple URL
```python
from reachable import is_reachable
result = is_reachable("https://google.com")
```
The output will look like this:
```json
{
    "original_url": "https://google.com",
    "final_url": "https://www.google.com/",
    "response": null,
    "status_code": 200,
    "success": true,
    "error_name": null,
    "cloudflare_protection": false,
    "redirect": {
        "chain": ["https://www.google.com/"],
        "final_url": "https://www.google.com/",
        "tld_match": true
    }
}
```
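
As a rough sketch of how such a result might be consumed, the snippet below builds a one-line summary from it. It assumes the returned value behaves like a Python dictionary with the keys shown above, and that `redirect` may be missing or `None` when no redirect happened; both are assumptions, and `describe` is a hypothetical helper, not part of the library.

```python
# Sample copied from the output shown above. In real use this would be the
# value returned by is_reachable(...), assumed here to be a dict.
sample = {
    "original_url": "https://google.com",
    "final_url": "https://www.google.com/",
    "response": None,
    "status_code": 200,
    "success": True,
    "error_name": None,
    "cloudflare_protection": False,
    "redirect": {
        "chain": ["https://www.google.com/"],
        "final_url": "https://www.google.com/",
        "tld_match": True,
    },
}


def describe(result):
    """Build a one-line summary from a result dictionary."""
    if not result.get("success"):
        return "{}: failed ({})".format(
            result["original_url"], result.get("error_name")
        )
    # Assumption: "redirect" may be absent or None when nothing was followed.
    redirect = result.get("redirect") or {}
    hops = len(redirect.get("chain", []))
    return "{} -> {} [{}], {} redirect(s), TLD match: {}".format(
        result["original_url"],
        result["final_url"],
        result["status_code"],
        hops,
        redirect.get("tld_match"),
    )


print(describe(sample))
```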
## Multiple URLs
```python
from reachable import is_reachable
result = is_reachable(["https://google.com", "http://bing.com"])
```
The output will look like this:
```json
[
    {
        "original_url": "https://google.com",
        "final_url": "https://www.google.com/",
        "response": null,
        "status_code": 200,
        "success": true,
        "error_name": null,
        "cloudflare_protection": false,
        "redirect": {
            "chain": ["https://www.google.com/"],
            "final_url": "https://www.google.com/",
            "tld_match": true
        }
    },
    {
        "original_url": "http://bing.com",
        "final_url": "https://www.bing.com/?toWww=1&redig=16A78C94",
        "response": null,
        "status_code": 200,
        "success": true,
        "error_name": null,
        "cloudflare_protection": false,
        "redirect": {
            "chain": ["https://www.bing.com:443/?toWww=1&redig=16A78C94"],
            "final_url": "https://www.bing.com/?toWww=1&redig=16A78C94",
            "tld_match": true
        }
    }
]
```
## Async
```python
import asyncio
from reachable import is_reachable_async
result = asyncio.run(is_reachable_async("https://google.com"))
```
or
```python
import asyncio
from reachable import is_reachable_async

urls = ["https://google.com", "https://bing.com"]

try:
    loop = asyncio.get_running_loop()
except RuntimeError:
    # No loop exists yet, so we create one
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
try:
    result = loop.run_until_complete(asyncio.gather(*[is_reachable_async(url) for url in urls]))
finally:
    loop.close()
```
### Handling high volumes with TaskPool
If you want to process a large number of URLs (> 500), you will quickly hit the limits of your hardware and/or OS, because only a limited number of connections can be open at the same time.
To work around this problem, you can use the `TaskPool` class. It uses asyncio semaphores to limit the number of asyncio tasks running concurrently: each worker acquires the semaphore when it starts and releases it when it is done. This keeps a steady number of asyncio workers running without overwhelming the OS.
```python
import asyncio

from reachable import is_reachable_async
from reachable.client import AsyncClient
from reachable.pool import TaskPool


urls = ["https://google.com", "https://bing.com"]


async def worker(url, client):
    # Check a single URL, reusing the shared client
    result = await is_reachable_async(url, client=client)
    return result


async def workers_builder(urls, pool_size: int = 100):
    async with AsyncClient() as client:
        tasks = TaskPool(workers=pool_size)

        for url in urls:
            await tasks.put(worker(url, client=client))

        # Wait for all queued workers to finish
        await tasks.join()

    return tasks._results


try:
    loop = asyncio.get_running_loop()
except RuntimeError:
    # No loop exists yet, so we create one
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)

try:
    result = loop.run_until_complete(workers_builder(urls))
    print(result)
finally:
    loop.close()
```
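
The semaphore idea described above can be sketched with plain `asyncio`, independently of the library. This is not `TaskPool`'s actual implementation; `bounded`, `run_limited` and `fake_check` are hypothetical names, and `fake_check` only sleeps instead of making a request. The sketch records the peak number of workers running at once to show that the limit holds.

```python
import asyncio


async def bounded(semaphore, coro):
    """Run a coroutine only once a semaphore slot is available."""
    async with semaphore:  # acquire on entry, release on exit
        return await coro


async def run_limited(items, limit=3):
    """Process all items with at most `limit` workers running at once."""
    semaphore = asyncio.Semaphore(limit)
    running = 0
    peak = 0

    async def fake_check(item):
        # Stand-in for a real URL check; tracks how many workers overlap.
        nonlocal running, peak
        running += 1
        peak = max(peak, running)
        await asyncio.sleep(0.01)
        running -= 1
        return item

    results = await asyncio.gather(
        *(bounded(semaphore, fake_check(item)) for item in items)
    )
    return results, peak


results, peak = asyncio.run(run_limited(list(range(10)), limit=3))
print(results, peak)
```

`asyncio.gather` returns results in the order the awaitables were passed in, so the output order matches the input order even though workers finish at different times.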
Raw data
{
"_id": null,
"home_page": null,
"name": "reachable",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": null,
"author": "Alex Mili",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/aa/88/289cf4e39e3c4dcedebf245edc8edff7ee1302eaeffa39a2389ff98cb68b/reachable-0.9.1.tar.gz",
"platform": null,
"description": "**Reachable** checks if a URL exists and is reachable.\n\n# Features\n- Use `HEAD`request instead of `GET` to save some bandwidth\n- Follow redirects\n- Handle local redirects (without full URL in `location` header)\n- Record all the URLs of the redirection chain\n- Check if redirected URL match the TLD of source URL\n- Detect Cloudflare protection\n- Avoid basic bot detectors\n - Use randome Chrome user agent\n - Wait between consecutive requests to the same host\n - Include `Host` header\n - Can use Playwright to make the request\n- Use of HTTP/2\n- Detect parking domains\n\n# Installation\nYou can install it with pip :\n```bash\npip install reachable\n```\nIf you want to use playwright:\n```bash\npip install reachable[playwright]\n```\nOr clone this repository and simply run :\n```bash\ncd reachable/\npip install -e .\n```\n\n# Usage\n\n## Simple URL\n```python\nfrom reachable import is_reachable\nresult = is_reachable(\"https://google.com\")\n```\n\nThe output will look like this:\n```json\n{\n \"original_url\": \"https://google.com\",\n \"final_url\": \"https://www.google.com/\",\n \"response\": null, \n \"status_code\": 200,\n \"success\": true,\n \"error_name\": null,\n \"cloudflare_protection\": false,\n \"redirect\": {\n \"chain\": [\"https://www.google.com/\"],\n \"final_url\": \"https://www.google.com/\",\n \"tld_match\": true\n }\n}\n```\n\n## Multiple URLs\n```python\nfrom reachable import is_reachable\nresult = is_reachable([\"https://google.com\", \"http://bing.com\"])\n```\n\nThe output will look like this:\n```json\n[\n {\n \"original_url\": \"https://google.com\",\n \"final_url\": \"https://www.google.com/\",\n \"response\": null, \n \"status_code\": 200,\n \"success\": true,\n \"error_name\": null,\n \"cloudflare_protection\": false,\n \"redirect\": {\n \"chain\": [\"https://www.google.com/\"],\n \"final_url\": \"https://www.google.com/\",\n \"tld_match\": true\n }\n },\n {\n \"original_url\": \"http://bing.com\",\n \"final_url\": 
\"https://www.bing.com/?toWww=1&redig=16A78C94\",\n \"response\": null,\n \"status_code\": 200,\n \"success\": true,\n \"error_name\": null,\n \"cloudflare_protection\": false,\n \"redirect\": {\n \"chain\": [\"https://www.bing.com:443/?toWww=1&redig=16A78C94\"],\n \"final_url\": \"https://www.bing.com/?toWww=1&redig=16A78C94\",\n \"tld_match\": true\n }\n }\n]\n```\n\n## Async\n```python\nimport asyncio\nfrom reachable import is_reachable_async\n\nresult = asyncio.run(is_reachable_async(\"https://google.com\"))\n```\nor\n```python\nimport asyncio\nfrom reachable import is_reachable_async\n\nurls = [\"https://google.com\", \"https://bing.com\"]\n\ntry:\n loop = asyncio.get_running_loop()\nexcept RuntimeError:\n # No loop already exists so we crete one\n loop = asyncio.new_event_loop()\n asyncio.set_event_loop(loop)\ntry:\n result = loop.run_until_complete(asyncio.gather(*[is_reachable_async(url) for url in urls]))\nfinally:\n loop.close()\n```\n\n### Handling high volumes with Taskpool\n\nIf you want to process a large number of URLs (> 500) you will quickly hit the limits of your hardware and/or OS because you can only open a defined number of active connections.\n\nTo bypass this problem you can use the `TaskPool` class. It uses Asyncio Semaphores to limit the number of asyncio threads running. It works by acquiring a lock when starting the worker and releasing it when done. 
It allows to always have a number of asyncio workers without overwhelming the OS.\n\n```python\nimport asyncio\n\nfrom reachable import is_reachable_async\nfrom reachable.client import AsyncClient\nfrom reachable.pool import TaskPool\n\n\nurls = [\"https://google.com\", \"https://bing.com\"]\n\n\nasync def worker(url, client):\n result = await is_reachable_async(url, client=client)\n return result\n\n\nasync def workers_builder(urls, pool_size: int = 100):\n async with AsyncClient() as client:\n tasks = TaskPool(workers=pool_size)\n\n for url in urls:\n await tasks.put(worker(url, client=client))\n\n await tasks.join()\n\n return tasks._results\n\n\ntry:\n loop = asyncio.get_running_loop()\nexcept RuntimeError:\n # No loop already exists so we crete one\n loop = asyncio.new_event_loop()\n asyncio.set_event_loop(loop)\n\ntry:\n result = loop.run_until_complete(workers_builder(urls))\n print(result)\nfinally:\n loop.close()\n\n```",
"bugtrack_url": null,
"license": "MIT License\n \n Copyright (c) 2016 \n \n Permission is hereby granted, free of charge, to any person obtaining a copy\n of this software and associated documentation files (the \"Software\"), to deal\n in the Software without restriction, including without limitation the rights\n to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n copies of the Software, and to permit persons to whom the Software is\n furnished to do so, subject to the following conditions:\n \n The above copyright notice and this permission notice shall be included in all\n copies or substantial portions of the Software.\n \n THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n SOFTWARE.",
"summary": "Check if a URL is reachable",
"version": "0.9.1",
"project_urls": {
"Documentation": "https://github.com/AlexMili/Reachable",
"Homepage": "https://github.com/AlexMili/Reachable",
"Issues": "https://github.com/AlexMili/Reachable/issues",
"Repository": "https://github.com/AlexMili/Reachable"
},
"split_keywords": [],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "495412103dbe877b9c84351c2aadbbb9ef46ae114a02f86a861f79020b62b66d",
"md5": "1bdff2d14dae33087374a66cc375f9ea",
"sha256": "989aa277d2a7e1daa642a61c01df127e0f0fa4ea139954b3ddf38580c2865a27"
},
"downloads": -1,
"filename": "reachable-0.9.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "1bdff2d14dae33087374a66cc375f9ea",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 13663,
"upload_time": "2025-08-02T21:04:39",
"upload_time_iso_8601": "2025-08-02T21:04:39.212627Z",
"url": "https://files.pythonhosted.org/packages/49/54/12103dbe877b9c84351c2aadbbb9ef46ae114a02f86a861f79020b62b66d/reachable-0.9.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "aa88289cf4e39e3c4dcedebf245edc8edff7ee1302eaeffa39a2389ff98cb68b",
"md5": "c4bc1a27c000950850f0cefeedd2e020",
"sha256": "fe4493543108880e0db12293a567c169b47f90da56fa96a7664db9b42d72921a"
},
"downloads": -1,
"filename": "reachable-0.9.1.tar.gz",
"has_sig": false,
"md5_digest": "c4bc1a27c000950850f0cefeedd2e020",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 14904,
"upload_time": "2025-08-02T21:04:40",
"upload_time_iso_8601": "2025-08-02T21:04:40.406604Z",
"url": "https://files.pythonhosted.org/packages/aa/88/289cf4e39e3c4dcedebf245edc8edff7ee1302eaeffa39a2389ff98cb68b/reachable-0.9.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-02 21:04:40",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "AlexMili",
"github_project": "Reachable",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "reachable"
}