| Field | Value |
|---|---|
| Name | crawlpnt |
| Version | 0.1.0 |
| Summary | Precision Navigation Tool for dependency-free, AI-ready web crawling. |
| upload_time | 2025-02-10 05:51:54 |
| home_page | None |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.8 |
| license | MIT |
| keywords | web-crawler, data-extraction, ai, opensource |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# CrawlPNT 🤖🕸️
**Precision Navigation Tool for Dependency-Free, AI-Ready Web Crawling**
*A deterministic, rules-based web crawler built for scalability and structured data extraction.*
[License: MIT](https://opensource.org/licenses/MIT)
[Build Status](https://github.com/datapnt/crawlpnt/actions)
[PyPI](https://pypi.org/project/crawlpnt/)
---
## Features
- 🎯 **Rule-Based Crawling**: Define limits (`max_depth`, `max_pages`) and targets (`target_url`, `exclude_url`).
- 🚫 **Zero Dependencies**: Built entirely with Python’s standard library.
- 🤖 **Deterministic Behavior**: Configurable rules ensure predictable, repeatable crawls.
- 📂 **Structured Output**: Extract HTML content, URLs, or metadata into `.txt`, `.json`, or `.csv`.
---
## Quick Start
### Installation
```bash
pip install crawlpnt
```
### Basic Usage
```python
from crawlpnt import CrawlPNT

crawler = CrawlPNT(
    entry_urls=["https://example.com"],
    max_depth=2,
    target_url=r"^https://example\.com/blog/",
    politeness_delay=1.5,
)

crawler.run(output_dir="./data")
```
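The `target_url` and `exclude_url` rules are ordinary regular expressions, so you can preview what a rule set will keep before running a crawl. The snippet below mirrors that filtering with Python's standard `re` module; it illustrates the rule semantics, not CrawlPNT's internal matching code:

```python
import re

# The same patterns used in the examples in this README.
target = re.compile(r"^https://example\.com/blog/")
exclude = re.compile(r"\.pdf$")

urls = [
    "https://example.com/blog/post-1",
    "https://example.com/about",
    "https://example.com/blog/whitepaper.pdf",
]

for url in urls:
    keep = bool(target.search(url)) and not exclude.search(url)
    print(f"{url} -> {'crawl' if keep else 'skip'}")
```

Only `https://example.com/blog/post-1` survives both rules. (Whether CrawlPNT applies its patterns with `re.search` or `re.match` is not documented here; the snippet uses `re.search`.)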
### Configuration
```yaml
# config.yml
entry_urls:
  - https://example.com
max_depth: 3
exclude_url: \.pdf$
output_format: json
```
```bash
crawlpnt --config config.yml
```
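The YAML keys (`entry_urls`, `max_depth`) match the constructor arguments shown in Basic Usage, so the same crawl can also be expressed in code. Note this mapping is an assumption for `exclude_url`, which the feature list names as a rule but Basic Usage does not show; the CLI form above is the documented path:

```python
from crawlpnt import CrawlPNT

# Programmatic equivalent of config.yml above.
# ASSUMPTION: exclude_url is accepted by the constructor like
# target_url; only the YAML/CLI form is shown in the docs above.
crawler = CrawlPNT(
    entry_urls=["https://example.com"],
    max_depth=3,
    exclude_url=r"\.pdf$",
)
crawler.run(output_dir="./data")
```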
---
## Why CrawlPNT?
| | Traditional Crawlers | CrawlPNT |
|----------------|-------------------------------|------------------------------------|
| **Dependencies** | Often require Scrapy/BS4 | **Zero third-party dependencies** |
| **Focus** | Broad coverage | **Precision targeting** |
| **Output** | Raw HTML | **Structured, AI-ready data** |
| **Complexity** | Steep learning curve | **Simple YAML/CLI configuration** |
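For intuition about what a dependency-free crawler looks like, here is a minimal sketch of a polite, bounded crawl loop built only on the standard library. It is illustrative only, not CrawlPNT's implementation; the parameter names (`max_depth`, `max_pages`, `delay`) merely echo the options described above:

```python
import time
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect href values from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(entry_url, max_depth=2, max_pages=10, delay=1.5):
    seen = set()
    queue = deque([(entry_url, 0)])  # FIFO queue -> breadth-first order
    pages = []
    while queue and len(pages) < max_pages:
        url, depth = queue.popleft()
        if url in seen or depth > max_depth:
            continue
        seen.add(url)
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except OSError:
            continue  # skip unreachable pages
        pages.append((url, html))
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            # Stay on the entry host so the crawl stays bounded.
            if urlparse(absolute).netloc == urlparse(entry_url).netloc:
                queue.append((absolute, depth + 1))
        time.sleep(delay)  # politeness delay between requests
    return pages
```

A FIFO queue plus a visited set yields the same page order on every run against unchanged content, which is the essence of the deterministic behavior the feature list promises.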
---
## Contributing
We welcome contributions! Please read our:
- [Contributing Guide](CONTRIBUTING.md)
- [Code of Conduct](CODE_OF_CONDUCT.md)
Check out the [ROADMAP](ROADMAP.md) to see our future plans and the [CHANGELOG](CHANGELOG.md) for recent updates.
---
## License
MIT License © 2025 [DataPNT](https://datapnt.com).
*Free for open-source and commercial use.*
Raw data

```json
{
"_id": null,
"home_page": null,
"name": "crawlpnt",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "web-crawler, data-extraction, ai, opensource",
"author": null,
"author_email": "andrew <andrew@datapnt.com>",
"download_url": "https://files.pythonhosted.org/packages/19/20/8b8d90f37596568f6b53931f3ca35183e7175d0f80efc8f013d665b31251/crawlpnt-0.1.0.tar.gz",
"platform": null,
"description": "# CrawlPNT \ud83e\udd16\ud83d\udd78\ufe0f\n\n**Precision Navigation Tool for Dependency-Free, AI-Ready Web Crawling** \n*A deterministic, rules-based web crawler built for scalability and structured data extraction.*\n\n[](https://opensource.org/licenses/MIT)\n[](https://github.com/datapnt/crawlpnt/actions)\n[](https://pypi.org/project/crawlpnt/)\n\n---\n\n## Features\n\n- \ud83c\udfaf **Rule-Based Crawling**: Define limits (`max_depth`, `max_pages`) and targets (`target_url`, `exclude_url`).\n- \ud83d\udeab **Zero Dependencies**: Built entirely with Python\u2019s standard library.\n- \ud83e\udd16 **Deterministic Behavior**: Configurable rules ensure predictable, repeatable crawls.\n- \ud83d\udcc2 **Structured Output**: Extract HTML content, URLs, or metadata into `.txt`, `.json`, or `.csv`.\n\n---\n\n## Quick Start\n\n### Installation\n```bash\npip install crawlpnt\n```\n\n### Basic Usage\n```python\nfrom crawlpnt import CrawlPNT\n\ncrawler = CrawlPNT(\n entry_urls=[\"https://example.com\"],\n max_depth=2,\n target_url=r\"^https://example\\.com/blog/\",\n politeness_delay=1.5,\n)\n\ncrawler.run(output_dir=\"./data\")\n```\n\n### Configuration\n```yaml\n# config.yml\nentry_urls:\n - https://example.com\nmax_depth: 3\nexclude_url: \\.pdf$\noutput_format: json\n```\n\n```bash\ncrawlpnt --config config.yml\n```\n\n---\n\n## Why CrawlPNT?\n\n| | Traditional Crawlers | CrawlPNT |\n|----------------|-------------------------------|------------------------------------|\n| **Dependencies** | Often require Scrapy/BS4 | **Zero third-party dependencies** |\n| **Focus** | Broad coverage | **Precision targeting** |\n| **Output** | Raw HTML | **Structured, AI-ready data** |\n| **Complexity** | Steep learning curve | **Simple YAML/CLI configuration** |\n\n---\n\n## Contributing\n\nWe welcome contributions! Please read our:\n- [Contributing Guide](CONTRIBUTING.md)\n- [Code of Conduct](CODE_OF_CONDUCT.md)\n\nCheck out the [ROADMAP](ROADMAP.md) to see our future plans and the [CHANGELOG](CHANGELOG.md) for recent updates.\n\n---\n\n## License\n\nMIT License \u00a9 2025 [DataPNT](https://datapnt.com). \n*Free for open-source and commercial use.*\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Precision Navigation Tool for dependency-free, AI-ready web crawling.",
"version": "0.1.0",
"project_urls": {
"Documentation": "https://github.com/datapnt/crawlpnt#readme",
"Homepage": "https://datapnt.com",
"Source": "https://github.com/datapnt/crawlpnt"
},
"split_keywords": [
"web-crawler",
" data-extraction",
" ai",
" opensource"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "f2ac73c1896f3aa50537959184fa041d957eb212c271984bca32b3b4efbd3855",
"md5": "f45916d7e0c54ff795f4716dd4b9c66c",
"sha256": "19ec5dec007a3645d75e1ed0aaf83a0c9531e458799b696d523a0964aa62406e"
},
"downloads": -1,
"filename": "crawlpnt-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "f45916d7e0c54ff795f4716dd4b9c66c",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 5434,
"upload_time": "2025-02-10T05:51:53",
"upload_time_iso_8601": "2025-02-10T05:51:53.268060Z",
"url": "https://files.pythonhosted.org/packages/f2/ac/73c1896f3aa50537959184fa041d957eb212c271984bca32b3b4efbd3855/crawlpnt-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "19208b8d90f37596568f6b53931f3ca35183e7175d0f80efc8f013d665b31251",
"md5": "5f57471c03e8490d1c3ccff588041260",
"sha256": "9ba5be47a2fd1f1387ea8a764b357c3f234a58ac767891dcf1f2610b3bd0f488"
},
"downloads": -1,
"filename": "crawlpnt-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "5f57471c03e8490d1c3ccff588041260",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 5918,
"upload_time": "2025-02-10T05:51:54",
"upload_time_iso_8601": "2025-02-10T05:51:54.529220Z",
"url": "https://files.pythonhosted.org/packages/19/20/8b8d90f37596568f6b53931f3ca35183e7175d0f80efc8f013d665b31251/crawlpnt-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-02-10 05:51:54",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "datapnt",
"github_project": "crawlpnt#readme",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "crawlpnt"
}
```
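The digests in the raw data above can be checked locally. Here is a stdlib-only verification of the sdist's SHA-256, using the URL and digest recorded for `crawlpnt-0.1.0.tar.gz`:

```python
import hashlib
import urllib.request

# URL and expected digest taken from the sdist entry in the raw data above.
SDIST_URL = (
    "https://files.pythonhosted.org/packages/19/20/"
    "8b8d90f37596568f6b53931f3ca35183e7175d0f80efc8f013d665b31251/"
    "crawlpnt-0.1.0.tar.gz"
)
EXPECTED_SHA256 = "9ba5be47a2fd1f1387ea8a764b357c3f234a58ac767891dcf1f2610b3bd0f488"

with urllib.request.urlopen(SDIST_URL) as resp:
    digest = hashlib.sha256(resp.read()).hexdigest()

print("sha256 ok:", digest == EXPECTED_SHA256)
```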