| Field | Value |
|---|---|
| Name | crawlpnt |
| Version | 0.1.0 |
| Summary | Precision Navigation Tool for dependency-free, AI-ready web crawling. |
| upload_time | 2025-02-10 05:51:54 |
| home_page | None |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.8 |
| license | MIT |
| keywords | web-crawler, data-extraction, ai, opensource |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# CrawlPNT 🤖🕸️
**Precision Navigation Tool for Dependency-Free, AI-Ready Web Crawling**
*A deterministic, rules-based web crawler built for scalability and structured data extraction.*
[License: MIT](https://opensource.org/licenses/MIT)
[Build Status](https://github.com/datapnt/crawlpnt/actions)
[PyPI](https://pypi.org/project/crawlpnt/)
---
## Features
- 🎯 **Rule-Based Crawling**: Define limits (`max_depth`, `max_pages`) and targets (`target_url`, `exclude_url`).
- 🚫 **Zero Dependencies**: Built entirely with Python’s standard library.
- 🤖 **Deterministic Behavior**: Configurable rules ensure predictable, repeatable crawls.
- 📂 **Structured Output**: Extract HTML content, URLs, or metadata into `.txt`, `.json`, or `.csv`.
---
## Quick Start
### Installation
```bash
pip install crawlpnt
```
### Basic Usage
```python
from crawlpnt import CrawlPNT

crawler = CrawlPNT(
    entry_urls=["https://example.com"],
    max_depth=2,
    target_url=r"^https://example\.com/blog/",
    politeness_delay=1.5,
)

crawler.run(output_dir="./data")
```
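The `target_url` and `exclude_url` rules are ordinary regular expressions, so you can preview what a rule set will keep before running a crawl. The snippet below mirrors that filtering with Python's standard `re` module; it illustrates the rule semantics, not CrawlPNT's internal matching code:

```python
import re

# The same patterns used in the examples in this README.
target = re.compile(r"^https://example\.com/blog/")
exclude = re.compile(r"\.pdf$")

urls = [
    "https://example.com/blog/post-1",
    "https://example.com/about",
    "https://example.com/blog/whitepaper.pdf",
]

for url in urls:
    keep = bool(target.search(url)) and not exclude.search(url)
    print(f"{url} -> {'crawl' if keep else 'skip'}")
```

Only `https://example.com/blog/post-1` survives both rules. (Whether CrawlPNT applies its patterns with `re.search` or `re.match` is not documented here; the snippet uses `re.search`.)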
### Configuration
```yaml
# config.yml
entry_urls:
  - https://example.com
max_depth: 3
exclude_url: \.pdf$
output_format: json
```
```bash
crawlpnt --config config.yml
```
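The YAML keys (`entry_urls`, `max_depth`) match the constructor arguments shown in Basic Usage, so the same crawl can also be expressed in code. Note this mapping is an assumption for `exclude_url`, which the feature list names as a rule but Basic Usage does not show; the CLI form above is the documented path:

```python
from crawlpnt import CrawlPNT

# Programmatic equivalent of config.yml above.
# ASSUMPTION: exclude_url is accepted by the constructor like
# target_url; only the YAML/CLI form is shown in the docs above.
crawler = CrawlPNT(
    entry_urls=["https://example.com"],
    max_depth=3,
    exclude_url=r"\.pdf$",
)
crawler.run(output_dir="./data")
```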
---
## Why CrawlPNT?
| | Traditional Crawlers | CrawlPNT |
|----------------|-------------------------------|------------------------------------|
| **Dependencies** | Often require Scrapy/BS4 | **Zero third-party dependencies** |
| **Focus** | Broad coverage | **Precision targeting** |
| **Output** | Raw HTML | **Structured, AI-ready data** |
| **Complexity** | Steep learning curve | **Simple YAML/CLI configuration** |
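For intuition about what a dependency-free crawler looks like, here is a minimal sketch of a polite, bounded crawl loop built only on the standard library. It is illustrative only, not CrawlPNT's implementation; the parameter names (`max_depth`, `max_pages`, `delay`) merely echo the options described above:

```python
import time
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect href values from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(entry_url, max_depth=2, max_pages=10, delay=1.5):
    seen = set()
    queue = deque([(entry_url, 0)])  # FIFO queue -> breadth-first order
    pages = []
    while queue and len(pages) < max_pages:
        url, depth = queue.popleft()
        if url in seen or depth > max_depth:
            continue
        seen.add(url)
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except OSError:
            continue  # skip unreachable pages
        pages.append((url, html))
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            # Stay on the entry host so the crawl stays bounded.
            if urlparse(absolute).netloc == urlparse(entry_url).netloc:
                queue.append((absolute, depth + 1))
        time.sleep(delay)  # politeness delay between requests
    return pages
```

A FIFO queue plus a visited set yields the same page order on every run against unchanged content, which is the essence of the deterministic behavior the feature list promises.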
---
## Contributing
We welcome contributions! Please read our:
- [Contributing Guide](CONTRIBUTING.md)
- [Code of Conduct](CODE_OF_CONDUCT.md)
Check out the [ROADMAP](ROADMAP.md) to see our future plans and the [CHANGELOG](CHANGELOG.md) for recent updates.
---
## License
MIT License © 2025 [DataPNT](https://datapnt.com).
*Free for open-source and commercial use.*
Raw data

```json
{
"_id": null,
"home_page": null,
"name": "crawlpnt",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "web-crawler, data-extraction, ai, opensource",
"author": null,
"author_email": "andrew <andrew@datapnt.com>",
"download_url": "https://files.pythonhosted.org/packages/19/20/8b8d90f37596568f6b53931f3ca35183e7175d0f80efc8f013d665b31251/crawlpnt-0.1.0.tar.gz",
"platform": null,
"description": "# CrawlPNT \ud83e\udd16\ud83d\udd78\ufe0f\n\n**Precision Navigation Tool for Dependency-Free, AI-Ready Web Crawling** \n*A deterministic, rules-based web crawler built for scalability and structured data extraction.*\n\n[](https://opensource.org/licenses/MIT)\n[](https://github.com/datapnt/crawlpnt/actions)\n[](https://pypi.org/project/crawlpnt/)\n\n---\n\n## Features\n\n- \ud83c\udfaf **Rule-Based Crawling**: Define limits (`max_depth`, `max_pages`) and targets (`target_url`, `exclude_url`).\n- \ud83d\udeab **Zero Dependencies**: Built entirely with Python\u2019s standard library.\n- \ud83e\udd16 **Deterministic Behavior**: Configurable rules ensure predictable, repeatable crawls.\n- \ud83d\udcc2 **Structured Output**: Extract HTML content, URLs, or metadata into `.txt`, `.json`, or `.csv`.\n\n---\n\n## Quick Start\n\n### Installation\n```bash\npip install crawlpnt\n```\n\n### Basic Usage\n```python\nfrom crawlpnt import CrawlPNT\n\ncrawler = CrawlPNT(\n entry_urls=[\"https://example.com\"],\n max_depth=2,\n target_url=r\"^https://example\\.com/blog/\",\n politeness_delay=1.5,\n)\n\ncrawler.run(output_dir=\"./data\")\n```\n\n### Configuration\n```yaml\n# config.yml\nentry_urls:\n - https://example.com\nmax_depth: 3\nexclude_url: \\.pdf$\noutput_format: json\n```\n\n```bash\ncrawlpnt --config config.yml\n```\n\n---\n\n## Why CrawlPNT?\n\n| | Traditional Crawlers | CrawlPNT |\n|----------------|-------------------------------|------------------------------------|\n| **Dependencies** | Often require Scrapy/BS4 | **Zero third-party dependencies** |\n| **Focus** | Broad coverage | **Precision targeting** |\n| **Output** | Raw HTML | **Structured, AI-ready data** |\n| **Complexity** | Steep learning curve | **Simple YAML/CLI configuration** |\n\n---\n\n## Contributing\n\nWe welcome contributions! Please read our:\n- [Contributing Guide](CONTRIBUTING.md)\n- [Code of Conduct](CODE_OF_CONDUCT.md)\n\nCheck out the [ROADMAP](ROADMAP.md) to see our future plans and the [CHANGELOG](CHANGELOG.md) for recent updates.\n\n---\n\n## License\n\nMIT License \u00a9 2025 [DataPNT](https://datapnt.com). \n*Free for open-source and commercial use.*\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Precision Navigation Tool for dependency-free, AI-ready web crawling.",
"version": "0.1.0",
"project_urls": {
"Documentation": "https://github.com/datapnt/crawlpnt#readme",
"Homepage": "https://datapnt.com",
"Source": "https://github.com/datapnt/crawlpnt"
},
"split_keywords": [
"web-crawler",
" data-extraction",
" ai",
" opensource"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "f2ac73c1896f3aa50537959184fa041d957eb212c271984bca32b3b4efbd3855",
"md5": "f45916d7e0c54ff795f4716dd4b9c66c",
"sha256": "19ec5dec007a3645d75e1ed0aaf83a0c9531e458799b696d523a0964aa62406e"
},
"downloads": -1,
"filename": "crawlpnt-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "f45916d7e0c54ff795f4716dd4b9c66c",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 5434,
"upload_time": "2025-02-10T05:51:53",
"upload_time_iso_8601": "2025-02-10T05:51:53.268060Z",
"url": "https://files.pythonhosted.org/packages/f2/ac/73c1896f3aa50537959184fa041d957eb212c271984bca32b3b4efbd3855/crawlpnt-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "19208b8d90f37596568f6b53931f3ca35183e7175d0f80efc8f013d665b31251",
"md5": "5f57471c03e8490d1c3ccff588041260",
"sha256": "9ba5be47a2fd1f1387ea8a764b357c3f234a58ac767891dcf1f2610b3bd0f488"
},
"downloads": -1,
"filename": "crawlpnt-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "5f57471c03e8490d1c3ccff588041260",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 5918,
"upload_time": "2025-02-10T05:51:54",
"upload_time_iso_8601": "2025-02-10T05:51:54.529220Z",
"url": "https://files.pythonhosted.org/packages/19/20/8b8d90f37596568f6b53931f3ca35183e7175d0f80efc8f013d665b31251/crawlpnt-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-02-10 05:51:54",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "datapnt",
"github_project": "crawlpnt#readme",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "crawlpnt"
}
```
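The digests in the raw data above can be checked locally. Here is a stdlib-only verification of the sdist's SHA-256, using the URL and digest recorded for `crawlpnt-0.1.0.tar.gz`:

```python
import hashlib
import urllib.request

# URL and expected digest taken from the sdist entry in the raw data above.
SDIST_URL = (
    "https://files.pythonhosted.org/packages/19/20/"
    "8b8d90f37596568f6b53931f3ca35183e7175d0f80efc8f013d665b31251/"
    "crawlpnt-0.1.0.tar.gz"
)
EXPECTED_SHA256 = "9ba5be47a2fd1f1387ea8a764b357c3f234a58ac767891dcf1f2610b3bd0f488"

with urllib.request.urlopen(SDIST_URL) as resp:
    digest = hashlib.sha256(resp.read()).hexdigest()

print("sha256 ok:", digest == EXPECTED_SHA256)
```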