crawailer 0.1.2 (PyPI metadata)

- Summary: Modern Python library for browser automation and intelligent content extraction with full JavaScript execution support
- Uploaded: 2025-09-19 04:46:38
- Requires-Python: >=3.11
- License: MIT
- Keywords: ai, angular-scraping, automation, browser-control, content-extraction, javascript-execution, llm, mcp, playwright, react-scraping, spa-crawling, vue-scraping, web-automation, web-scraping
            # πŸ•·οΈ Crawailer

**The web scraper that doesn't suck at JavaScript** ✨

> **Stop fighting modern websites.** While `requests` gives you empty `<div id="root"></div>`, Crawailer actually executes JavaScript and extracts real content from React, Vue, and Angular apps. Finally, web scraping that works in 2025.

> ⚑ **Claude Code's new best friend** - Your AI assistant can now access ANY website

```bash
pip install crawailer
```

[![PyPI version](https://badge.fury.io/py/crawailer.svg)](https://badge.fury.io/py/crawailer)
[![Python Support](https://img.shields.io/pypi/pyversions/crawailer.svg)](https://pypi.org/project/crawailer/)

## ✨ Why Developers Choose Crawailer

**πŸ”₯ JavaScript That Actually Works**  
While other tools timeout or crash, Crawailer executes real JavaScript like a human browser

**⚑ Stupidly Fast**  
5-10x faster than BeautifulSoup with C-based parsing that doesn't make you wait

**πŸ€– AI Assistant Ready**  
Perfect markdown output that your Claude/GPT/local model will love

**🎯 Zero Learning Curve**  
`pip install` β†’ works immediately β†’ no 47-page configuration guides

**πŸ§ͺ Production Battle-Tested**  
18 comprehensive test suites covering every edge case we could think of

**🎨 Actually Enjoyable**  
Rich terminal output, helpful errors, progress bars that don't lie

## πŸš€ Quick Start

> *(Honestly, you probably don't need to read these examples - just ask your AI assistant to figure it out. That's what models are for! But here they are anyway...)*

### 🎬 See It In Action

**Basic Usage Demo** - Crawailer vs requests:
```bash
# View the demo locally
asciinema play demos/basic-usage.cast
```

**Claude Code Integration** - Give your AI web superpowers:
```bash
# View the Claude integration demo  
asciinema play demos/claude-integration.cast
```

*Don't have asciinema? `pip install asciinema` or run the demos yourself:*
```bash
# Clone the repo and run demos interactively
git clone https://git.supported.systems/MCP/crawailer.git
cd crawailer
python demo_basic_usage.py
python demo_claude_integration.py
```

```python
import crawailer as web

# Simple content extraction
content = await web.get("https://example.com")
print(content.markdown)  # Clean, LLM-ready markdown
print(content.text)      # Human-readable text
print(content.title)     # Extracted title

# JavaScript execution for dynamic content
content = await web.get(
    "https://spa-app.com",
    script="document.querySelector('.dynamic-price').textContent"
)
print(f"Price: {content.script_result}")

# Batch processing with JavaScript
results = await web.get_many(
    ["url1", "url2", "url3"],
    script="document.title + ' | ' + document.querySelector('.description')?.textContent"
)
for result in results:
    print(f"{result.title}: {result.script_result}")

# Smart discovery with interaction
research = await web.discover(
    "AI safety papers", 
    script="document.querySelector('.show-more')?.click()",
    max_pages=10
)
# Returns the most relevant content with enhanced extraction

# Compare: Traditional scraping fails on modern sites
# requests.get("https://react-app.com") β†’ Empty <div id="root"></div>
# Crawailer β†’ Full content + dynamic data
```
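To make that comparison concrete, here is a stdlib-only sketch (no network; the SPA payload is a hypothetical example) showing that the static HTML of a JavaScript-rendered app carries no extractable text for an HTTP-only scraper:

```python
from html.parser import HTMLParser

# The static HTML a server typically returns for a React/Vue SPA
# (hypothetical payload for illustration; real content is injected by JS).
SPA_HTML = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'

class TextExtractor(HTMLParser):
    """Collects visible text, the way an HTTP-only scraper would see it."""
    def __init__(self):
        super().__init__()
        self.text = []

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())

parser = TextExtractor()
parser.feed(SPA_HTML)
print(parser.text)  # [] -- nothing to extract without JavaScript execution
```

A browser-driven tool sees the DOM *after* `/app.js` runs, which is why the same page yields full content through Crawailer.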

### 🧠 Claude Code MCP Integration

> *"Hey Claude, go grab that data from the React app"* ← This actually works now

```python
# Add to your Claude Code MCP server
import crawailer as web
from crawailer.mcp import create_mcp_server

@mcp_tool("web_extract")
async def extract_content(url: str, script: str = ""):
    """Extract content from any website with optional JavaScript execution"""
    content = await web.get(url, script=script)
    return {
        "title": content.title,
        "markdown": content.markdown,
        "script_result": content.script_result,
        "word_count": content.word_count
    }

# πŸŽ‰ No more "I can't access that site" 
# πŸŽ‰ No more copy-pasting content manually
# πŸŽ‰ Your AI can now browse the web like a human
```
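The snippets above use top-level `await`, which works in a REPL or notebook; in a plain script, wrap them in a coroutine and hand it to `asyncio.run()`. A minimal self-contained sketch (the `fake_get` coroutine is a stand-in for `web.get`, which needs the package and a network):

```python
import asyncio

# Stand-in for crawailer's async API so this sketch runs anywhere;
# in real code you would `import crawailer as web` and call web.get(url).
async def fake_get(url: str) -> str:
    await asyncio.sleep(0)  # simulate async I/O
    return f"content from {url}"

async def main() -> str:
    # Top-level `await` snippets go inside a coroutine like this one.
    return await fake_get("https://example.com")

result = asyncio.run(main())
print(result)  # content from https://example.com
```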

## 🎯 Design Philosophy

### For Robots, By Humans
- **Predictive**: Anticipates what you need and provides it
- **Forgiving**: Handles errors gracefully with helpful suggestions  
- **Efficient**: Fast by default, with smart caching and concurrency
- **Composable**: Small, focused functions that work well together

### Perfect for AI Workflows
- **LLM-Optimized**: Clean markdown, structured data, semantic chunking
- **Context-Aware**: Extracts relationships and metadata automatically
- **Quality-Focused**: Built-in content quality assessment
- **Archive-Ready**: Designed for long-term storage and retrieval

## πŸ“– Use Cases

### πŸ€– AI Agents & LLM Applications
**Problem**: Training data scattered across JavaScript-heavy academic sites
```python
# Research assistant workflow with JavaScript interaction
research = await web.discover(
    "quantum computing breakthroughs",
    script="document.querySelector('.show-abstract')?.click(); return document.querySelector('.full-text')?.textContent"
)
for paper in research:
    # Rich content includes JavaScript-extracted data
    summary = await llm.summarize(paper.markdown)
    dynamic_content = paper.script_result  # JavaScript execution result
    insights = await llm.extract_insights(paper.content + dynamic_content)
```

### πŸ›’ E-commerce Price Monitoring
**Problem**: Product prices loaded via AJAX, `requests` sees loading spinners
```python
# Monitor competitor pricing with dynamic content
products = await web.get_many(
    competitor_urls,
    script="return {price: document.querySelector('.price')?.textContent, stock: document.querySelector('.inventory')?.textContent}"
)
for product in products:
    if product.script_result['price'] != cached_price:
        await alert_price_change(product.url, product.script_result)
```

### πŸ”— MCP Servers
**Problem**: Claude needs reliable web content extraction tools
```python
# Easy MCP integration (with crawailer[mcp])
from crawailer.mcp import create_mcp_server

server = create_mcp_server()
# Automatically exposes web.get, web.discover, etc. as MCP tools
```

### πŸ“Š Social Media & Content Analysis
**Problem**: Posts and comments load infinitely via JavaScript
```python
# Extract social media discussions with infinite scroll
content = await web.get(
    "https://social-platform.com/topic/ai-safety",
    script="window.scrollTo(0, document.body.scrollHeight); return document.querySelectorAll('.post').length"
)
# Gets full thread content, not just initial page load
```

## πŸ› οΈ Installation

```bash
# Basic installation
pip install crawailer

# With MCP server capabilities  
pip install crawailer[mcp]

# Everything
pip install crawailer[all]

# Post-install setup (installs Playwright browsers)
crawailer setup
```
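To confirm the install succeeded before first use, a small stdlib check works for any package name and returns `None` when the package is missing:

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Prints e.g. "0.1.2" after a successful install, or None otherwise.
print(installed_version("crawailer"))
```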

## πŸ—οΈ Architecture

Crawailer is built on modern, focused libraries:

- **🎭 Playwright**: Reliable browser automation
- **⚑ selectolax**: 5-10x faster HTML parsing (C-based)
- **πŸ“ markdownify**: Clean HTMLβ†’Markdown conversion
- **🧹 justext**: Intelligent content extraction and cleaning
- **πŸ”„ httpx**: Modern async HTTP client

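As a rough illustration of the HTML-to-Markdown step that `markdownify` performs in this pipeline, here is a deliberately tiny stdlib-only converter (headings and paragraphs only; the real library handles nesting, links, tables, and much more):

```python
from html.parser import HTMLParser

class MiniMarkdown(HTMLParser):
    """Toy HTML->Markdown converter: prefixes heading text with '#'
    markers and passes paragraph text through unchanged."""
    def __init__(self):
        super().__init__()
        self.out = []
        self.prefix = ""

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.prefix = "# "
        elif tag == "h2":
            self.prefix = "## "
        else:
            self.prefix = ""

    def handle_data(self, data):
        if data.strip():
            self.out.append(self.prefix + data.strip())

md = MiniMarkdown()
md.feed("<h1>Title</h1><p>Body text.</p>")
print("\n\n".join(md.out))  # prints "# Title", a blank line, "Body text."
```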
## πŸ§ͺ Battle-Tested Quality

Crawailer includes **18 comprehensive test suites** with real-world scenarios:

- **Modern Frameworks**: React, Vue, Angular demos with full JavaScript APIs
- **Mobile Compatibility**: Safari iOS, Chrome Android, responsive designs
- **Production Edge Cases**: Network failures, memory pressure, browser differences
- **Performance Testing**: Stress tests, concurrency, resource management

**Want to contribute?** We welcome PRs with new test scenarios! Our test sites library shows exactly how different frameworks should behave with JavaScript execution.

> πŸ“ **Future TODO**: Move examples to dedicated repository for community contributions

## 🀝 Perfect for MCP Projects

MCP servers love Crawailer because it provides:

- **Focused tools**: Each function does one thing well
- **Rich outputs**: Structured data ready for LLM consumption  
- **Smart defaults**: Works out of the box with minimal configuration
- **Extensible**: Easy to add domain-specific extraction logic

```python
# Example MCP server tool
@mcp_tool("web_research")
async def research_topic(topic: str, depth: str = "comprehensive"):
    results = await web.discover(topic, max_pages=20)
    return {
        "sources": len(results),
        "content": [r.summary for r in results],
        "insights": await analyze_patterns(results)
    }
```
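The `@mcp_tool` decorator above follows a common registry pattern: decorating a function records it under a tool name so the server can look it up by name later. A generic stdlib sketch of that pattern (illustrative only, not `crawailer.mcp`'s actual implementation):

```python
# Registry mapping tool names to their handler functions.
TOOLS = {}

def tool(name):
    """Register the decorated function under `name` and return it unchanged."""
    def wrap(fn):
        TOOLS[name] = fn
        return fn
    return wrap

@tool("web_research")
def research_topic(topic):
    return f"researching {topic}"

# The server can now dispatch by tool name.
print(TOOLS["web_research"]("AI safety"))  # researching AI safety
```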

## πŸ₯Š Crawailer vs Traditional Tools

| Challenge | `requests` & HTTP libs | Selenium | **Crawailer** |
|-----------|------------------------|----------|---------------|
| **React/Vue/Angular** | ❌ Empty templates | 🟑 Slow, complex setup | βœ… **Just works** |
| **Dynamic Pricing** | ❌ Shows loading spinner | 🟑 Requires waits/timeouts | βœ… **Intelligent waiting** |
| **JavaScript APIs** | ❌ No access | 🟑 Clunky WebDriver calls | βœ… **Native page.evaluate()** |
| **Speed** | 🟒 100-500ms | ❌ 5-15 seconds | βœ… **2-5 seconds** |
| **Memory** | 🟒 1-5MB | ❌ 200-500MB | 🟑 **100-200MB** |
| **AI-Ready Output** | ❌ Raw HTML | ❌ Raw HTML | βœ… **Clean Markdown** |
| **Developer Experience** | 🟑 Manual parsing | ❌ Complex WebDriver | βœ… **Intuitive API** |

> **The bottom line**: When JavaScript matters, Crawailer delivers. When it doesn't, use `requests`.
> 
> πŸ“– **[See complete tool comparison β†’](docs/COMPARISON.md)** (includes Scrapy, Playwright, BeautifulSoup, and more)

## πŸŽ‰ What Makes It Delightful

### JavaScript-Powered Intelligence
```python
# Dynamic content extraction from SPAs
content = await web.get(
    "https://react-app.com",
    script="window.testData?.framework + ' v' + window.React?.version"
)
# Automatically detects: React application with version info
# Extracts: Dynamic content + framework details

# E-commerce with JavaScript-loaded prices
product = await web.get(
    "https://shop.com/product",
    script="document.querySelector('.dynamic-price')?.textContent",
    wait_for=".price-loaded"
) 
# Recognizes product page with dynamic pricing
# Extracts: Real-time price, reviews, availability, specs
```

### Beautiful Output
```
✨ Found 15 high-quality sources
πŸ“Š Sources: 4 arxiv, 3 journals, 2 conferences, 6 blogs  
πŸ“… Date range: 2023-2024 (recent research)
⚑ Average quality score: 8.7/10
πŸ” Key topics: transformers, safety, alignment
```

### Helpful Errors
```python
try:
    content = await web.get("problematic-site.com")
except web.CloudflareProtected:
    print("πŸ’‘ Try: await web.get(url, stealth=True)")
except web.PaywallDetected as e:
    print(f"πŸ” Found archived version: {e.archive_url}")
```

## πŸ“š Documentation

- **[Tool Comparison](docs/COMPARISON.md)**: How Crawailer compares to Scrapy, Selenium, BeautifulSoup, etc.
- **[Getting Started](docs/getting-started.md)**: Installation and first steps
- **[JavaScript API](docs/JAVASCRIPT_API.md)**: Complete JavaScript execution guide
- **[API Reference](docs/API_REFERENCE.md)**: Complete function documentation  
- **[Benchmarks](docs/BENCHMARKS.md)**: Performance comparison with other tools
- **[MCP Integration](docs/mcp.md)**: Building MCP servers with Crawailer
- **[Examples](examples/)**: Real-world usage patterns
- **[Architecture](docs/architecture.md)**: How Crawailer works internally

## 🀝 Contributing

We love contributions! Crawailer is designed to be:
- **Easy to extend**: Add new content extractors and browser capabilities
- **Well-tested**: Comprehensive test suite with real websites
- **Documented**: Every feature has examples and use cases

See [CONTRIBUTING.md](CONTRIBUTING.md) for details.

## πŸ“„ License

MIT License - see [LICENSE](LICENSE) for details.

---

## πŸš€ Ready to Stop Losing Your Mind?

```bash
pip install crawailer
crawailer setup  # Install browser engines
```

**Life's too short** for empty `<div>` tags and "JavaScript required" messages.

Get content that actually exists. From websites that actually work.

⭐ **Star us if this saves your sanity** β†’ [git.supported.systems/MCP/crawailer](https://git.supported.systems/MCP/crawailer)

---

**Built with ❀️ for the age of AI agents and automation**

*Crawailer: Because robots deserve delightful web experiences too* πŸ€–βœ¨
            
