# OpenCrawler
<div align="center">
<img src="assets/opencrawler-logo.svg" alt="OpenCrawler Logo" width="200" height="200">
<br>
<em>AI-Powered Web Intelligence</em>
</div>
[Python 3.8+](https://www.python.org/downloads/) · [MIT License](LICENSE) · [PyPI](https://pypi.org/project/opencrawler/) · [Tests](tests/) · [Code style: Black](https://github.com/psf/black)
**OpenCrawler** is a production-ready, enterprise-grade web scraping and crawling framework with advanced AI integration, comprehensive monitoring, and scalable architecture.
## 🚀 Quick Installation
```bash
# Install from PyPI
pip install opencrawler
# Install with AI capabilities
pip install "opencrawler[ai]"
# Install with all features
pip install "opencrawler[all]"
```
## Features
### Core Capabilities
- **Multi-Engine Support**: Playwright, Selenium, Requests, CloudScraper
- **AI-Powered Extraction**: OpenAI Agents SDK integration for intelligent data extraction
- **Stealth Technology**: Advanced anti-detection and bot bypass capabilities
- **Distributed Processing**: Scalable architecture for high-volume operations
- **Real-time Monitoring**: Comprehensive metrics and health monitoring
- **Enterprise Security**: RBAC, audit trails, and compliance features
### Advanced Features
- **LLM Integration**: Support for OpenAI, Anthropic, and local models
- **Microservice Architecture**: FastAPI-based REST API with auto-documentation
- **Database Support**: PostgreSQL, TimescaleDB, Redis integration
- **Container Ready**: Docker and Kubernetes deployment configurations
- **Performance Optimization**: Intelligent caching, rate limiting, and resource management
- **Error Recovery**: Sophisticated error handling and retry mechanisms
## Quick Start
### Basic Usage
```python
import asyncio
from webscraper.core.advanced_scraper import AdvancedWebScraper
async def main():
    # Initialize scraper
    scraper = AdvancedWebScraper()
    await scraper.setup()

    # Scrape a webpage
    result = await scraper.scrape_url("https://example.com")
    print(f"Title: {result.get('title')}")
    print(f"Content length: {len(result.get('content', ''))}")

    # Cleanup
    await scraper.cleanup()

asyncio.run(main())
```
### CLI Usage
```bash
# Basic scraping
opencrawler scrape https://example.com
# Advanced scraping with AI
opencrawler scrape https://example.com --ai-extract --model gpt-4
# Start API server
opencrawler api --host 0.0.0.0 --port 8000
# Run system validation
opencrawler-validate --level production
```
## Architecture
OpenCrawler follows a modular, microservice-oriented architecture:
```
OpenCrawler/
├── webscraper/
│   ├── core/            # Core scraping engines
│   ├── ai/              # AI/LLM integration
│   ├── api/             # FastAPI REST API
│   ├── engines/         # Scraping engines (Playwright, Selenium, etc.)
│   ├── processors/      # Data processing pipelines
│   ├── monitoring/      # System monitoring and metrics
│   ├── security/        # Authentication and security
│   ├── utils/           # Utilities and helpers
│   └── orchestrator/    # System orchestration
├── tests/               # Comprehensive test suite
├── deployment/          # Docker and Kubernetes configs
├── docs/                # Documentation
└── examples/            # Usage examples
```
## Configuration
### Environment Variables
```bash
# OpenAI API (optional)
export OPENAI_API_KEY="your-api-key-here"
# Database (optional)
export DATABASE_URL="postgresql://user:pass@localhost/opencrawler"
# Redis (optional)
export REDIS_URL="redis://localhost:6379"
# Test mode
export OPENCRAWLER_TEST_MODE=true
```
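All of these variables are optional; features that depend on them are simply disabled when they are unset. As a minimal sketch (not part of the OpenCrawler API), you can check them from Python before startup:

```python
import os

# Optional integrations are enabled only when the corresponding variable is set.
for name in ("OPENAI_API_KEY", "DATABASE_URL", "REDIS_URL"):
    status = "set" if os.environ.get(name) else "not set (feature disabled)"
    print(f"{name}: {status}")
```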
### Configuration File
Create a `config.yaml` file:
```yaml
scraper:
  engines: ["playwright", "requests"]
  stealth_level: "medium"
  javascript_enabled: true

ai:
  enabled: true
  model: "gpt-4"
  temperature: 0.7

database:
  url: "postgresql://localhost/opencrawler"
  pool_size: 10

monitoring:
  enabled: true
  metrics_port: 9090

security:
  enable_auth: true
  rate_limit: 100
```
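How the file is consumed depends on your entry point. As an illustrative sketch only (passing the parsed `scraper` section as keyword arguments is an assumption, not a documented interface), the YAML can be loaded with PyYAML and handed to the scraper:

```python
import asyncio
import yaml  # pip install pyyaml

from webscraper.core.advanced_scraper import AdvancedWebScraper

async def main():
    with open("config.yaml") as f:
        config = yaml.safe_load(f)

    # Hypothetical wiring: forward the "scraper" section as constructor kwargs.
    scraper = AdvancedWebScraper(**config.get("scraper", {}))
    await scraper.setup()
    await scraper.cleanup()

asyncio.run(main())
```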
## API Reference
### REST API
Start the API server:
```bash
opencrawler-api --port 8000
```
#### Endpoints
- `GET /health` - Health check
- `POST /scrape` - Scrape a single URL
- `POST /crawl` - Crawl multiple URLs
- `GET /metrics` - System metrics
- `GET /docs` - API documentation
#### Example Request
```bash
curl -X POST "http://localhost:8000/scrape" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "extract_ai": true}'
```
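The `POST /crawl` endpoint accepts multiple URLs. Here is a hedged Python sketch using `requests`; the payload field name (`urls`) is an assumption, so check `GET /docs` for the authoritative schema:

```python
import requests

# Assumed payload shape for the /crawl endpoint.
payload = {"urls": ["https://example.com", "https://example.org"]}

response = requests.post("http://localhost:8000/crawl", json=payload, timeout=60)
response.raise_for_status()
print(response.json())
```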
### Python API
```python
from webscraper.api.complete_api import OpenCrawlerAPI
# Initialize API
api = OpenCrawlerAPI()
await api.initialize()
# Scrape with AI
result = await api.scrape_with_ai(
    url="https://example.com",
    schema={"title": "string", "content": "string"}
)
# Cleanup
await api.cleanup()
```
## Advanced Usage
### AI-Powered Extraction
```python
from webscraper.ai.llm_scraper import LLMScraper
scraper = LLMScraper()
await scraper.initialize()
# Extract structured data
result = await scraper.run(
    url="https://news.example.com",
    schema={
        "title": "string",
        "author": "string",
        "date": "date",
        "content": "string"
    }
)
```
### Distributed Processing
```python
from webscraper.core.distributed_processor import DistributedProcessor
processor = DistributedProcessor(worker_count=16)
await processor.initialize()
# Process multiple URLs
results = await processor.process_batch([
    "https://example1.com",
    "https://example2.com",
    "https://example3.com"
])
```
### Custom Engines
```python
from webscraper.engines.base_engine import BaseEngine
class CustomEngine(BaseEngine):
    async def fetch(self, url: str, **kwargs) -> dict:
        # Custom implementation
        return {"content": "...", "status": 200}
# Register custom engine
scraper.register_engine("custom", CustomEngine())
```
## Monitoring and Metrics
### Built-in Monitoring
```python
from webscraper.monitoring.advanced_monitoring import AdvancedMonitoringSystem
monitor = AdvancedMonitoringSystem()
await monitor.initialize()
# Get system metrics
metrics = await monitor.get_system_metrics()
print(f"CPU: {metrics['cpu_usage']}%")
print(f"Memory: {metrics['memory_usage']}%")
```
### Prometheus Integration
OpenCrawler exposes Prometheus-compatible metrics:
```bash
# Start with monitoring
python master_cli.py api --enable-metrics --metrics-port 9090
```
Metrics are then available at `http://localhost:9090/metrics`.
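To sanity-check the exporter before pointing Prometheus at it, the endpoint can be fetched manually. A minimal sketch using only the standard library:

```python
import urllib.request

# Fetch the Prometheus exposition-format text and print the first few lines.
with urllib.request.urlopen("http://localhost:9090/metrics") as resp:
    text = resp.read().decode("utf-8")

print("\n".join(text.splitlines()[:10]))
```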
## Deployment
### Docker
```bash
# Build image
docker build -t opencrawler .
# Run container
docker run -p 8000:8000 opencrawler
```
### Docker Compose
```bash
# Start all services
docker-compose up -d
# Production deployment
docker-compose -f docker-compose.production.yml up -d
```
### Kubernetes
```bash
# Deploy to Kubernetes
kubectl apply -f kubernetes/
# Check deployment
kubectl get pods -l app=opencrawler
```
### Production Deployment
```python
from deployment.production_deployment import ProductionDeploymentSystem
deployment = ProductionDeploymentSystem()
await deployment.initialize()
# Deploy to production
result = await deployment.deploy(
    environment="production",
    config_overrides={"replicas": 5}
)
```
## Testing
### Running Tests
```bash
# Run all tests
pytest
# Run specific test suite
pytest tests/test_complete_system.py
# Run with coverage
pytest --cov=webscraper
# Run in test mode
OPENCRAWLER_TEST_MODE=true pytest
```
### Test Categories
- **Unit Tests**: Core component testing
- **Integration Tests**: Service integration testing
- **Performance Tests**: Load and performance testing
- **Security Tests**: Security validation
- **End-to-End Tests**: Complete workflow testing
### Validation
```bash
# Run comprehensive validation
python webscraper/utils/comprehensive_validator.py --level production
# Check system health
python -c "
from webscraper.orchestrator.system_orchestrator import SystemOrchestrator
import asyncio
async def main():
    orchestrator = SystemOrchestrator()
    await orchestrator.initialize()
    health = await orchestrator.get_system_health()
    print(f'System Status: {health[\"status\"]}')
    await orchestrator.shutdown()

asyncio.run(main())
"
```
## Performance
### Benchmarks
- **Single Page**: ~2-5 seconds per page
- **Concurrent Crawling**: 50-100 pages/minute
- **Memory Usage**: <1GB for typical workloads
- **CPU Usage**: Optimized for multi-core systems
### Optimization
```python
# Enable performance optimizations
scraper = AdvancedWebScraper(
    stealth_level="low",          # Faster but less stealthy
    javascript_enabled=False,     # Skip JS rendering
    cache_enabled=True,           # Enable caching
    concurrent_requests=10        # Increase concurrency
)
```
## Security
### Authentication
```python
from webscraper.security.authentication import AuthManager
auth = AuthManager()
await auth.initialize()
# Create user
user = await auth.create_user("username", "password", ["scraper"])
# Authenticate
token = await auth.authenticate("username", "password")
```
### Rate Limiting
```python
from webscraper.security.rate_limiter import RateLimiter
limiter = RateLimiter(requests_per_minute=60)
await limiter.check_rate_limit(user_id="user123")
```
## Contributing
We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.
### Development Setup
```bash
# Clone and install
git clone https://github.com/llamasearch/opencrawler.git
cd opencrawler
pip install -e ".[dev]"
# Run pre-commit hooks
pre-commit install
# Run tests
pytest
```
### Code Style
We use [Black](https://github.com/psf/black) for code formatting:
```bash
# Format code
black webscraper/
# Check formatting
black --check webscraper/
```
## License
OpenCrawler is licensed under the MIT License. See [LICENSE](LICENSE) for details.
## Support
- **Documentation**: [docs/](docs/)
- **Examples**: [examples/](examples/)
- **Issues**: [GitHub Issues](https://github.com/llamasearch/opencrawler/issues)
- **Discussions**: [GitHub Discussions](https://github.com/llamasearch/opencrawler/discussions)
## Changelog
See [CHANGELOG.md](CHANGELOG.md) for version history and updates.
## Assets
OpenCrawler includes a complete set of professional logo assets:
### Logo Variants
- **`assets/opencrawler-logo.svg`** - Main logo with full branding (light theme)
- **`assets/opencrawler-logo-dark.svg`** - Dark variant for light backgrounds
- **`assets/opencrawler-icon.svg`** - Icon version for app icons and buttons
- **`assets/favicon.svg`** - Favicon optimized for small sizes
### Design Features
- **Spider/Crawler Theme**: Represents web crawling and data extraction
- **AI/Neural Network Elements**: Symbolizes AI-powered intelligence
- **Modern Gradients**: Professional blue, green, and orange color scheme
- **Scalable Vector Graphics**: Perfect quality at any size
- **Multiple Formats**: SVG for web, can be converted to PNG/ICO as needed
### Usage Guidelines
```html
<!-- Main logo for documentation -->
<img src="assets/opencrawler-logo.svg" alt="OpenCrawler" width="200">
<!-- Dark variant for light backgrounds -->
<img src="assets/opencrawler-logo-dark.svg" alt="OpenCrawler" width="200">
<!-- Icon for buttons/navigation -->
<img src="assets/opencrawler-icon.svg" alt="OpenCrawler" width="32">
<!-- Favicon -->
<link rel="icon" type="image/svg+xml" href="assets/favicon.svg">
```
## Acknowledgments
OpenCrawler is built with these excellent libraries:
- [Playwright](https://playwright.dev/) - Modern web automation
- [FastAPI](https://fastapi.tiangolo.com/) - High-performance API framework
- [OpenAI](https://openai.com/) - AI/LLM integration
- [PostgreSQL](https://www.postgresql.org/) - Database backend
- [Docker](https://www.docker.com/) - Containerization
- [Kubernetes](https://kubernetes.io/) - Container orchestration
---
**Author**: Nik Jois <nikjois@llamasearch.ai>
**Organization**: LlamaSearch.ai
**Version**: 1.0.2
**Status**: Production Ready