ita-scrapper

Name	ita-scrapper
Version	0.1.1
home_page	None
Summary	A Python library for scraping ITA Matrix travel website using Playwright
upload_time	2025-07-27 02:23:15
maintainer	None
docs_url	None
author	ITA Scrapper Contributors
requires_python	>=3.10
license	MIT
keywords	automation flight ita matrix playwright scraping travel
VCS
bugtrack_url
requirements	No requirements were recorded.

# ITA Scrapper

[![PyPI version](https://badge.fury.io/py/ita-scrapper.svg)](https://badge.fury.io/py/ita-scrapper)
[![Python versions](https://img.shields.io/pypi/pyversions/ita-scrapper.svg)](https://pypi.org/project/ita-scrapper/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![CI](https://github.com/yourusername/ita-scrapper/workflows/CI/badge.svg)](https://github.com/yourusername/ita-scrapper/actions)

A powerful Python library for scraping ITA Matrix flight data using Playwright. Get flight prices, schedules, and travel information programmatically with a clean, async API.

## ✨ Features

- 🛫 **Flight Search**: Search flights between any airports worldwide
- 📅 **Flexible Dates**: Support for one-way, round-trip, and multi-city searches  
- 💰 **Price Parsing**: Parse and normalize flight prices from various formats
- ⏱️ **Duration Handling**: Parse flight durations and format them consistently
- 🌍 **Airport Codes**: Validate and normalize IATA/ICAO airport codes
- 🎯 **Type Safety**: Full Pydantic model support with type hints
- ⚡ **Async Support**: Built with async/await for high performance
- 🧪 **Tested**: Comprehensive test suite with 95%+ coverage
- 🖥️ **CLI Interface**: Command-line tool for quick searches
- 🔧 **MCP Server**: Model Context Protocol server for AI integration

## 📦 Installation

```bash
pip install ita-scrapper
```

For development with all extras:
```bash
pip install "ita-scrapper[dev,mcp]"
```

### Install Playwright browsers:
```bash
playwright install chromium
```

## 🚀 Quick Start

### Python API

```python
import asyncio
from datetime import date, timedelta
from ita_scrapper import ITAScrapper, CabinClass

async def search_flights():
    async with ITAScrapper(headless=True) as scrapper:
        # Search for flights
        results = await scrapper.search_flights(
            origin="JFK",
            destination="LAX", 
            departure_date=date.today() + timedelta(days=30),
            return_date=date.today() + timedelta(days=37),
            adults=2,
            cabin_class=CabinClass.BUSINESS
        )
        
        # Print results
        for i, flight in enumerate(results.flights, 1):
            print(f"Flight {i}:")
            print(f"  Price: ${flight.price}")
            print(f"  Duration: {flight.duration}")
            print(f"  Stops: {flight.stops}")
            print(f"  Airline: {flight.airline}")
            print()

# Run the search
asyncio.run(search_flights())
```
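
Once results come back, a common next step is ranking them. The sketch below assumes, as in the example above, that each flight exposes `price` and `stops` attributes and that prices compare with `<`. The `cheapest` helper and the `SimpleNamespace` stand-ins are hypothetical; they only illustrate the idea without launching a browser.

```python
from decimal import Decimal
from types import SimpleNamespace


def cheapest(flights, max_stops=None, limit=3):
    """Return up to `limit` flights sorted by price, optionally capped by stops."""
    candidates = [
        f for f in flights
        if max_stops is None or f.stops <= max_stops
    ]
    return sorted(candidates, key=lambda f: f.price)[:limit]


# Stand-in data shaped like the attributes printed in the example above.
sample_flights = [
    SimpleNamespace(price=Decimal("420.00"), stops=1, airline="AA"),
    SimpleNamespace(price=Decimal("515.50"), stops=0, airline="DL"),
    SimpleNamespace(price=Decimal("389.99"), stops=2, airline="UA"),
]

nonstop = cheapest(sample_flights, max_stops=0)
overall = cheapest(sample_flights, limit=2)
print([f.airline for f in nonstop])
print([f.airline for f in overall])
```

With real data you would pass `results.flights` in place of `sample_flights`.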

### Command Line Interface

```bash
# Search for flights
ita-scrapper search --origin JFK --destination LAX \
    --departure-date 2024-08-15 --return-date 2024-08-22 \
    --adults 2 --cabin-class BUSINESS

# Parse flight data
ita-scrapper parse "2h 30m" --type duration
ita-scrapper parse '$1,234.56' --type price  # single quotes stop the shell from expanding $1
ita-scrapper parse "14:30" --type time --reference-date 2024-08-15

# Get help
ita-scrapper --help
```

## 📚 Documentation

### Quick Links

- **[📖 API Documentation](docs/api.md)** - Complete API reference with examples
- **[🔧 Developer Guide](docs/developer-guide.md)** - Architecture and extension guide  
- **[🚨 Troubleshooting](docs/troubleshooting.md)** - Common issues and solutions
- **[📊 Project Summary](PROJECT_SUMMARY.md)** - High-level project overview

### API Documentation

Comprehensive API documentation is available in the [docs/api.md](docs/api.md) file, covering:

- **Core Classes**: ITAScrapper, ITAMatrixParser
- **Data Models**: Flight, SearchParams, FlightResult
- **Utility Functions**: Price parsing, duration formatting, validation
- **Exception Handling**: Complete error handling strategies
- **Best Practices**: Recommended usage patterns

### Developer Guide

For developers wanting to extend or contribute to ITA Scrapper, see [docs/developer-guide.md](docs/developer-guide.md):

- **Architecture Overview**: Component design and data flow
- **Parser Architecture**: Multi-strategy parsing system
- **Browser Automation**: Playwright integration and anti-detection
- **Extension Points**: Adding new parsers and data models
- **Debugging Guide**: Tools and techniques for troubleshooting
- **Performance Optimization**: Memory and speed optimization

### Troubleshooting

Having issues? Check [docs/troubleshooting.md](docs/troubleshooting.md) for solutions to:

- **Installation Issues**: Dependencies and browser setup
- **Website Access**: Blocking, CAPTCHAs, and rate limiting  
- **Parsing Problems**: Data extraction and validation issues
- **Performance**: Memory usage and speed optimization
- **Development Setup**: Environment configuration and debugging

## 📖 API Reference

### Core Classes

#### ITAScrapper
Main scraper class for flight searches.

```python
class ITAScrapper:
    def __init__(self, headless: bool = True, timeout: int = 30000):
        """Initialize the scrapper."""
        
    async def search_flights(
        self,
        origin: str,
        destination: str,
        departure_date: date,
        return_date: Optional[date] = None,
        adults: int = 1,
        children: int = 0,
        infants: int = 0,
        cabin_class: CabinClass = CabinClass.ECONOMY
    ) -> FlightResult:
        """Search for flights."""
```

#### Models

```python
from ita_scrapper import (
    Flight,           # Individual flight details
    FlightResult,     # Search results container
    SearchParams,     # Search parameters
    CabinClass,       # Enum for cabin classes
    TripType,         # Enum for trip types
    Airport,          # Airport information
)
```

### Utility Functions

```python
from ita_scrapper import (
    parse_price,           # Parse price strings
    parse_duration,        # Parse duration strings
    parse_time,            # Parse time strings  
    validate_airport_code, # Validate airport codes
    format_duration,       # Format durations
    is_valid_date_range,   # Validate date ranges
)

# Examples
price = parse_price("$1,234.56")  # Returns Decimal('1234.56')
duration = parse_duration("2h 30m")  # Returns 150 (minutes)
code = validate_airport_code("jfk")  # Returns "JFK"
```
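
To make the expected behaviour concrete, here is a stdlib-only sketch of parsers that would produce the same results as the comments above. It is not the library's implementation: the `sketch_*` names, the regular expressions, and the output format of `sketch_format_duration` are assumptions chosen for illustration, and the price parser only handles the `1,234.56` style of number.

```python
import re
from decimal import Decimal


def sketch_parse_price(text: str) -> Decimal:
    """Strip currency symbols and thousands separators, e.g. "$1,234.56"."""
    cleaned = re.sub(r"[^\d.]", "", text)
    if not cleaned:
        raise ValueError(f"no price found in {text!r}")
    return Decimal(cleaned)


def sketch_parse_duration(text: str) -> int:
    """Convert strings such as "2h 30m", "45m" or "3h" to minutes."""
    match = re.fullmatch(r"\s*(?:(\d+)\s*h)?\s*(?:(\d+)\s*m)?\s*", text)
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes


def sketch_format_duration(total_minutes: int) -> str:
    """Render minutes as "<hours>h <minutes>m" with zero-padded minutes."""
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes:02d}m"


def sketch_validate_airport_code(code: str) -> str:
    """Accept 3-letter (IATA) or 4-letter (ICAO) codes, returned upper-cased."""
    normalized = code.strip().upper()
    if not re.fullmatch(r"[A-Z]{3,4}", normalized):
        raise ValueError(f"invalid airport code: {code!r}")
    return normalized


print(sketch_parse_price("$1,234.56"))
print(sketch_parse_duration("2h 30m"))
print(sketch_format_duration(150))
print(sketch_validate_airport_code("jfk"))
```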

## 🎯 Advanced Usage

### Context Manager
```python
# Recommended: Use as context manager
async with ITAScrapper(headless=True) as scrapper:
    results = await scrapper.search_flights(...)

# Manual management
scrapper = ITAScrapper()
await scrapper.start()
try:
    results = await scrapper.search_flights(...)
finally:
    await scrapper.close()
```

### Error Handling
```python
from ita_scrapper import ITAScrapperError, NavigationError, TimeoutError

try:
    async with ITAScrapper() as scrapper:
        results = await scrapper.search_flights(...)
except NavigationError:
    print("Failed to navigate to search page")
except TimeoutError:
    print("Search timed out")
except ITAScrapperError as e:
    print(f"General error: {e}")
```
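
Timeouts can be transient when driving a real browser, so a retry wrapper is sometimes useful. The sketch below is independent of the library: `retry_async`, `SearchTimeout`, and `flaky_search` are hypothetical stand-ins, and in real code you would pass the exception class and search coroutine you actually use.

```python
import asyncio


class SearchTimeout(Exception):
    """Stand-in for whichever timeout exception your search raises."""


async def retry_async(make_call, retry_on, attempts=3, base_delay=1.0):
    """Await make_call() up to `attempts` times, doubling the delay after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return await make_call()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))


calls = {"count": 0}


async def flaky_search():
    """Fails twice, then succeeds, to exercise the retry loop."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise SearchTimeout("simulated timeout")
    return "results"


outcome = asyncio.run(retry_async(flaky_search, SearchTimeout, base_delay=0.01))
print(outcome, calls["count"])
```

Inside an `async with ITAScrapper() as scrapper:` block, `make_call` could be a lambda wrapping `scrapper.search_flights(...)`. Keep the attempt count low and the delay generous so retries stay within the site's rate limits.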

### Custom Configuration
```python
scrapper = ITAScrapper(
    headless=False,        # Show browser window
    timeout=60000,         # 60 second timeout
)
```

## 🧪 Testing

Run the test suite:

```bash
# All tests
pytest

# Unit tests only (fast)  
pytest -m "not slow"

# Integration tests (slow, requires browser)
pytest -m slow

# With coverage
pytest --cov=src/ita_scrapper --cov-report=html
```

## 🔧 MCP Server

Use ITA Scrapper as a Model Context Protocol server:

```python
# Install MCP support first (shell command, not Python):
#   pip install "ita-scrapper[mcp]"

# Create MCP server (see examples/mcp_integration.py)
from ita_scrapper.mcp import create_mcp_server
server = create_mcp_server()
```

Configure in Claude Desktop:
```json
{
  "mcpServers": {
    "ita-scrapper": {
      "command": "python",
      "args": ["/path/to/ita_scrapper_mcp_server.py"]
    }
  }
}
```

## 🌟 Examples

Check out the `/examples` directory for more usage examples:

- `basic_usage.py` - Simple flight search
- `demo_usage.py` - Interactive demo
- `matrix_examples.py` - Advanced search patterns
- `mcp_integration.py` - MCP server setup
- `test_real_sites.py` - Real-world testing

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

### Development Setup

```bash
# Clone repository
git clone https://github.com/yourusername/ita-scrapper.git
cd ita-scrapper

# Install with uv (recommended)
uv sync --all-extras

# Install Playwright browsers
uv run playwright install

# Run tests
uv run pytest

# Run linting
uv run ruff check .
uv run ruff format .
```

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## ⚠️ Disclaimer

This tool is for educational and research purposes only. Please respect the terms of service of any websites you scrape and be mindful of rate limits. The authors are not responsible for any misuse of this software.

## 🙋‍♂️ Support

- 📖 [Documentation](https://github.com/yourusername/ita-scrapper#readme)
- 🐛 [Issue Tracker](https://github.com/yourusername/ita-scrapper/issues)
- 💬 [Discussions](https://github.com/yourusername/ita-scrapper/discussions)

## 📊 Stats

- **Language**: Python 3.10+
- **Framework**: Playwright + Pydantic
- **Test Coverage**: 95%+
- **Dependencies**: Minimal, well-maintained
- **Performance**: Async/await optimized

---

Made with ❤️ for travel enthusiasts and developers!

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "ita-scrapper",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "automation, flight, ita, matrix, playwright, scraping, travel",
    "author": "ITA Scrapper Contributors",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/63/3d/2f1eed82ffe72e4005815533fa363a0cbcdbff73ab42f122ea6994d603b5/ita_scrapper-0.1.1.tar.gz",
    "platform": null,
    "description": "# ITA Scrapper\n\n[![PyPI version](https://badge.fury.io/py/ita-scrapper.svg)](https://badge.fury.io/py/ita-scrapper)\n[![Python versions](https://img.shields.io/pypi/pyversions/ita-scrapper.svg)](https://pypi.org/project/ita-scrapper/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![CI](https://github.com/yourusername/ita-scrapper/workflows/CI/badge.svg)](https://github.com/yourusername/ita-scrapper/actions)\n\nA powerful Python library for scraping ITA Matrix flight data using Playwright. Get flight prices, schedules, and travel information programmatically with a clean, async API.\n\n## \u2728 Features\n\n- \ud83d\udeeb **Flight Search**: Search flights between any airports worldwide\n- \ud83d\udcc5 **Flexible Dates**: Support for one-way, round-trip, and multi-city searches  \n- \ud83d\udcb0 **Price Parsing**: Parse and normalize flight prices from various formats\n- \u23f1\ufe0f **Duration Handling**: Parse flight durations and format them consistently\n- \ud83c\udf0d **Airport Codes**: Validate and normalize IATA/ICAO airport codes\n- \ud83c\udfaf **Type Safety**: Full Pydantic model support with type hints\n- \u26a1 **Async Support**: Built with async/await for high performance\n- \ufffd **Tested**: Comprehensive test suite with 95%+ coverage\n- \ud83d\udda5\ufe0f **CLI Interface**: Command-line tool for quick searches\n- \ud83d\udd27 **MCP Server**: Model Context Protocol server for AI integration\n\n## \ud83d\udce6 Installation\n\n```bash\npip install ita-scrapper\n```\n\nFor development with all extras:\n```bash\npip install ita-scrapper[dev,mcp]\n```\n\n### Install Playwright browsers:\n```bash\nplaywright install chromium\n```\n\n## \ud83d\ude80 Quick Start\n\n### Python API\n\n```python\nimport asyncio\nfrom datetime import date, timedelta\nfrom ita_scrapper import ITAScrapper, CabinClass\n\nasync def search_flights():\n    async with ITAScrapper(headless=True) as 
scrapper:\n        # Search for flights\n        results = await scrapper.search_flights(\n            origin=\"JFK\",\n            destination=\"LAX\", \n            departure_date=date.today() + timedelta(days=30),\n            return_date=date.today() + timedelta(days=37),\n            adults=2,\n            cabin_class=CabinClass.BUSINESS\n        )\n        \n        # Print results\n        for i, flight in enumerate(results.flights, 1):\n            print(f\"Flight {i}:\")\n            print(f\"  Price: ${flight.price}\")\n            print(f\"  Duration: {flight.duration}\")\n            print(f\"  Stops: {flight.stops}\")\n            print(f\"  Airline: {flight.airline}\")\n            print()\n\n# Run the search\nasyncio.run(search_flights())\n```\n\n### Command Line Interface\n\n```bash\n# Search for flights\nita-scrapper search --origin JFK --destination LAX \\\n    --departure-date 2024-08-15 --return-date 2024-08-22 \\\n    --adults 2 --cabin-class BUSINESS\n\n# Parse flight data\nita-scrapper parse \"2h 30m\" --type duration\nita-scrapper parse \"$1,234.56\" --type price  \nita-scrapper parse \"14:30\" --type time --reference-date 2024-08-15\n\n# Get help\nita-scrapper --help\n```\n\n## \ud83d\udcda Documentation\n\n### Quick Links\n\n- **[\ud83d\udcd6 API Documentation](docs/api.md)** - Complete API reference with examples\n- **[\ud83d\udd27 Developer Guide](docs/developer-guide.md)** - Architecture and extension guide  \n- **[\ud83d\udea8 Troubleshooting](docs/troubleshooting.md)** - Common issues and solutions\n- **[\ud83d\udcca Project Summary](PROJECT_SUMMARY.md)** - High-level project overview\n\n### API Documentation\n\nComprehensive API documentation is available in the [docs/api.md](docs/api.md) file, covering:\n\n- **Core Classes**: ITAScrapper, ITAMatrixParser\n- **Data Models**: Flight, SearchParams, FlightResult\n- **Utility Functions**: Price parsing, duration formatting, validation\n- **Exception Handling**: Complete error handling 
strategies\n- **Best Practices**: Recommended usage patterns\n\n### Developer Guide\n\nFor developers wanting to extend or contribute to ITA Scrapper, see [docs/developer-guide.md](docs/developer-guide.md):\n\n- **Architecture Overview**: Component design and data flow\n- **Parser Architecture**: Multi-strategy parsing system\n- **Browser Automation**: Playwright integration and anti-detection\n- **Extension Points**: Adding new parsers and data models\n- **Debugging Guide**: Tools and techniques for troubleshooting\n- **Performance Optimization**: Memory and speed optimization\n\n### Troubleshooting\n\nHaving issues? Check [docs/troubleshooting.md](docs/troubleshooting.md) for solutions to:\n\n- **Installation Issues**: Dependencies and browser setup\n- **Website Access**: Blocking, CAPTCHAs, and rate limiting  \n- **Parsing Problems**: Data extraction and validation issues\n- **Performance**: Memory usage and speed optimization\n- **Development Setup**: Environment configuration and debugging\n\n## \ud83d\ude80 Quick Start\n\n### Core Classes\n\n#### ITAScrapper\nMain scraper class for flight searches.\n\n```python\nclass ITAScrapper:\n    def __init__(self, headless: bool = True, timeout: int = 30000):\n        \"\"\"Initialize the scrapper.\"\"\"\n        \n    async def search_flights(\n        self,\n        origin: str,\n        destination: str,\n        departure_date: date,\n        return_date: Optional[date] = None,\n        adults: int = 1,\n        children: int = 0,\n        infants: int = 0,\n        cabin_class: CabinClass = CabinClass.ECONOMY\n    ) -> FlightResult:\n        \"\"\"Search for flights.\"\"\"\n```\n\n#### Models\n\n```python\nfrom ita_scrapper import (\n    Flight,           # Individual flight details\n    FlightResult,     # Search results container\n    SearchParams,     # Search parameters\n    CabinClass,       # Enum for cabin classes\n    TripType,         # Enum for trip types\n    Airport,          # Airport 
information\n)\n```\n\n### Utility Functions\n\n```python\nfrom ita_scrapper import (\n    parse_price,           # Parse price strings\n    parse_duration,        # Parse duration strings\n    parse_time,            # Parse time strings  \n    validate_airport_code, # Validate airport codes\n    format_duration,       # Format durations\n    is_valid_date_range,   # Validate date ranges\n)\n\n# Examples\nprice = parse_price(\"$1,234.56\")  # Returns Decimal('1234.56')\nduration = parse_duration(\"2h 30m\")  # Returns 150 (minutes)\ncode = validate_airport_code(\"jfk\")  # Returns \"JFK\"\n```\n\n## \ud83c\udfaf Advanced Usage\n\n### Context Manager\n```python\n# Recommended: Use as context manager\nasync with ITAScrapper(headless=True) as scrapper:\n    results = await scrapper.search_flights(...)\n\n# Manual management\nscrapper = ITAScrapper()\nawait scrapper.start()\ntry:\n    results = await scrapper.search_flights(...)\nfinally:\n    await scrapper.close()\n```\n\n### Error Handling\n```python\nfrom ita_scrapper import ITAScrapperError, NavigationError, TimeoutError\n\ntry:\n    async with ITAScrapper() as scrapper:\n        results = await scrapper.search_flights(...)\nexcept NavigationError:\n    print(\"Failed to navigate to search page\")\nexcept TimeoutError:\n    print(\"Search timed out\")\nexcept ITAScrapperError as e:\n    print(f\"General error: {e}\")\n```\n\n### Custom Configuration\n```python\nscrapper = ITAScrapper(\n    headless=False,        # Show browser window\n    timeout=60000,         # 60 second timeout\n)\n```\n\n## \ud83e\uddea Testing\n\nRun the test suite:\n\n```bash\n# All tests\npytest\n\n# Unit tests only (fast)  \npytest -m \"not slow\"\n\n# Integration tests (slow, requires browser)\npytest -m slow\n\n# With coverage\npytest --cov=src/ita_scrapper --cov-report=html\n```\n\n## \ud83d\udd27 MCP Server\n\nUse ITA Scrapper as a Model Context Protocol server:\n\n```python\n# Install MCP support\npip install ita-scrapper[mcp]\n\n# 
Create MCP server (see examples/mcp_integration.py)\nfrom ita_scrapper.mcp import create_mcp_server\nserver = create_mcp_server()\n```\n\nConfigure in Claude Desktop:\n```json\n{\n  \"mcpServers\": {\n    \"ita-scrapper\": {\n      \"command\": \"python\",\n      \"args\": [\"/path/to/ita_scrapper_mcp_server.py\"]\n    }\n  }\n}\n```\n\n## \ud83c\udf1f Examples\n\nCheck out the `/examples` directory for more usage examples:\n\n- `basic_usage.py` - Simple flight search\n- `demo_usage.py` - Interactive demo\n- `matrix_examples.py` - Advanced search patterns\n- `mcp_integration.py` - MCP server setup\n- `test_real_sites.py` - Real-world testing\n\n## \ud83e\udd1d Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.\n\n### Development Setup\n\n```bash\n# Clone repository\ngit clone https://github.com/yourusername/ita-scrapper.git\ncd ita-scrapper\n\n# Install with uv (recommended)\nuv sync --all-extras\n\n# Install Playwright browsers\nuv run playwright install\n\n# Run tests\nuv run pytest\n\n# Run linting\nuv run ruff check .\nuv run ruff format .\n```\n\n## \ud83d\udcc4 License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## \u26a0\ufe0f Disclaimer\n\nThis tool is for educational and research purposes only. Please respect the terms of service of any websites you scrape and be mindful of rate limits. 
The authors are not responsible for any misuse of this software.\n\n## \ud83d\ude4b\u200d\u2642\ufe0f Support\n\n- \ud83d\udcd6 [Documentation](https://github.com/yourusername/ita-scrapper#readme)\n- \ud83d\udc1b [Issue Tracker](https://github.com/yourusername/ita-scrapper/issues)\n- \ud83d\udcac [Discussions](https://github.com/yourusername/ita-scrapper/discussions)\n\n## \ud83d\udcca Stats\n\n- **Language**: Python 3.10+\n- **Framework**: Playwright + Pydantic\n- **Test Coverage**: 95%+\n- **Dependencies**: Minimal, well-maintained\n- **Performance**: Async/await optimized\n\n---\n\nMade with \u2764\ufe0f for travel enthusiasts and developers!\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A Python library for scraping ITA Matrix travel website using Playwright",
    "version": "0.1.1",
    "project_urls": {
        "Changelog": "https://github.com/yourusername/ita-scrapper/releases",
        "Documentation": "https://github.com/yourusername/ita-scrapper#readme",
        "Homepage": "https://github.com/yourusername/ita-scrapper",
        "Issues": "https://github.com/yourusername/ita-scrapper/issues",
        "Repository": "https://github.com/yourusername/ita-scrapper"
    },
    "split_keywords": [
        "automation",
        " flight",
        " ita",
        " matrix",
        " playwright",
        " scraping",
        " travel"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "8838b7e6e9a67cf64e40153065c2c536eb8d7e3816352d84baeb38d4d849fe43",
                "md5": "4bb17096af98e748f9850e50e9e4e33e",
                "sha256": "499f562bef528aef42238856c7b15ea06759513f41fa85a6ad535f731157c922"
            },
            "downloads": -1,
            "filename": "ita_scrapper-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "4bb17096af98e748f9850e50e9e4e33e",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 272349,
            "upload_time": "2025-07-27T02:23:14",
            "upload_time_iso_8601": "2025-07-27T02:23:14.490447Z",
            "url": "https://files.pythonhosted.org/packages/88/38/b7e6e9a67cf64e40153065c2c536eb8d7e3816352d84baeb38d4d849fe43/ita_scrapper-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "633d2f1eed82ffe72e4005815533fa363a0cbcdbff73ab42f122ea6994d603b5",
                "md5": "de630e760533e61f93e1ac00fdb28c3c",
                "sha256": "097014bb20e4bebd3f5e814512b6a6a159aa5a3aa99a8b8944029a2c8a5664f9"
            },
            "downloads": -1,
            "filename": "ita_scrapper-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "de630e760533e61f93e1ac00fdb28c3c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 374096,
            "upload_time": "2025-07-27T02:23:15",
            "upload_time_iso_8601": "2025-07-27T02:23:15.863566Z",
            "url": "https://files.pythonhosted.org/packages/63/3d/2f1eed82ffe72e4005815533fa363a0cbcdbff73ab42f122ea6994d603b5/ita_scrapper-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-27 02:23:15",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "yourusername",
    "github_project": "ita-scrapper",
    "github_not_found": true,
    "lcname": "ita-scrapper"
}
