scrapy-item-ingest


Name: scrapy-item-ingest
Version: 0.1.2
Home page: https://github.com/fawadss1/scrapy_item_ingest
Summary: Scrapy extension for database ingestion with job/spider tracking
Upload time: 2025-07-22 13:34:03
Maintainer: None
Docs URL: None
Author: Fawad Ali
Requires Python: >=3.7
License: None
Keywords: scrapy, database, postgresql, web-scraping, data-pipeline
Requirements: No requirements were recorded.
Travis-CI: No Travis.
Coveralls test coverage: No coveralls.
# Scrapy Item Ingest

[![PyPI Version](https://img.shields.io/pypi/v/scrapy-item-ingest.svg)](https://pypi.org/project/scrapy-item-ingest/)
[![PyPI Downloads](https://img.shields.io/pypi/dm/scrapy-item-ingest.svg)](https://pypi.org/project/scrapy-item-ingest/)
[![Supported Python Versions](https://img.shields.io/pypi/pyversions/scrapy-item-ingest.svg)](https://pypi.org/project/scrapy-item-ingest/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

[![GitHub Stars](https://img.shields.io/github/stars/fawadss1/scrapy_item_ingest.svg)](https://github.com/fawadss1/scrapy_item_ingest/stargazers)
[![GitHub Issues](https://img.shields.io/github/issues/fawadss1/scrapy_item_ingest.svg)](https://github.com/fawadss1/scrapy_item_ingest/issues)
[![GitHub Last Commit](https://img.shields.io/github/last-commit/fawadss1/scrapy_item_ingest.svg)](https://github.com/fawadss1/scrapy_item_ingest/commits)

A Scrapy extension for ingesting scraped items, requests, and logs into PostgreSQL, with job and spider tracking. It provides a clean, production-ready way to store and monitor your Scrapy crawls, combining real-time data ingestion with detailed logging.

## Documentation

Full documentation is available at: [https://scrapy-item-ingest.readthedocs.io/en/latest/](https://scrapy-item-ingest.readthedocs.io/en/latest/)

## Key Features

- 🔄 **Real-time Data Ingestion**: Store items, requests, and logs as they're processed
- 📊 **Request Tracking**: Track request response times, fingerprints, and parent-child relationships
- 🔍 **Comprehensive Logging**: Capture spider events, errors, and custom messages
- 🏗️ **Flexible Schema**: Supports both auto-creating tables and writing to existing tables
- ⚙️ **Modular Design**: Use individual components or the complete pipeline
- 🛡️ **Production Ready**: Handles both development and production scenarios
- 📝 **JSONB Storage**: Store complex item data as JSONB for flexible querying (see the query sketch after the Installation section)
- 🐳 **Docker Support**: Complete containerization with Docker and Kubernetes
- 📈 **Performance Optimized**: Connection pooling and batch processing
- 🔧 **Easy Configuration**: Environment-based configuration with validation (see the settings sketch after this list)
- 📊 **Monitoring Ready**: Built-in metrics and health checks
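
The configuration bullets above map onto ordinary Scrapy settings. The snippet below is a minimal sketch only: the pipeline path (`scrapy_item_ingest.pipelines.ItemIngestPipeline`) and the `DB_URL` setting name are assumptions made for illustration, not the library's confirmed API, so check the documentation linked above for the actual names.

```python
# settings.py -- illustrative sketch only, not verified against the real API.
# The pipeline path and the DB_URL setting name are assumed; see
# https://scrapy-item-ingest.readthedocs.io/ for the settings the extension
# actually reads.
import os

ITEM_PIPELINES = {
    # Hypothetical pipeline class exposed by scrapy-item-ingest
    "scrapy_item_ingest.pipelines.ItemIngestPipeline": 300,
}

# Environment-based configuration: read the PostgreSQL DSN from the
# environment, falling back to a local development database.
DB_URL = os.environ.get(
    "DB_URL", "postgresql://user:password@localhost:5432/scrapy_data"
)
```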

## Installation

```bash
pip install scrapy-item-ingest
```
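
Once items are flowing into PostgreSQL, the JSONB column mentioned under Key Features can be filtered on fields that were never declared as table columns. The sketch below assumes a table named `items` with a JSONB column named `data`; those names, and the connection string, are illustrative guesses rather than the documented schema.

```python
import psycopg2  # or psycopg2-binary

# Illustrative only: "items", "data", and the DSN are assumed names,
# not the schema documented by scrapy-item-ingest.
conn = psycopg2.connect("postgresql://user:password@localhost:5432/scrapy_data")
with conn, conn.cursor() as cur:
    # ->> extracts a JSONB field as text, so ad-hoc filters need no migrations.
    cur.execute(
        "SELECT data->>'title', data->>'price' FROM items "
        "WHERE data->>'category' = %s",
        ("books",),
    )
    for title, price in cur.fetchall():
        print(title, price)
conn.close()
```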

## Development

### Setting up for Development

```bash
git clone https://github.com/fawadss1/scrapy_item_ingest.git
cd scrapy_item_ingest
pip install -e ".[dev]"
```

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Support

For support and questions:

- **Email**: fawadstar6@gmail.com
- **Documentation**: [https://scrapy-item-ingest.readthedocs.io/](https://scrapy-item-ingest.readthedocs.io/)
- **Issues**: Please report bugs and feature requests at [GitHub Issues](https://github.com/fawadss1/scrapy_item_ingest/issues)

## Changelog

### v0.1.2 (Current)

- Initial release
- Core pipeline functionality for items, requests, and logs
- PostgreSQL database integration with JSONB storage
- Comprehensive documentation and examples
- Production deployment guides
- Docker and Kubernetes support

            
