# Scrapy Item Ingest
A Scrapy extension that ingests scraped items, requests, and logs into PostgreSQL, with tracking of the requests made during a crawl. It writes data as the crawl runs, so you can store and monitor crawling operations in both development and production.
## Documentation
Full documentation is available at: [https://scrapy-item-ingest.readthedocs.io/en/latest/](https://scrapy-item-ingest.readthedocs.io/en/latest/)
## Key Features
- 🔄 **Real-time Data Ingestion**: Store items, requests, and logs as they're processed
- 📊 **Request Tracking**: Track request response times, fingerprints, and parent-child relationships
- 🔍 **Comprehensive Logging**: Capture spider events, errors, and custom messages
- 🏗️ **Flexible Schema**: Support for both auto-creation and existing table modes
- ⚙️ **Modular Design**: Use individual components or the complete pipeline
- 🛡️ **Production Ready**: Handles both development and production scenarios
- 📝 **JSONB Storage**: Store complex item data as JSONB for flexible querying
- 🐳 **Docker Support**: Complete containerization with Docker and Kubernetes
- 📈 **Performance Optimized**: Connection pooling and batch processing
- 🔧 **Easy Configuration**: Environment-based configuration with validation
- 📊 **Monitoring Ready**: Built-in metrics and health checks
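To make the JSONB storage idea concrete, here is a minimal sketch of how a scraped item could be turned into a parameterized PostgreSQL insert. It is an illustration only: the table name `job_items`, its columns, and the helper `build_item_insert` are hypothetical and are not taken from this library's schema or API. See the documentation for the actual settings and table layout.

```python
import json
from datetime import datetime, timezone


def build_item_insert(item, job_id, table="job_items"):
    """Return (sql, params) for storing one item as JSONB.

    Hypothetical helper: the table and column names are assumptions
    made for this example, not the library's real schema.
    """
    # The table name is interpolated directly, so it must be a trusted
    # constant; the values themselves go through driver placeholders.
    sql = (
        f"INSERT INTO {table} (job_id, item, created_at) "
        "VALUES (%s, %s::jsonb, %s)"
    )
    # default=str keeps non-JSON types (dates, Decimals) from raising.
    payload = json.dumps(dict(item), default=str)
    params = (job_id, payload, datetime.now(timezone.utc))
    return sql, params


sql, params = build_item_insert({"title": "Example", "price": 9.99}, "job-1")
print(sql)
print(params[1])
```

With a driver such as psycopg2, the pair could be passed to `cursor.execute(sql, params)`. Keeping the whole item in one JSONB column means new item fields need no schema change, and they remain queryable with PostgreSQL's JSON operators.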
## Installation
```bash
pip install scrapy-item-ingest
```
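To confirm the install worked, one option is to ask Python for the installed distribution's version. This is a small sketch using only the standard library; the helper name `installed_version` is made up for this example, and `importlib.metadata` requires Python 3.8 or newer (on Python 3.7 the `importlib-metadata` backport would be needed).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="scrapy-item-ingest"):
    """Return the installed version string, or None if not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found or "scrapy-item-ingest is not installed")
```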
## Development
### Setting up for Development
```bash
git clone https://github.com/fawadss1/scrapy_item_ingest.git
cd scrapy_item_ingest
pip install -e ".[dev]"
```
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Support
For support and questions:
- **Email**: fawadstar6@gmail.com
- **Documentation**: [https://scrapy-item-ingest.readthedocs.io/](https://scrapy-item-ingest.readthedocs.io/)
- **Issues**: Please report bugs and feature requests at [GitHub Issues](https://github.com/fawadss1/scrapy_item_ingest/issues)
## Changelog
### v0.1.2 (Current)
- Initial release
- Core pipeline functionality for items, requests, and logs
- PostgreSQL database integration with JSONB storage
- Comprehensive documentation and examples
- Production deployment guides
- Docker and Kubernetes support
Raw data
{
"_id": null,
"home_page": "https://github.com/fawadss1/scrapy_item_ingest",
"name": "scrapy-item-ingest",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": "scrapy, database, postgresql, web-scraping, data-pipeline",
"author": "Fawad Ali",
"author_email": "fawadstar6@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/ba/bf/58b7fcb6f3bd0451ab76bbbfc6a6ac61ae23fd8b4d4cee17ece6c3cc991b/scrapy_item_ingest-0.1.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Scrapy extension for database ingestion with job/spider tracking",
"version": "0.1.2",
"project_urls": {
"Documentation": "https://scrapy-item-ingest.readthedocs.io/",
"Homepage": "https://github.com/fawadss1/scrapy_item_ingest",
"Source": "https://github.com/fawadss1/scrapy_item_ingest",
"Tracker": "https://github.com/fawadss1/scrapy_item_ingest/issues"
},
"split_keywords": [
"scrapy",
" database",
" postgresql",
" web-scraping",
" data-pipeline"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "7d2ec4870c6e94fd41fd5758ed2205c436091a9a5181f53ce17503174df2b7c4",
"md5": "81c39490ac28a74238e0b60232626f9f",
"sha256": "8da3188a4a84978588cb9bbace4aa1e619833ae92d2696e6cc9e97eeb28ec9d5"
},
"downloads": -1,
"filename": "scrapy_item_ingest-0.1.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "81c39490ac28a74238e0b60232626f9f",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 17206,
"upload_time": "2025-07-22T13:34:02",
"upload_time_iso_8601": "2025-07-22T13:34:02.783947Z",
"url": "https://files.pythonhosted.org/packages/7d/2e/c4870c6e94fd41fd5758ed2205c436091a9a5181f53ce17503174df2b7c4/scrapy_item_ingest-0.1.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "babf58b7fcb6f3bd0451ab76bbbfc6a6ac61ae23fd8b4d4cee17ece6c3cc991b",
"md5": "6f7d370f3e7bba87ef6b0af4fc69bacf",
"sha256": "97cb0abfe8f486b04f8a3eb8b54bcd357d19778ba6688710944b8a6ce704e769"
},
"downloads": -1,
"filename": "scrapy_item_ingest-0.1.2.tar.gz",
"has_sig": false,
"md5_digest": "6f7d370f3e7bba87ef6b0af4fc69bacf",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 13941,
"upload_time": "2025-07-22T13:34:03",
"upload_time_iso_8601": "2025-07-22T13:34:03.805138Z",
"url": "https://files.pythonhosted.org/packages/ba/bf/58b7fcb6f3bd0451ab76bbbfc6a6ac61ae23fd8b4d4cee17ece6c3cc991b/scrapy_item_ingest-0.1.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-22 13:34:03",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "fawadss1",
"github_project": "scrapy_item_ingest",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "scrapy-item-ingest"
}