# Zerve Data Platform
An enterprise-grade ETL and data processing platform for automated e-commerce data extraction, AI-powered enrichment, and pipeline orchestration.
## Features
- **Multi-stage Pipeline Framework** - Orchestrate complex ETL workflows with checkpointing and progress tracking
- **Web Scraping Automation** - Selenium-based browser automation for e-commerce sites
- **AI-Powered Data Enrichment** - Multiple LLM provider support (OpenAI, Google Gemini, Ollama, HuggingFace)
- **Cloud Integration** - AWS S3 and Spark data lake support
- **Database Connectors** - PostgreSQL and Spark SQL with auto-schema generation
- **Distributed Processing** - Apache Spark for big data ETL workflows
## Installation
### Development Installation
```bash
# Clone the repository
git clone https://github.com/zerveme/zervemedata.git
cd zervemedata
# Install in editable mode with development dependencies
pip install -e ".[dev]"
```
### Production Installation
```bash
pip install zervedataplatform
```
## Quick Start
### Import the package
```python
from pipeline import DataPipeline, DataConnectorBase
from connectors.ai import GenAIManager
from connectors.sql_connectors import PostgresSqlConnector
from connectors.cloud_storage_connectors import S3CloudConnector
from utils import Utility
# Configure your pipeline
config = Utility.read_in_json_file("config.json")
# Create AI connector
ai_manager = GenAIManager(config["ai_config"])
# Create database connector
db = PostgresSqlConnector(config["db_config"])
# Create and run pipeline
pipeline = DataPipeline()
# ... add your jobs
pipeline.run_data_pipeline()
```
## Architecture
```
zervedataplatform/
├── abstractions/                  # Abstract base classes and interfaces
├── connectors/                    # Database, cloud, and AI connectors
│   ├── ai/                        # OpenAI, Gemini, LangChain, Google Vision
│   ├── sql_connectors/            # PostgreSQL, Spark SQL
│   └── cloud_storage_connectors/  # S3, Spark Cloud
├── pipeline/                      # Pipeline orchestration framework
├── model_transforms/              # Database models and schemas
├── utils/                         # Utilities and helpers
└── test/                          # Unit tests
```
## Key Components
### Pipeline Framework
- **5-Stage Execution**: `initialize → pre_validate → read → main → output`
- **Activity Logging**: JSON-based progress tracking with hierarchical structure
- **Checkpoint/Resume**: Resume long-running pipelines from failure points
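
The stage ordering and checkpoint/resume behavior can be sketched in plain Python. The class and method names below are hypothetical and for illustration only; the platform's actual job API may differ:

```python
# Illustrative sketch of the 5-stage execution order with checkpointing.
# StagedJob and its stage methods are hypothetical names, not the real API.
class StagedJob:
    """Runs the five stages in order and records each completed stage."""

    STAGES = ["initialize", "pre_validate", "read", "main", "output"]

    def __init__(self):
        self.completed = []  # acts as a simple checkpoint log

    def run(self, resume_from=None):
        # Resuming skips every stage before the named checkpoint.
        start = self.STAGES.index(resume_from) if resume_from else 0
        for stage in self.STAGES[start:]:
            getattr(self, stage)()        # dispatch to the stage method
            self.completed.append(stage)  # checkpoint after each stage
        return self.completed

    def initialize(self): pass
    def pre_validate(self): pass
    def read(self): pass
    def main(self): pass
    def output(self): pass
```

A fresh run executes all five stages; after a failure in `main`, calling `run(resume_from="main")` skips the already-completed stages.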
### AI Connectors
- **Multi-Provider Support**: OpenAI, Google Gemini, Ollama (local), HuggingFace
- **Unified Interface**: LangChain abstraction layer
- **Auto-Detection**: Configuration-driven provider selection
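
The configuration-driven selection can be sketched as a simple dispatch. The `detect_provider` helper and the `"provider"` config key are hypothetical; the real `GenAIManager` resolves providers through the LangChain abstraction layer:

```python
# Hypothetical sketch of configuration-driven provider selection.
PROVIDERS = {
    "openai": "OpenAI",
    "gemini": "Google Gemini",
    "ollama": "Ollama (local)",
    "huggingface": "HuggingFace",
}

def detect_provider(ai_config: dict) -> str:
    """Map a config entry to a supported provider, rejecting unknown names."""
    name = ai_config.get("provider", "").lower()
    if name not in PROVIDERS:
        raise ValueError(f"Unsupported provider: {name!r}")
    return PROVIDERS[name]
```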
### Data Processing
- **Spark Integration**: Distributed processing for large datasets
- **Pandas/Spark**: Seamless DataFrame conversions
- **ETL Utilities**: High-level operations for common ETL tasks
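
The extract-transform-load pattern these utilities wrap can be sketched in a few lines. The `run_etl` helper below is hypothetical; the platform's utilities operate on Pandas/Spark DataFrames rather than plain lists:

```python
# Hypothetical sketch of a high-level ETL helper: pull rows from a source,
# thread them through each transform in order, then hand them to a sink.
def run_etl(extract, transforms, load):
    rows = extract()
    for transform in transforms:
        rows = [transform(row) for row in rows]
    return load(rows)
```

For example, `run_etl(lambda: [1, 2, 3], [lambda x: x * 2], list)` doubles each row before loading.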
## Configuration
Create configuration files in `default_configs/`. The top-level `configuration.json` points at the per-component files (note that JSON itself does not allow comments):
```json
{
  "db_config": "default_configs/db_config.json",
  "run_config": "default_configs/run.json",
  "ai_api_config": "default_configs/google_api_config.json",
  "web_config": "default_configs/web_config.json",
  "cloud_config": "default_configs/s3_config.json"
}
```
See the `default_configs/` directory for configuration examples.
## Requirements
- Python 3.11+
- Apache Spark 3.5.2
- PostgreSQL (optional, for SQL connector)
- AWS credentials (optional, for S3 connector)
- Google Cloud credentials (optional, for Vision API)
## Development
```bash
# Install development dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Run tests with coverage
pytest --cov=. --cov-report=html
# Format code
black .
# Lint code
flake8
```
## License
Proprietary - © 2025 Zerveme
## Support
For issues and questions, please contact: support@zerveme.com