zervedataplatform

Name: zervedataplatform
Version: 0.1.1
Summary: E-commerce data extraction and processing platform with AI-powered enrichment
Author: Zerveme <noreply@zerveme.com>
Homepage: https://github.com/zerveme/zervemedata
Upload time: 2025-10-25 03:26:50
Requires Python: >=3.8
License: not specified
Keywords: etl, data-pipeline, web-scraping, ai, e-commerce, spark, llm
Requirements: none recorded

# Zerve Data Platform

An enterprise-grade ETL and data processing platform for automated e-commerce data extraction, AI-powered enrichment, and pipeline orchestration.

## Features

- **Multi-stage Pipeline Framework** - Orchestrate complex ETL workflows with checkpointing and progress tracking
- **Web Scraping Automation** - Selenium-based browser automation for e-commerce sites
- **AI-Powered Data Enrichment** - Multiple LLM provider support (OpenAI, Google Gemini, Ollama, HuggingFace)
- **Cloud Integration** - AWS S3 and Spark data lake support
- **Database Connectors** - PostgreSQL and Spark SQL with auto-schema generation
- **Distributed Processing** - Apache Spark for big data ETL workflows

## Installation

### Development Installation

```bash
# Clone the repository
git clone https://github.com/zerveme/zervemedata.git
cd zervemedata

# Install in editable mode with development dependencies
pip install -e ".[dev]"
```

### Production Installation

```bash
pip install zervedataplatform
```

## Quick Start

### Import the package

```python
from pipeline import DataPipeline, DataConnectorBase
from connectors.ai import GenAIManager
from connectors.sql_connectors import PostgresSqlConnector
from connectors.cloud_storage_connectors import S3CloudConnector
from utils import Utility

# Configure your pipeline
config = Utility.read_in_json_file("config.json")

# Create AI connector
ai_manager = GenAIManager(config["ai_config"])

# Create database connector
db = PostgresSqlConnector(config["db_config"])

# Create and run pipeline
pipeline = DataPipeline()
# ... add your jobs
pipeline.run_data_pipeline()
```
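
The cloud storage connector imported above can be wired in with the same config-driven pattern. A minimal sketch, assuming the `cloud_config` key from the configuration example later in this README (the constructor argument shape is an assumption):

```python
from connectors.cloud_storage_connectors import S3CloudConnector
from utils import Utility

# Hypothetical: pass the cloud storage section of the config to the S3
# connector, mirroring how the AI and database connectors are constructed.
config = Utility.read_in_json_file("config.json")
s3 = S3CloudConnector(config["cloud_config"])
```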

## Architecture

```
zervedataplatform/
├── abstractions/          # Abstract base classes and interfaces
├── connectors/           # Database, cloud, and AI connectors
│   ├── ai/              # OpenAI, Gemini, LangChain, Google Vision
│   ├── sql_connectors/  # PostgreSQL, Spark SQL
│   └── cloud_storage_connectors/  # S3, Spark Cloud
├── pipeline/            # Pipeline orchestration framework
├── model_transforms/    # Database models and schemas
├── utils/              # Utilities and helpers
└── test/               # Unit tests
```

## Key Components

### Pipeline Framework
- **5-Stage Execution**: `initialize → pre_validate → read → main → output` (a short sketch follows this list)
- **Activity Logging**: JSON-based progress tracking with hierarchical structure
- **Checkpoint/Resume**: Resume long-running pipelines from failure points
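
A minimal sketch of a job written against this five-stage contract; the class, the `context` dict, and the driver loop below are hypothetical illustrations of the stage order, not the package's actual base class or registration API:

```python
class ProductExtractJob:
    """Hypothetical job following the five-stage contract described above."""

    def initialize(self, context):
        context.setdefault("rows", [])               # set up state / connections

    def pre_validate(self, context):
        if "source" not in context:
            raise ValueError("missing 'source' in context")

    def read(self, context):
        context["rows"] = list(context["source"])    # pull raw records

    def main(self, context):
        context["rows"] = [r.strip().lower() for r in context["rows"]]

    def output(self, context):
        print(f"wrote {len(context['rows'])} records")


# Minimal driver showing the stage order; the real DataPipeline adds activity
# logging and checkpoint/resume on top of this sequencing.
job, context = ProductExtractJob(), {"source": [" Widget ", " Gadget "]}
for stage in ("initialize", "pre_validate", "read", "main", "output"):
    getattr(job, stage)(context)
```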

### AI Connectors
- **Multi-Provider Support**: OpenAI, Google Gemini, Ollama (local), HuggingFace
- **Unified Interface**: LangChain abstraction layer
- **Auto-Detection**: Configuration-driven provider selection (see the sketch after this list)
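
As a rough illustration of configuration-driven selection, a dispatcher can key off a `provider` field in the AI config; the keys, defaults, and factory mapping below are assumptions about how GenAIManager might behave, not its actual schema:

```python
# Hypothetical sketch of provider auto-detection. The real GenAIManager
# presumably constructs LangChain clients; here the factories just describe
# the chosen backend so the sketch stays self-contained.
def select_provider(ai_config: dict) -> str:
    factories = {
        "openai": lambda cfg: f"OpenAI chat model {cfg.get('model', 'gpt-4o-mini')}",
        "gemini": lambda cfg: f"Google Gemini model {cfg.get('model', 'gemini-1.5-flash')}",
        "ollama": lambda cfg: f"local Ollama model {cfg.get('model', 'llama3')}",
        "huggingface": lambda cfg: f"HuggingFace model {cfg.get('model', 'distilgpt2')}",
    }
    provider = ai_config.get("provider", "openai").lower()
    if provider not in factories:
        raise ValueError(f"unsupported provider: {provider}")
    return factories[provider](ai_config)


print(select_provider({"provider": "ollama", "model": "llama3"}))
```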

### Data Processing
- **Spark Integration**: Distributed processing for large datasets
- **Pandas/Spark**: Seamless DataFrame conversions (see the example after this list)
- **ETL Utilities**: High-level operations for common ETL tasks
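
The Pandas/Spark round trip uses standard PySpark APIs; a small self-contained example (the column names and filter are illustrative only):

```python
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("zerve-etl-example").getOrCreate()

# Pandas -> Spark: distribute an in-memory frame for large-scale transforms.
pdf = pd.DataFrame({"sku": ["A1", "B2"], "price": [19.99, 5.49]})
sdf = spark.createDataFrame(pdf)

# Transform on the cluster, then bring the (small) result back to pandas.
result = sdf.filter(sdf.price > 10).toPandas()
print(result)
```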

## Configuration

Create configuration files in `default_configs/`. The top-level `configuration.json` maps each config name to the path of its file:

```json
{
  "db_config": "default_configs/db_config.json",
  "run_config": "default_configs/run.json",
  "ai_api_config": "default_configs/google_api_config.json",
  "web_config": "default_configs/web_config.json",
  "cloud_config": "default_configs/s3_config.json"
}
```

See the `default_configs/` directory for configuration examples.
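
A sketch of how these nested files could be resolved at runtime using the `Utility.read_in_json_file` helper shown in the Quick Start; the two-level layout mirrors the example above, while the loading behaviour inside the platform itself is an assumption:

```python
from utils import Utility

# The top-level file maps config names to paths; load it, then load each
# referenced file. This mirrors the layout shown above and is not necessarily
# how the platform resolves configs internally.
top_level = Utility.read_in_json_file("default_configs/configuration.json")
configs = {name: Utility.read_in_json_file(path) for name, path in top_level.items()}

db_config = configs["db_config"]        # contents of default_configs/db_config.json
cloud_config = configs["cloud_config"]  # contents of default_configs/s3_config.json
```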

## Requirements

- Python 3.11+
- Apache Spark 3.5.2
- PostgreSQL (optional, for SQL connector)
- AWS credentials (optional, for S3 connector)
- Google Cloud credentials (optional, for Vision API)

## Development

```bash
# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run tests with coverage
pytest --cov=. --cov-report=html

# Format code
black .

# Lint code
flake8
```

## License

Proprietary - © 2025 Zerveme

## Support

For issues and questions, please contact: support@zerveme.com

            
