# Config-Driven Data Loading Framework
A framework that eliminates repetitive data-loading code by using configuration files. Instead of writing custom code for each data source, you configure once and reuse everywhere.
## **Framework Architecture**
```
┌─────────────────┐      ┌──────────────────┐      ┌─────────────────┐
│  DATA SOURCES   │      │   ORCHESTRATOR   │      │    Database     │
│                 │      │                  │      │     Schema      │
│ • CSV Files     │──────│  Config Reader   │──────│                 │
│ • Excel Files   │      │  Data Processor  │      │ • api_data      │
│ • REST APIs     │      │  Column Mapper   │      │ • market_trends │
│ • JSON Files    │      │  Database Writer │      │ • risk_metrics  │
└─────────────────┘      └──────────────────┘      └─────────────────┘
```
## **Core Components**
- **DataSourceFactory**: Creates appropriate loaders based on configuration type
- **DataOrchestrator**: Manages the entire pipeline with error handling and retry logic (implemented once, reused by every data source)
- **DataProcessor**: Handles transformations and column mapping
- **DatabaseWriter**: Executes batch operations to Database schema tables
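The factory component above can be sketched as follows. This is a minimal illustration of the pattern, not the library's actual API; the class and method names are assumptions chosen for clarity:

```python
# Hypothetical sketch of the loader factory pattern; names are
# illustrative assumptions, not the library's real classes.
from abc import ABC, abstractmethod


class DataLoader(ABC):
    """Common interface every source-specific loader implements."""

    @abstractmethod
    def load(self, source_config: dict) -> list[dict]:
        """Read rows from the configured source as a list of dicts."""


class CsvLoader(DataLoader):
    def load(self, source_config: dict) -> list[dict]:
        import csv
        with open(
            source_config["file_path"],
            encoding=source_config.get("encoding", "UTF-8"),
        ) as f:
            return list(csv.DictReader(f, delimiter=source_config.get("delimiter", ",")))


class DataSourceFactory:
    """Maps the config's `type` field to a concrete loader class."""

    _registry: dict[str, type[DataLoader]] = {"csv": CsvLoader}

    @classmethod
    def create(cls, source_type: str) -> DataLoader:
        try:
            return cls._registry[source_type]()
        except KeyError:
            raise ValueError(f"Unsupported source type: {source_type}") from None
```

Adding a new source type then means registering one loader class; the orchestrator code never changes.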
---
## **Sample Configuration Files**
### **Data Sources Configuration (data-sources.yml)**
```yaml
# Data Sources Configuration for Database Schema
data_sources:
  market_data_csv:
    type: "csv"
    source:
      file_path: "/data/market/daily_rates.csv"
      delimiter: ","
      header: true
      encoding: "UTF-8"
    target:
      schema: "MarketData"
      table: "market_trends"
      batch_size: 500
      column_mapping:
        - { source: "date", target: "trade_date" }
        - { source: "currency_pair", target: "currency" }
        - { source: "rate", target: "exchange_rate" }
        - { source: "volume", target: "trading_volume" }

  risk_metrics_excel:
    type: "excel"
    source:
      file_path: "/data/risk/monthly_risk.xlsx"
      sheet_name: "RiskData"
      skip_rows: 1
    target:
      schema: "RiskMetrics"
      table: "risk_metrics"
      batch_size: 200
      column_mapping:
        - { source: "Portfolio ID", target: "portfolio_id" }
        - { source: "VaR 95%", target: "var_95" }
        - { source: "Expected Shortfall", target: "expected_shortfall" }
        - { source: "Liquidity Score", target: "liquidity_score" }
    validation:
      required_columns: ["Portfolio ID", "VaR 95%"]
      data_quality_checks: true

  rest_api_data:
    type: "rest_api"
    source:
      url: "https://api.provider.com/v1/liq"
      method: "GET"
      headers:
        Authorization: "Bearer ${API_TOKEN}"
        Content-Type: "application/json"
      timeout: 30
      retry_attempts: 3
    target:
      schema: "APIData"
      table: "forecast_data"
      batch_size: 1000
      column_mapping:
        - { source: "id", target: "id" }
        - { source: "assetClass", target: "asset_class" }
        - { source: "predictedLiquidity", target: "predicted_liquidity" }
        - { source: "confidenceLevel", target: "confidence_level" }
        - { source: "forecastDate", target: "forecast_date" }

  config_json:
    type: "json"
    source:
      file_path: "/config/portfolio_settings.json"
      json_path: "$.portfolios[*]"
    target:
      schema: "ConfigData"
      table: "portfolio_config"
      batch_size: 100
      column_mapping:
        - { source: "id", target: "portfolio_id" }
        - { source: "name", target: "portfolio_name" }
        - { source: "riskProfile", target: "risk_profile" }
        - { source: "liquidityThreshold", target: "liquidity_threshold" }

# Global Settings
global_settings:
  error_handling:
    continue_on_error: true
    error_threshold: 10
    notification_email: "dev-team@company.com"
  data_quality:
    enable_validation: true
    null_value_handling: "skip"
    duplicate_handling: "ignore"
  performance:
    connection_pool_size: 10
    query_timeout: 300
    memory_limit: "2GB"
```
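Reading a configuration like this is straightforward with PyYAML, plus a small pass to expand `${VAR}` placeholders such as `${API_TOKEN}` from the environment. This is a hedged sketch under the assumption that the library resolves placeholders this way; the helper names (`expand_env`, `load_config`) are illustrative, not the package's API:

```python
# Illustrative sketch: load data-sources.yml and expand ${VAR}
# placeholders from environment variables. Helper names are assumptions.
import os
import re

_VAR = re.compile(r"\$\{(\w+)\}")


def expand_env(value):
    """Recursively substitute ${VAR} references with environment values.

    Unknown variables are left untouched so misconfigurations are visible.
    """
    if isinstance(value, str):
        return _VAR.sub(lambda m: os.environ.get(m.group(1), m.group(0)), value)
    if isinstance(value, dict):
        return {k: expand_env(v) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_env(v) for v in value]
    return value


def load_config(path: str) -> dict:
    """Parse the YAML file and resolve environment placeholders."""
    import yaml  # PyYAML; imported here so the helpers above are standalone

    with open(path, encoding="utf-8") as f:
        return expand_env(yaml.safe_load(f))
```

With this, `load_config("data-sources.yml")["data_sources"]["rest_api_data"]` yields the REST source with its `Authorization` header already populated from the environment.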
## **High Level Architecture Diagram**

## **Class Diagram**

## **Sequence Diagram**

## **Key Developer Benefits**
- **Code Reusability**: Write once, configure many times; no duplicate data loading logic
- **Maintenance Reduction**: Single codebase handles all data sources through configuration
- **Easy Onboarding**: New data sources added via YAML files, not code changes
- **Error Handling**: Built-in retry logic and comprehensive error reporting
- **Performance**: Batch processing and connection pooling optimize database operations
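The column-mapping and batch-processing ideas behind these benefits can be demonstrated end to end in a few lines. The sketch below uses the stdlib `sqlite3` driver as a stand-in for the real target database, and the function names are illustrative assumptions rather than the library's API:

```python
# Illustrative sketch of config-driven column mapping plus batched
# inserts; sqlite3 stands in for the real database, names are assumptions.
import sqlite3
from itertools import islice


def apply_mapping(rows, column_mapping):
    """Rename source columns to target columns per the config's mapping."""
    return [{m["target"]: row[m["source"]] for m in column_mapping} for row in rows]


def write_batches(conn, table, rows, batch_size):
    """Insert rows in chunks of `batch_size` to bound memory per round trip."""
    it = iter(rows)
    while batch := list(islice(it, batch_size)):
        cols = list(batch[0].keys())
        placeholders = ", ".join("?" for _ in cols)
        sql = f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"
        conn.executemany(sql, [tuple(r[c] for c in cols) for r in batch])
    conn.commit()
```

For example, mapping `date → trade_date` and `rate → exchange_rate` from the CSV config and writing with `batch_size: 500` keeps at most 500 rows in flight per `executemany` call.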