# CausalLLM: High-Performance Causal Inference Library
**CausalLLM** is a Python library that combines statistical causal inference methods with language models to discover causal relationships and estimate treatment effects. Its optimized code paths are reported to run up to 10x faster and use up to 80% less memory, while maintaining statistical rigor.
## **New in v4.2.0: Enterprise-Grade Monitoring & Testing!**
**Production-Ready Causal Inference!** CausalLLM now includes comprehensive monitoring, observability, and advanced testing capabilities:
### **Monitoring & Observability**
- **Metrics Collection**: Track performance, usage patterns, and system health
- **Health Checks**: Monitor components, dependencies, and operational status
- **Performance Profiling**: Detailed memory usage and execution timing analysis
### **Extended Testing Framework**
- **Property-Based Testing**: Automated property verification with Hypothesis
- **Performance Benchmarks**: Algorithm comparison and scaling analysis
- **Mutation Testing**: Assess test suite quality and coverage gaps
### **Introduced in v4.1.0**
- **Command Line Interface**: Run causal analysis directly from your terminal
- **Interactive Web Interface**: Point-and-click analysis with Streamlit
- **Python Library**: Full programmatic control (as before)
---
## Performance Highlights
- **Up to 10x Faster Computations**: Vectorized algorithms with Numba JIT compilation
- **Up to 80% Memory Reduction**: Intelligent data chunking and lazy evaluation
- **Larger-Than-Memory Scale**: Handle datasets with millions of rows through streaming processing
- **Smart Caching**: 80%+ cache hit rates for repeated analyses
- **Parallel Processing**: Async computations with automatic resource management
- **Zero Configuration**: Performance optimizations work automatically
---
## Table of Contents
1. [Quick Start - CLI & Web](#quick-start---cli--web) (**New**)
2. [Quick Start - Python](#quick-start---python)
3. [Installation](#installation)
4. [Key Features](#key-features)
5. [Domain Examples](#domain-examples)
6. [Core Components](#core-components)
7. [Performance](#performance)
8. [API Documentation](#api-documentation)
9. [Advanced Features](#advanced-features)
10. [Support & Community](#support--community)
---
## Quick Start - CLI & Web
### Command Line Interface
**Perfect for data scientists and analysts who prefer terminal-based workflows:**
```bash
# Install CausalLLM
pip install causallm
# Discover causal relationships
causallm discover --data healthcare_data.csv \
    --variables "age,treatment,outcome" \
    --domain healthcare \
    --output results.json
# Estimate treatment effects
causallm effect --data experiment.csv \
    --treatment drug \
    --outcome recovery \
    --confounders "age,gender" \
    --output effects.json
# Generate counterfactual scenarios
causallm counterfactual --data patient_data.csv \
    --intervention "treatment=1" \
    --samples 200 \
    --output scenarios.json
# Get help and examples
causallm info --examples
```
**CLI Features:**
- **Causal Discovery**: Find relationships in your data automatically
- **Effect Estimation**: Quantify treatment impacts with confidence intervals
- **Counterfactual Analysis**: Generate "what-if" scenarios
- **Multiple Formats**: Support for CSV and JSON input/output
- **Domain Context**: Healthcare, marketing, education, insurance
- **Built-in Help**: Examples and documentation at your fingertips
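For scripted pipelines, the `causallm effect` invocation shown above can be assembled programmatically. The following is a minimal sketch that only builds the argument list from the flags documented in this section; `build_effect_command` is a hypothetical helper (not part of the library), and actually running the command assumes the `causallm` executable is on your `PATH`.

```python
import subprocess


def build_effect_command(data, treatment, outcome, confounders, output):
    """Assemble argv for `causallm effect` using only the flags shown above."""
    return [
        "causallm", "effect",
        "--data", data,
        "--treatment", treatment,
        "--outcome", outcome,
        "--confounders", ",".join(confounders),  # CLI expects a comma-separated string
        "--output", output,
    ]


cmd = build_effect_command(
    "experiment.csv", "drug", "recovery", ["age", "gender"], "effects.json"
)
print(" ".join(cmd))

# To execute it (requires causallm to be installed):
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when file names or variable names contain spaces.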
### Interactive Web Interface
**Perfect for business users, researchers, and anyone who prefers point-and-click analysis:**
```bash
# Install with web interface
pip install "causallm[ui]"
# Launch interactive web interface
causallm web --port 8080
# Open browser to http://localhost:8080
```
**Web Interface Features:**
- **Drag & Drop Data**: Upload CSV/JSON files or use sample datasets
- **Visual Analysis**: Interactive graphs and visualizations
- **Real-time Results**: See analysis results as you configure parameters
- **Guided Workflow**: Step-by-step tabs for discovery, effects, and counterfactuals
- **Built-in Documentation**: Examples and guides integrated in the interface
- **Export Results**: Download analysis results and visualizations
**Sample Web Analysis Workflow:**
1. **Upload Data**: CSV/JSON files or choose from healthcare/marketing samples
2. **Discover Relationships**: Select variables, choose domain context, view causal graph
3. **Estimate Effects**: Pick treatment/outcome, control for confounders, see confidence intervals
4. **Explore Counterfactuals**: Set interventions, generate scenarios, understand impacts
5. **Export & Share**: Download results, graphs, and analysis reports
### Installation Options
```bash
# Basic installation (CLI + Python library)
pip install causallm
# With web interface (adds Streamlit, Dash, Gradio)
pip install "causallm[ui]"
# With plugin support (LangChain, transformers)
pip install "causallm[plugins]"
# Full installation (everything)
pip install "causallm[full]"
```
---
## Quick Start - Python
### Basic High-Performance Analysis with Configuration
```python
from causallm import EnhancedCausalLLM
import pandas as pd
# Initialize with automatic configuration (uses environment variables and defaults)
causal_llm = EnhancedCausalLLM()
# OR initialize with specific configuration overrides
causal_llm = EnhancedCausalLLM(
    llm_provider='openai',  # LLM provider
    use_async=True,         # Enable async processing
    cache_dir='./cache'     # Enable persistent caching
)
# Load your data (supports very large datasets)
data = pd.read_csv("your_large_data.csv")  # Can handle millions of rows
# Comprehensive analysis with standardized parameter names
results = causal_llm.comprehensive_analysis(
    data=data,                           # Standardized: 'data' (not 'df')
    treatment_variable='treatment_col',  # Standardized: 'treatment_variable'
    outcome_variable='outcome_col',      # Standardized: 'outcome_variable'
    domain_context='healthcare'          # Standardized: 'domain_context'
)
print(f"Effect estimate: {results.inference_results}")
print(f"Confidence: {results.confidence_score}")
```
### Configuration-Based Setup
```python
from causallm import EnhancedCausalLLM
from causallm.config import CausalLLMConfig
# Create custom configuration
config = CausalLLMConfig()
config.llm.provider = 'openai'
config.performance.use_async = True
config.performance.chunk_size = 50000
config.statistical.significance_level = 0.01
# Initialize with configuration
causal_llm = EnhancedCausalLLM(config=config)
# Or use configuration file
causal_llm = EnhancedCausalLLM(config_file='my_config.json')
```
### Environment Variable Configuration
```bash
# Set environment variables for automatic configuration
export CAUSALLM_LLM_PROVIDER=openai
export CAUSALLM_USE_ASYNC=true
export CAUSALLM_CHUNK_SIZE=10000
export CAUSALLM_CACHE_DIR=./cache
export OPENAI_API_KEY=your-api-key
# No configuration needed - automatically uses environment variables
python -c "
from causallm import EnhancedCausalLLM
causal_llm = EnhancedCausalLLM() # Automatically configured
"
```
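To make the idea of environment-driven configuration concrete, here is a minimal sketch of how `CAUSALLM_*` variables could be parsed into typed settings. It is not the library's implementation: `load_settings` and the `_PARSERS` table are hypothetical, and only the variable names come from the example above.

```python
import os

# Hypothetical mapping from CAUSALLM_* variables to parsers for typed values.
_PARSERS = {
    "CAUSALLM_LLM_PROVIDER": str,
    "CAUSALLM_USE_ASYNC": lambda v: v.strip().lower() in {"1", "true", "yes"},
    "CAUSALLM_CHUNK_SIZE": int,
    "CAUSALLM_CACHE_DIR": str,
}


def load_settings(environ=None):
    """Return a dict of settings parsed from the CAUSALLM_* variables that are set."""
    environ = os.environ if environ is None else environ
    settings = {}
    for name, parse in _PARSERS.items():
        if name in environ:
            key = name[len("CAUSALLM_"):].lower()  # e.g. CAUSALLM_CHUNK_SIZE -> chunk_size
            settings[key] = parse(environ[name])
    return settings


# A plain dict stands in for os.environ so the example has no side effects.
example = load_settings({
    "CAUSALLM_LLM_PROVIDER": "openai",
    "CAUSALLM_USE_ASYNC": "true",
    "CAUSALLM_CHUNK_SIZE": "10000",
})
print(example)
```

Unset variables are simply omitted, which leaves room for defaults or a configuration file to fill the gaps.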
### Memory-Efficient Processing for Large Datasets
```python
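import pandas as pd
from causallm.core.data_processing import DataChunker, StreamingDataProcessor
# Process datasets that don't fit in memory
processor = StreamingDataProcessor()
def analyze_chunk(chunk_data):
    # Correlation matrix for a single chunk
    return chunk_data.corr()
# Stream and process large CSV files
results = processor.process_streaming(
    "very_large_data.csv",
    analyze_chunk,
    # Element-wise mean of the per-chunk matrices; this approximates,
    # but does not exactly equal, the correlation over the full dataset
    aggregation_func=lambda results: pd.concat(results).groupby(level=0).mean()
)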
```
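Averaging per-chunk correlations is only an approximation. When an exact result is needed, a common alternative is to accumulate sufficient statistics (counts, sums, and sums of products) across chunks and compute the correlation once at the end. The sketch below shows that technique in plain Python; it is an illustration under that assumption, not how `StreamingDataProcessor` works internally, and the helper names are hypothetical.

```python
import math


def update_sums(sums, chunk):
    """Add one chunk of (x, y) pairs to the running sufficient statistics."""
    for x, y in chunk:
        sums["n"] += 1
        sums["x"] += x
        sums["y"] += y
        sums["xx"] += x * x
        sums["yy"] += y * y
        sums["xy"] += x * y
    return sums


def pearson_from_sums(s):
    """Pearson correlation computed from the accumulated sums."""
    cov = s["xy"] - s["x"] * s["y"] / s["n"]
    var_x = s["xx"] - s["x"] ** 2 / s["n"]
    var_y = s["yy"] - s["y"] ** 2 / s["n"]
    return cov / math.sqrt(var_x * var_y)


pairs = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 7), (6, 9)]
chunks = [pairs[0:2], pairs[2:4], pairs[4:6]]  # stand-in for chunks read from a CSV

sums = {"n": 0, "x": 0.0, "y": 0.0, "xx": 0.0, "yy": 0.0, "xy": 0.0}
for chunk in chunks:
    update_sums(sums, chunk)

streamed_r = pearson_from_sums(sums)
print(f"streamed correlation: {streamed_r:.4f}")
```

Only six running totals are kept in memory regardless of how many rows are streamed.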
---
## Installation
**Choose the installation that fits your workflow:**
```bash
# Basic installation (CLI + Python library)
pip install causallm
# With monitoring and testing features (recommended for development)
pip install "causallm[testing]"
# With web interface (recommended for most users)
pip install "causallm[ui]"
# With plugin support (LangChain, transformers, etc.)
pip install "causallm[plugins]"
# Full installation (everything - web, plugins, dev tools)
pip install "causallm[full]"
# Development installation
pip install "causallm[dev]"
```
**After Installation:**
```bash
# Test CLI
causallm --help
# Launch web interface (if installed with [ui])
causallm web
# Use in Python
python -c "from causallm import CausalLLM; print('Ready!')"
```
---
## Key Features
### **Monitoring & Observability** *New in v4.2.0*
- **Comprehensive Metrics Collection**: Track performance, usage patterns, and system health with thread-safe collectors
- **Advanced Health Checks**: Monitor system resources, database connectivity, LLM provider APIs, and custom components
- **Performance Profiling**: Detailed memory usage tracking, execution timing, and statistical analysis
- **Real-time Monitoring**: Background monitoring with configurable intervals and alerting
- **Export & Integration**: JSON export for external monitoring systems (Prometheus, Grafana, etc.)
### **Extended Testing Framework** *New in v4.2.0*
- **Property-Based Testing**: Automated property verification using Hypothesis with causal-specific strategies
- **Performance Benchmarks**: Algorithm comparison, scaling analysis, and statistical performance evaluation
- **Mutation Testing**: Assess test suite quality with AST-based code mutations and survival analysis
- **Causal Test Strategies**: Generate realistic datasets with known causal structures for robust testing
- **Comprehensive Test Runner**: Unified test execution with detailed reporting and analysis
### **CLI & Web Interfaces** *New in v4.1.0*
- **Command Line Tool**: `causallm` command for terminal-based analysis
- **Interactive Web Interface**: Streamlit-based GUI for point-and-click analysis
- **No Python Required**: Full causal inference without programming
- **Multiple Input Formats**: CSV, JSON data support with sample datasets
- **Export Capabilities**: Download results, graphs, and analysis reports
### **Standardized Interfaces** *New*
- **Consistent Parameter Names**: Same parameter names across all components (`data`, `treatment_variable`, `outcome_variable`)
- **Unified Async Support**: All methods support both sync and async with identical interfaces
- **Protocol-Based Design**: Type-safe interfaces ensuring consistency
- **Rich Metadata**: Comprehensive analysis metadata with execution tracking
### **Centralized Configuration** *New*
- **Environment Variable Support**: Automatic configuration from environment variables
- **Configuration Files**: JSON-based configuration with validation
- **Multiple Environments**: Development, testing, and production configurations
- **Dynamic Updates**: Runtime configuration updates with validation
### Statistical Causal Inference
- **Multiple Methods**: Linear regression, propensity score matching, instrumental variables, doubly robust estimation
- **Assumption Testing**: Automated validation of causal inference assumptions
- **Robustness Checks**: Cross-validation across multiple statistical approaches
- **Performance Optimized**: Vectorized algorithms for large-scale analysis
### Causal Structure Discovery
- **PC Algorithm**: Implementation for discovering relationships from data
- **Parallel Processing**: Async independence testing for faster discovery
- **LLM Enhancement**: Optional integration with language models for domain expertise
- **Scalable**: Chunked processing for very large variable sets
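The PC algorithm builds a graph by repeatedly testing whether two variables are independent given a set of others. As a rough illustration of one such test, the sketch below uses partial correlation, a common choice for linear, roughly Gaussian data. Which test CausalLLM applies is not stated here, so treat the helper functions and the simulated data as assumptions made for the example.

```python
import math
import random


def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def partial_corr(x, y, z):
    """Correlation of x and y after linearly controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Simulated chain X -> Z -> Y: X and Y are dependent, but independent given Z.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(5000)]
z = [xi + rng.gauss(0, 1) for xi in x]
y = [zi + rng.gauss(0, 1) for zi in z]

marginal = pearson(x, y)             # clearly non-zero
conditional = partial_corr(x, y, z)  # close to zero
print(f"corr(X, Y) = {marginal:.3f}, corr(X, Y | Z) = {conditional:.3f}")
```

A near-zero conditional correlation is what lets a PC-style procedure remove the direct edge between `X` and `Y` while keeping the edges through `Z`.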
### Domain-Specific Packages
- **[Healthcare](#healthcare-domain)**: Clinical trial analysis, treatment effectiveness, patient outcomes
- **[Insurance](#insurance-domain)**: Risk assessment, premium optimization, claims analysis
- **[Marketing](#marketing-domain)**: Campaign attribution, ROI optimization, customer analytics
- **Education**: Student outcomes, intervention analysis, policy evaluation
- **Experimentation**: A/B testing, experimental design validation
### Advanced Performance Features
- **Data Chunking**: Automatic memory-efficient processing of large datasets
- **Intelligent Caching**: Multi-tier caching (memory + disk) with smart invalidation
- **Vectorized Algorithms**: Numba-optimized statistical computations
- **Async Processing**: Parallel execution of independent computations
- **Lazy Evaluation**: Deferred computation until results are needed
- **Resource Monitoring**: Automatic memory and CPU usage optimization
### LLM Integrations
- **Multiple Providers**: OpenAI, Anthropic, LLaMA, local models
- **Optional Usage**: Library works fully without API keys using statistical methods
- **MCP Support**: Model Context Protocol for advanced integrations
---
## Monitoring & Testing Examples
### Quick Start: Monitoring in Production
```python
import asyncio
import time
from causallm.monitoring import configure_metrics, get_global_health_checker
from causallm.monitoring.profiler import profile, profile_block
# Configure comprehensive monitoring
collector = configure_metrics(enabled=True, collection_interval=30)
health_checker = get_global_health_checker()
# Profile your causal inference functions
@profile(name="causal_discovery", track_memory=True)
async def run_causal_analysis(causal_llm, data, variables):
    start_time = time.time()
    # Your causal inference code
    results = await causal_llm.discover_causal_relationships(data, variables)
    # Manual metrics recording
    collector.record_causal_discovery(
        variables_count=len(variables),
        duration=time.time() - start_time,
        method='PC',
        success=True
    )
    return results
# Monitor system health (`await` is only valid inside a coroutine)
async def main():
    health_status = await health_checker.run_all_health_checks()
    print(f"System status: {health_status}")
asyncio.run(main())
```
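For readers curious what a profiling decorator of this kind does, the following standard-library sketch records wall-clock time and peak traced memory for each call. It is an illustration only: `profile_sketch` and `RECORDS` are hypothetical names, and the library's own `profile` decorator may work differently.

```python
import functools
import time
import tracemalloc

RECORDS = []  # hypothetical in-memory sink for profiling results


def profile_sketch(name):
    """Decorator that records wall-clock time and peak traced memory per call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            tracemalloc.start()
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                duration = time.perf_counter() - start
                _, peak = tracemalloc.get_traced_memory()
                tracemalloc.stop()
                RECORDS.append(
                    {"name": name, "seconds": duration, "peak_bytes": peak}
                )
        return wrapper
    return decorator


@profile_sketch("sum_of_squares")
def sum_of_squares(n):
    return sum(i * i for i in range(n))


total = sum_of_squares(10_000)
print(RECORDS[-1]["name"], total)
```

Recording in a `finally` block means a measurement is stored even when the profiled function raises.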
### Property-Based Testing for Causal Methods
```python
from causallm.testing import CausalDataStrategy, causal_hypothesis_test
from hypothesis import given
class TestCausalInference:
    @given(CausalDataStrategy.numeric_data(['X', 'Y', 'Z'], min_rows=100))
    @causal_hypothesis_test(
        strategy=CausalDataStrategy.numeric_data(['X', 'Y', 'Z']),
        property_func=lambda result, data: result is not None
    )
    def test_causal_discovery_properties(self, data):
        """Test that causal discovery returns valid results."""
        # `my_causal_discovery_function` is a placeholder for the function under test
        result = my_causal_discovery_function(data)
        # Property: Results should be deterministic for same data
        result2 = my_causal_discovery_function(data)
        assert result == result2
        # Property: Number of edges should be reasonable
        assert len(result.edges) <= len(data.columns) ** 2
        return result
```
### Performance Benchmarking
```python
from causallm.testing import BenchmarkSuite, CausalBenchmarkSuite
# Compare different causal discovery algorithms
# The `my_*` callables are placeholders for your own implementations
algorithms = {
    'pc_algorithm': my_pc_implementation,
    'ges_algorithm': my_ges_implementation,
    'direct_lingam': my_lingam_implementation
}
benchmark_suite = CausalBenchmarkSuite(
    data_sizes=[100, 500, 1000, 5000],
    variable_counts=[5, 10, 15, 20]
)
# Run comprehensive benchmarks
results = {}
for name, algorithm in algorithms.items():
    results[name] = benchmark_suite.benchmark_causal_discovery(algorithm, name)
# Compare performance
comparison = benchmark_suite.compare_algorithms(
    {name: benchmark_suite.results[name] for name in algorithms.keys()}
)
print(f"Fastest algorithm: {comparison['_summary']['fastest_algorithm']}")
print(f"Most memory efficient: {comparison['_summary']['most_memory_efficient']}")
```
### Mutation Testing for Test Quality
```python
from causallm.testing import MutationTestRunner, MutationTestConfig
# Configure mutation testing
config = MutationTestConfig(
    target_files=['causallm/core/causal_discovery.py'],
    test_command='pytest tests/test_causal_discovery.py -v',
    mutation_score_threshold=0.8,
    max_mutations_per_file=50
)
# Run mutation tests
runner = MutationTestRunner(config)
results = runner.run_mutation_tests()
print(f"Mutation Score: {results['mutation_score']:.2%}")
print(f"Test Quality: {'Good' if results['passed_threshold'] else 'Needs Improvement'}")
# Analyze weak spots
if not results['passed_threshold']:
    print("Files needing better tests:")
    for file_path, stats in results['results_by_file'].items():
        if stats['mutation_score'] < 0.7:
            print(f"  {file_path}: {stats['mutation_score']:.2%}")
```
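The idea behind AST-based mutation testing can be shown in a few lines: parse the source, change one operator, and check whether the tests notice. The sketch below uses only Python's `ast` module. `AddToSub` and `suite_passes` are hypothetical names for this illustration, and this is not the `MutationTestRunner` implementation.

```python
import ast

SOURCE = """
def add(a, b):
    return a + b
"""


class AddToSub(ast.NodeTransformer):
    """Mutation operator: replace every `+` with `-`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node


def load(tree):
    """Compile a module AST and return the namespace it defines."""
    namespace = {}
    exec(compile(tree, "<mutant>", "exec"), namespace)
    return namespace


def suite_passes(namespace):
    """A tiny stand-in test suite for `add`; True when all checks pass."""
    return namespace["add"](2, 3) == 5


original = load(ast.parse(SOURCE))
mutant_tree = ast.fix_missing_locations(AddToSub().visit(ast.parse(SOURCE)))
mutant = load(mutant_tree)

# A mutant is "killed" when the suite passes on the original but fails on the mutant.
killed = suite_passes(original) and not suite_passes(mutant)
print("mutant killed:", killed)
```

The mutation score reported above is the fraction of generated mutants that the test suite kills; surviving mutants point to behavior the tests do not check.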
### Complete Monitoring Dashboard
```python
import asyncio
from datetime import datetime
from causallm.monitoring import MetricsCollector, HealthChecker, PerformanceProfiler
class CausalLLMMonitor:
    def __init__(self):
        self.metrics = MetricsCollector(enabled=True)
        self.health_checker = HealthChecker(enabled=True)
        self.profiler = PerformanceProfiler(enabled=True)
    async def get_system_status(self):
        """Get comprehensive system status."""
        # Health checks
        health_results = await self.health_checker.run_all_health_checks()
        overall_health = self.health_checker.get_overall_health()
        # Performance metrics
        metrics_summary = self.metrics.get_metrics_summary()
        performance_summary = self.profiler.get_performance_summary()
        return {
            'health': overall_health,
            'metrics': metrics_summary,
            'performance': performance_summary,
            'timestamp': datetime.now().isoformat()
        }
    async def send_alert(self, status):
        """Placeholder alert hook; replace with email, chat, or paging."""
        print(f"ALERT: {status['health']}")
    async def monitor_continuously(self, interval=60):
        """Continuous monitoring loop."""
        await self.health_checker.start_background_monitoring(interval)
        while True:
            status = await self.get_system_status()
            # Alert on issues
            if status['health']['status'] != 'healthy':
                await self.send_alert(status)
            await asyncio.sleep(interval)
# Usage (top-level `await` needs an event loop, so wrap the call in asyncio.run)
monitor = CausalLLMMonitor()
status = asyncio.run(monitor.get_system_status())
```
---
## Domain Examples
### Healthcare Domain
Transform clinical data analysis with domain-specific expertise:
```python
from causallm import HealthcareDomain, EnhancedCausalLLM
# Initialize with healthcare configuration
causal_llm = EnhancedCausalLLM(
    config_file='healthcare_config.json',  # Domain-specific configuration
    domain_context='healthcare'
)
healthcare = HealthcareDomain()
# Generate realistic clinical trial data (scalable)
clinical_data = healthcare.generate_clinical_trial_data(
    n_patients=100000,  # Large dataset support
    treatment_arms=['control', 'treatment_a', 'treatment_b']
)
# Treatment effectiveness analysis with standardized interface
results = causal_llm.estimate_causal_effect(
    data=clinical_data,                    # Standardized parameter
    treatment_variable='treatment_group',  # Standardized parameter
    outcome_variable='recovery_time',      # Standardized parameter
    covariate_variables=['age', 'baseline_severity', 'comorbidities']
)
print(f"Treatment effect: {results.primary_effect.estimate:.2f} days")
print(f"Confidence interval: {results.primary_effect.confidence_interval}")
print(f"Clinical significance: {results.interpretation}")
```
**Healthcare Features:**
- Clinical trial data generation with proper randomization
- Treatment effectiveness analysis with medical context
- Safety analysis and adverse event evaluation
- Patient outcome prediction with clinical insights
### Insurance Domain
Optimize risk assessment and premium pricing:
```python
from causallm import InsuranceDomain, EnhancedCausalLLM
# Initialize with insurance-optimized configuration
causal_llm = EnhancedCausalLLM(
    config_file='insurance_config.json',
    use_async=True,   # Handle large policy datasets
    chunk_size=50000  # Optimize for policy data
)
insurance = InsuranceDomain()
# Generate large-scale policy data
policy_data = insurance.generate_stop_loss_data(n_policies=500000)
# Risk factor analysis with standardized interface
risk_results = causal_llm.estimate_causal_effect(
    data=policy_data,                       # Standardized parameter
    treatment_variable='industry_type',     # Standardized parameter
    outcome_variable='total_claim_amount',  # Standardized parameter
    covariate_variables=['company_size', 'policy_limit', 'geographic_region']
)
print(f"Industry risk effect: ${risk_results.primary_effect.estimate:,.0f}")
print(f"Statistical significance: p = {risk_results.primary_effect.p_value:.6f}")
print(f"Confidence level: {risk_results.confidence_level}")
```
**Insurance Features:**
- Stop loss insurance data simulation
- Risk factor analysis with actuarial insights
- Premium optimization recommendations
- Claims prediction and underwriting support
### Marketing Domain
Master campaign attribution and ROI optimization:
```python
from causallm.domains.marketing import MarketingDomain
from causallm import EnhancedCausalLLM
# Initialize with marketing-optimized configuration
causal_llm = EnhancedCausalLLM(
    config_file='marketing_config.json',
    llm_provider='openai',  # For enhanced attribution insights
    use_async=True          # Handle large touchpoint datasets
)
marketing = MarketingDomain(enable_performance_optimizations=True)
# Generate sample marketing data
marketing_data = marketing.generate_marketing_data(
    n_customers=10000,
    n_touchpoints=30000
)
# Comprehensive attribution analysis with standardized interface
attribution_result = causal_llm.comprehensive_analysis(
    data=marketing_data,                  # Standardized parameter
    treatment_variable='channel_spend',   # Standardized parameter
    outcome_variable='conversion_value',  # Standardized parameter
    covariate_variables=['customer_segment', 'touchpoint_sequence'],
    domain_context='marketing'            # Standardized parameter
)
print(f"Overall attribution confidence: {attribution_result.confidence_score:.2f}")
for insight in attribution_result.actionable_insights[:3]:
    print(f"• {insight}")
```
**Marketing Features:**
- Multi-touch attribution modeling (first-touch, last-touch, data-driven, Shapley)
- Campaign ROI analysis and optimization
- Cross-device and cross-channel attribution
- Customer lifetime value modeling
**Quick Reference - Attribution Models:**
| Model | Best For | Description |
|-------|----------|-------------|
| `data_driven` | **Recommended** | Uses causal inference for attribution |
| `first_touch` | Brand awareness | 100% credit to first interaction |
| `last_touch` | Direct response | 100% credit to last interaction |
| `linear` | Balanced view | Equal credit across touchpoints |
| `shapley` | Advanced | Game theory based attribution |
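The rule-based models in the table are simple enough to express directly. The sketch below implements the `first_touch`, `last_touch`, and `linear` rules exactly as described in the table for a single customer journey. The function names and the `journey` data are illustrative and are not the `MarketingDomain` API; the `data_driven` and `shapley` models need more machinery and are not shown.

```python
def first_touch(touchpoints):
    """100% credit to the first interaction."""
    return {touchpoints[0]: 1.0}


def last_touch(touchpoints):
    """100% credit to the last interaction."""
    return {touchpoints[-1]: 1.0}


def linear(touchpoints):
    """Equal credit per touchpoint; a channel seen twice accumulates two shares."""
    share = 1.0 / len(touchpoints)
    credit = {}
    for channel in touchpoints:
        credit[channel] = credit.get(channel, 0.0) + share
    return credit


journey = ["search", "email", "social", "email"]  # ordered touchpoints for one customer
print(first_touch(journey))
print(last_touch(journey))
print(linear(journey))
```

In every rule the credits for a journey sum to 1, so multiplying by the conversion value distributes revenue across channels.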
---
## Core Components
### EnhancedCausalLLM
High-performance main class with **standardized interfaces** and **centralized configuration management**.
```python
from causallm import EnhancedCausalLLM
from causallm.config import CausalLLMConfig
# Configuration-driven initialization (recommended)
causal_llm = EnhancedCausalLLM(config_file='my_config.json')
# OR with parameter overrides
causal_llm = EnhancedCausalLLM(
    config_file='base_config.json',
    llm_provider='openai',  # Override configuration
    use_async=True,         # Enable async processing
    cache_dir='./cache'     # Custom cache location
)
# OR programmatic configuration
config = CausalLLMConfig()
config.llm.provider = 'openai'
config.llm.model = 'gpt-4'
config.performance.use_async = True
config.statistical.significance_level = 0.01
causal_llm = EnhancedCausalLLM(config=config)
# OR automatic configuration from environment variables
causal_llm = EnhancedCausalLLM() # Uses env vars + defaults
```
#### **New Configuration Features:**
- **Environment Variable Support**: Automatic configuration from `CAUSALLM_*` environment variables
- **Configuration Files**: JSON-based configuration with validation and inheritance
- **Dynamic Updates**: Runtime configuration changes with `update_configuration()`
- **Performance Metrics**: Built-in execution tracking with `get_performance_metrics()`
### Statistical Methods (Performance Optimized)
- **Vectorized Linear Regression**: NumPy/Numba optimized for large datasets
- **Fast Propensity Score Matching**: Efficient matching algorithms with parallel processing
- **Optimized Instrumental Variables**: Matrix operations optimized for speed
- **Parallel PC Algorithm**: Concurrent independence testing for causal discovery
### Domain Packages (Scalable)
Pre-configured, performance-optimized components for specific industries with built-in expertise and realistic data generators.
---
## Performance
### Dataset Size Support
- **Small Datasets** (< 10K rows): Instant analysis with full feature set
- **Medium Datasets** (10K - 100K rows): Automatic optimization, ~2-5x speedup
- **Large Datasets** (100K - 1M rows): Chunked processing, async operations
- **Very Large Datasets** (> 1M rows): Streaming analysis, distributed computing
### Speed Improvements
- **Correlation Analysis**: 10x faster with Numba vectorization
- **Causal Discovery**: 5x faster with parallel independence testing
- **Effect Estimation**: 3x faster with optimized matching algorithms
- **Repeated Analysis**: 20x+ faster with intelligent caching
### Memory Efficiency
- **Data Chunking**: Process datasets 10x larger than available RAM
- **Lazy Evaluation**: 60-80% memory reduction through deferred computation
- **Smart Caching**: Configurable memory vs. disk trade-offs
### Performance Configuration Examples
```python
from causallm import EnhancedCausalLLM
# Small datasets (< 10K rows)
causal_llm = EnhancedCausalLLM(
    enable_performance_optimizations=False  # Overhead not worth it
)
# Large datasets (100K+ rows)
causal_llm = EnhancedCausalLLM(
    enable_performance_optimizations=True,
    chunk_size=50000,
    use_async=True,
    cache_dir="./cache",
    max_memory_usage_gb=8
)
```
---
## API Documentation
### Core Methods
#### `comprehensive_analysis()`
Complete end-to-end causal analysis combining discovery and inference.
```python
analysis = causal_llm.comprehensive_analysis(
    data=df,                               # Required: Your dataset
    treatment_variable='campaign',         # Optional: Specific treatment
    outcome_variable='revenue',            # Optional: Specific outcome
    domain_context='marketing',            # Optional: Domain context
    covariate_variables=['age', 'income']  # Optional: Control variables
)
```
**Returns:** `ComprehensiveCausalAnalysis` with:
- `discovery_results`: Causal structure findings
- `inference_results`: Detailed effect estimates
- `domain_recommendations`: Domain-specific advice
- `actionable_insights`: List of actionable findings
- `confidence_score`: Overall analysis confidence (0-1)
#### `discover_causal_relationships()`
Automatically discover causal relationships in your data.
```python
discovery = causal_llm.discover_causal_relationships(
    data=df,
    variables=['age', 'treatment', 'outcome'],
    domain_context='healthcare'
)
```
**Returns:** `CausalDiscoveryResult` with discovered edges, confounders, and domain insights.
#### `estimate_causal_effect()`
Estimate the causal effect of a treatment on an outcome.
```python
effect = causal_llm.estimate_causal_effect(
    data=df,
    treatment_variable='new_drug',
    outcome_variable='recovery_rate',
    covariate_variables=['age', 'severity'],
    method='comprehensive'  # Alternatives: 'regression', 'matching', 'iv'
)
```
**Returns:** `CausalInferenceResult` with effect estimates, confidence intervals, and robustness checks.
### Statistical Methods
Available through `StatisticalCausalInference`:
- `CausalMethod.LINEAR_REGRESSION`: Standard regression with covariates
- `CausalMethod.MATCHING`: Propensity score matching
- `CausalMethod.INSTRUMENTAL_VARIABLES`: Two-stage least squares
- `CausalMethod.REGRESSION_DISCONTINUITY`: RDD (if applicable)
- `CausalMethod.DIFFERENCE_IN_DIFFERENCES`: DiD (if applicable)
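As a worked example of one of these methods, difference-in-differences compares the before/after change in a treated group with the change in a control group, assuming both groups would have followed parallel trends without treatment. The sketch below shows only the core arithmetic on made-up numbers; `did_estimate` is a hypothetical helper, not the `StatisticalCausalInference` implementation, and it omits standard errors.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences computed from group means."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Illustrative outcome values (not real data)
effect = did_estimate(
    treated_pre=[10, 12, 11], treated_post=[16, 18, 17],
    control_pre=[9, 11, 10], control_post=[11, 13, 12],
)
print(f"Estimated effect: {effect}")
```

Here the treated group improves by 6 and the control group by 2, so the estimate attributes the remaining 4 units to the treatment.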
### Domain Packages API
Each domain package provides:
- **Data Generators**: Realistic synthetic data with proper causal structure
- **Domain Knowledge**: Expert knowledge about relationships and confounders
- **Analysis Templates**: Pre-configured workflows with domain-specific interpretation
---
## Advanced Features
### Cached Analysis for Faster Iterations
```python
from causallm import EnhancedCausalLLM
from causallm.core.caching import StatisticalComputationCache
# Enable persistent caching across sessions
causal_llm = EnhancedCausalLLM(cache_dir="./causallm_cache")
# First run computes and caches
result1 = causal_llm.estimate_causal_effect(data, 'treatment', 'outcome')
# Second run uses cached results (10x+ faster)
result2 = causal_llm.estimate_causal_effect(data, 'treatment', 'outcome')
```
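A cache like this has to decide when two calls are "the same analysis". One common approach is to key results on a hash of the data content plus the parameters. The sketch below illustrates that approach with an in-memory dictionary and a deliberately simple computation. It is an assumption about the general technique, not a description of `StatisticalComputationCache`, and the difference in means used here is a toy stand-in rather than a causal estimate.

```python
import hashlib
import json

_CACHE = {}  # hypothetical in-memory tier; a disk tier could persist the same keys


def cache_key(rows, treatment, outcome):
    """Derive a key from the data content and the analysis parameters."""
    payload = json.dumps(
        {"rows": rows, "treatment": treatment, "outcome": outcome},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_mean_difference(rows, treatment, outcome):
    """Toy analysis: mean outcome of treated rows minus mean outcome of controls."""
    key = cache_key(rows, treatment, outcome)
    if key in _CACHE:
        return _CACHE[key], True  # cache hit
    treated = [r[outcome] for r in rows if r[treatment] == 1]
    control = [r[outcome] for r in rows if r[treatment] == 0]
    result = sum(treated) / len(treated) - sum(control) / len(control)
    _CACHE[key] = result
    return result, False  # cache miss, result now stored


rows = [
    {"treatment": 1, "outcome": 8.0}, {"treatment": 1, "outcome": 6.0},
    {"treatment": 0, "outcome": 5.0}, {"treatment": 0, "outcome": 3.0},
]
first, hit1 = cached_mean_difference(rows, "treatment", "outcome")
second, hit2 = cached_mean_difference(rows, "treatment", "outcome")
print(first, hit1, hit2)
```

Because the key depends on the data content, changing any value invalidates the entry automatically instead of returning a stale result.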
### Async Processing for Maximum Performance
```python
import asyncio
from causallm.core.async_processing import AsyncCausalAnalysis
# `large_data` (a DataFrame) and `my_analysis` (a callable) are placeholders you supply
async def parallel_analysis():
    async_causal = AsyncCausalAnalysis()
    # Parallel correlation analysis
    corr_matrix = await async_causal.parallel_correlation_analysis(
        large_data, chunk_size=5000
    )
    # Parallel bootstrap analysis
    bootstrap_results = await async_causal.parallel_bootstrap_analysis(
        large_data, analysis_func=my_analysis, n_bootstrap=1000
    )
    return corr_matrix, bootstrap_results
# Run async analysis
results = asyncio.run(parallel_analysis())
```
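The pattern behind a parallel bootstrap is to split the resamples into batches, run the batches concurrently, and merge the results into a percentile interval. The standard-library sketch below illustrates that pattern with `asyncio` and the default thread pool. It is not the `AsyncCausalAnalysis` implementation; the function names are hypothetical, and for CPU-bound pure-Python work a process pool would be needed to get real parallel speedup.

```python
import asyncio
import random
from statistics import mean


def bootstrap_means(data, n_resamples, seed):
    """Resample with replacement and return the mean of each resample."""
    rng = random.Random(seed)
    return [mean(rng.choices(data, k=len(data))) for _ in range(n_resamples)]


async def parallel_bootstrap(data, n_bootstrap=400, n_workers=4):
    """Split the resamples across worker threads and merge the results."""
    per_worker = n_bootstrap // n_workers
    loop = asyncio.get_running_loop()
    tasks = [
        loop.run_in_executor(None, bootstrap_means, data, per_worker, seed)
        for seed in range(n_workers)  # a distinct seed per batch
    ]
    batches = await asyncio.gather(*tasks)
    means = sorted(m for batch in batches for m in batch)
    lower = means[int(0.025 * len(means))]
    upper = means[int(0.975 * len(means)) - 1]
    return lower, upper, len(means)


data = [2.1, 2.5, 1.9, 3.2, 2.8, 2.2, 2.6, 3.0, 2.4, 2.7]
lower, upper, n = asyncio.run(parallel_bootstrap(data))
print(f"95% percentile interval for the mean: [{lower:.2f}, {upper:.2f}] from {n} resamples")
```

Giving each batch its own seed keeps the run reproducible while ensuring the batches do not draw identical resamples.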
### MCP Server Integration
CausalLLM provides Model Context Protocol (MCP) server capabilities for integration with Claude Desktop, VS Code, and other MCP-enabled applications:
```bash
# Start MCP server for integration with Claude Desktop, VS Code, etc.
python -m causallm.mcp.server --port 8000
```
**Available MCP tools:**
- `simulate_counterfactual`: Generate counterfactual scenarios
- `analyze_treatment_effect`: High-performance treatment analysis
- `extract_causal_edges`: Parallel causal relationship extraction
- `generate_reasoning_prompt`: LLM-enhanced causal reasoning
### Statistical Rigor with Performance
- **Assumption Validation**: Automated testing with parallel processing
- **Robustness Checks**: Cross-validation across multiple optimized methods
- **Confidence Intervals**: Uncertainty quantification with bootstrap parallelization
- **Effect Size Interpretation**: Statistical and practical significance assessment
- **Performance Monitoring**: Automatic benchmarking and optimization suggestions
---
## Requirements
### Core Dependencies
- Python 3.9+
- pandas >= 1.3.0
- numpy >= 1.21.0
- scikit-learn >= 1.0.0
- scipy >= 1.7.0
### Performance Dependencies (Automatically Installed)
- numba >= 0.56.0 (JIT compilation)
- dask >= 2022.1.0 (distributed computing)
- psutil >= 5.8.0 (resource monitoring)
### Optional Dependencies
- openai >= 1.0.0 (LLM features)
- anthropic (Claude integration)
- aiofiles (async file operations)
---
## Support & Community
### Getting Help
- **GitHub Issues**: [Report bugs & request features](https://github.com/rdmurugan/causallm/issues)
- **GitHub Discussions**: [Community support & questions](https://github.com/rdmurugan/causallm/discussions)
- **Performance Issues**: Tag with 'performance' label
- **Email Support**: durai@infinidatum.net
- **LinkedIn**: [Durai Rajamanickam](https://www.linkedin.com/in/durai-rajamanickam)
### Documentation
- **[Documentation Index](docs/DOCUMENTATION_INDEX.md)**: Complete documentation guide and navigation
- **[API Reference](docs/API_REFERENCE.md)**: Complete API documentation with all classes and methods
- **[Complete User Guide](docs/COMPLETE_USER_GUIDE.md)**: Comprehensive guide with examples and best practices
- **[Performance Guide](docs/PERFORMANCE_GUIDE.md)**: Optimization tips and benchmarks
- **[Domain Packages Guide](docs/DOMAIN_PACKAGES.md)**: Industry-specific components and examples
- **[MCP Usage Guide](docs/MCP_USAGE.md)**: Model Context Protocol integration
- **[Usage Examples](docs/USAGE_EXAMPLES.md)**: Real-world use cases across domains
- **[Marketing Quick Reference](docs/MARKETING_QUICK_REFERENCE.md)**: Marketing attribution guide
- **[Examples Directory](examples/)**: Runnable code examples and tutorials
### Contributing
We welcome contributions! Areas where help is needed:
- Additional domain packages (finance, retail, manufacturing)
- New statistical methods with performance optimization
- Advanced caching strategies
- Distributed computing enhancements
See **[CONTRIBUTING.md](CONTRIBUTING.md)** for guidelines.
### Performance Support & Benchmarking
```python
# Built-in performance demo
from causallm.performance_demo import PerformanceBenchmark
benchmark = PerformanceBenchmark()
results = benchmark.run_comprehensive_benchmark([10000, 50000, 100000])
print(benchmark.generate_performance_report())
```
---
## License
MIT License - see [LICENSE](LICENSE) file for details.
---
## Citation
If you use CausalLLM in your research:
```bibtex
@software{causallm2024,
title={CausalLLM: High-Performance Causal Inference Library},
author={Durai Rajamanickam},
year={2024},
url={https://github.com/rdmurugan/causallm},
note={Performance-optimized causal inference with statistical rigor}
}
```
---
## 🏢 About
CausalLLM is developed and maintained by **Durai Rajamanickam**, with contributions from the open source community. The library aims to make causal inference more accessible while maintaining statistical rigor and providing enterprise-grade performance for production use cases.
---
**✨ Ready to discover causal insights in your data? Start with `pip install causallm` and explore the [examples](examples/) directory!**
Raw data
{
"_id": null,
"home_page": "https://github.com/rdmurugan/causallm",
"name": "causallm",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "causal-inference, machine-learning, statistics, llm, artificial-intelligence, monitoring, testing, property-based-testing, benchmarking, mutation-testing",
"author": "CausalLLM Team",
"author_email": "CausalLLM Team <durai@infinidatum.net>",
"download_url": "https://files.pythonhosted.org/packages/69/9e/e242b568cd6d9fcc47bf29e5883a593c15b0da72ab894d2e7757827b750e/causallm-4.2.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Production-ready causal inference with comprehensive monitoring, testing, and LLM integration",
"version": "4.2.0",
"project_urls": {
"Bug Tracker": "https://github.com/rdmurugan/causallm/issues",
"Homepage": "https://github.com/rdmurugan/causallm",
"Repository": "https://github.com/rdmurugan/causallm"
},
"split_keywords": [
"causal-inference",
" machine-learning",
" statistics",
" llm",
" artificial-intelligence",
" monitoring",
" testing",
" property-based-testing",
" benchmarking",
" mutation-testing"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "6e40de1d73251978eb77588e6bf6efb7bd272423b1ec22ad0484476997e1c7ae",
"md5": "2ecbaaa3fddd53d168302b8b1e13fef6",
"sha256": "975e93709aeeb6fa8f69f5c94e665d3e358b7adcf4b9f4c41bdcc4f3b7a18c63"
},
"downloads": -1,
"filename": "causallm-4.2.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "2ecbaaa3fddd53d168302b8b1e13fef6",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 241995,
"upload_time": "2025-09-09T17:14:51",
"upload_time_iso_8601": "2025-09-09T17:14:51.642593Z",
"url": "https://files.pythonhosted.org/packages/6e/40/de1d73251978eb77588e6bf6efb7bd272423b1ec22ad0484476997e1c7ae/causallm-4.2.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "699ee242b568cd6d9fcc47bf29e5883a593c15b0da72ab894d2e7757827b750e",
"md5": "aea10f9d9d8ad34839e07c93d63f9e3d",
"sha256": "545d6a05337af14a5551b2b725ff7fe2eccf2f9319b7b8104e6c2c6b7ead0929"
},
"downloads": -1,
"filename": "causallm-4.2.0.tar.gz",
"has_sig": false,
"md5_digest": "aea10f9d9d8ad34839e07c93d63f9e3d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 296551,
"upload_time": "2025-09-09T17:14:52",
"upload_time_iso_8601": "2025-09-09T17:14:52.936461Z",
"url": "https://files.pythonhosted.org/packages/69/9e/e242b568cd6d9fcc47bf29e5883a593c15b0da72ab894d2e7757827b750e/causallm-4.2.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-09-09 17:14:52",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "rdmurugan",
"github_project": "causallm",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "numpy",
"specs": [
[
">=",
"1.21.0"
],
[
"<",
"2.0.0"
]
]
},
{
"name": "pandas",
"specs": [
[
">=",
"1.3.0"
],
[
"<",
"3.0.0"
]
]
},
{
"name": "scikit-learn",
"specs": [
[
">=",
"1.0.0"
],
[
"<",
"2.0.0"
]
]
},
{
"name": "networkx",
"specs": [
[
"<",
"4.0.0"
],
[
">=",
"2.6.0"
]
]
},
{
"name": "scipy",
"specs": [
[
"<",
"2.0.0"
],
[
">=",
"1.7.0"
]
]
},
{
"name": "matplotlib",
"specs": [
[
">=",
"3.3.0"
],
[
"<",
"4.0.0"
]
]
},
{
"name": "plotly",
"specs": [
[
">=",
"5.0.0"
]
]
},
{
"name": "openai",
"specs": [
[
">=",
"1.0.0"
]
]
},
{
"name": "PyYAML",
"specs": [
[
">=",
"6.0.0"
]
]
},
{
"name": "requests",
"specs": [
[
">=",
"2.28.0"
]
]
},
{
"name": "statsmodels",
"specs": [
[
">=",
"0.13.0"
]
]
},
{
"name": "seaborn",
"specs": [
[
">=",
"0.11.0"
]
]
},
{
"name": "pytest",
"specs": [
[
">=",
"7.0.0"
]
]
},
{
"name": "pytest-asyncio",
"specs": [
[
">=",
"0.21.0"
]
]
},
{
"name": "pytest-cov",
"specs": [
[
">=",
"4.0.0"
]
]
},
{
"name": "pytest-mock",
"specs": [
[
">=",
"3.10.0"
]
]
},
{
"name": "pytest-timeout",
"specs": [
[
">=",
"2.1.0"
]
]
},
{
"name": "mypy",
"specs": [
[
">=",
"1.0.0"
]
]
},
{
"name": "black",
"specs": [
[
">=",
"22.0.0"
]
]
},
{
"name": "flake8",
"specs": [
[
">=",
"5.0.0"
]
]
},
{
"name": "isort",
"specs": [
[
">=",
"5.0.0"
]
]
},
{
"name": "types-PyYAML",
"specs": [
[
">=",
"6.0.0"
]
]
},
{
"name": "types-requests",
"specs": [
[
">=",
"2.28.0"
]
]
}
],
"lcname": "causallm"
}