# Discrete Choice Model Benchmarking (DCMBench)
A comprehensive Python package for benchmarking, analyzing, and validating discrete choice models for transportation mode choice analysis. DCMBench provides a unified framework for comparing different model specifications, conducting sensitivity analysis, and visualizing results.
## Installation
You can install DCMBench using pip:
```bash
pip install dcmbench
```
## Key Features
### Model Estimation and Benchmarking
- **Multiple Model Types**: Support for Multinomial Logit (MNL), Nested Logit (NL), and Mixed Logit (ML) models
- **Standardized Metrics**: Compare models using log-likelihood, rho-squared, prediction accuracy, and market share accuracy
- **Cross-validation**: Evaluate model performance on training and testing datasets
- **Visualization**: Generate comparative plots showing model performance across different metrics
```python
from dcmbench.model_benchmarker import Benchmarker
from dcmbench.datasets import fetch_data

# Load dataset (automatically downloads if not in local cache)
data = fetch_data("swissmetro_dataset")

# Register a model and run the benchmark.
# NOTE: `models` is a placeholder for a model specification that you define
# yourself; this snippet does not create it.
benchmarker = Benchmarker()
benchmarker.register_model(models, "Model Name")
results = benchmarker.run_benchmark(data, choice_column="CHOICE")

# Print the comparison of the registered models
benchmarker.print_comparison()
```
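The fit metrics listed above follow standard definitions. As a minimal sketch in plain Python (not DCMBench's internal code; the function names are hypothetical), the log-likelihood sums the log of the probability assigned to each chosen alternative, and McFadden's rho-squared compares it with a null model. The null model here assumes that every alternative is available to everyone and equally likely:

```python
import math

def log_likelihood(probabilities, choices):
    """Sum of log-probabilities assigned to the chosen alternatives.

    probabilities: one list per observation, one probability per alternative.
    choices: chosen alternative index (0-based) for each observation.
    """
    return sum(math.log(p[c]) for p, c in zip(probabilities, choices))

def rho_squared(ll_model, ll_null):
    """McFadden's rho-squared: 1 - LL(model) / LL(null)."""
    return 1.0 - ll_model / ll_null

# Toy example: three observations, three alternatives (illustrative numbers only).
probs = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.2, 0.2, 0.6]]
chosen = [0, 1, 2]

ll_model = log_likelihood(probs, chosen)
# Null model: each of the three alternatives has probability 1/3.
ll_null = len(chosen) * math.log(1.0 / 3.0)

print(f"LL(model) = {ll_model:.3f}, LL(null) = {ll_null:.3f}")
print(f"rho-squared = {rho_squared(ll_model, ll_null):.3f}")
```

A rho-squared of 0 means the model does no better than the null model; values closer to 1 indicate a better fit.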
### Advanced Analysis Capabilities
- **Sensitivity Analysis**: Evaluate how model predictions change with variations in key variables
  - Simulate changes in travel times, costs, and other attributes
  - Generate plots showing the evolution of market shares under different scenarios
- **Individual-Level Parameters**: For Mixed Logit models, calculate and visualize:
  - Individual-specific parameter distributions using Bayesian approaches
  - Value of Time (VOT) distributions across the population
  - Heterogeneity in preference structures
- **Model Calibration**: Automatically calibrate Alternative Specific Constants (ASCs) to match observed market shares
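ASC calibration can be illustrated with a common iterative scheme: shift each constant by the log ratio of observed to predicted share, and repeat until the shares agree. The sketch below is a generic version of that idea in plain Python. It is an assumption about how such a calibration might work, not DCMBench's actual routine, and all function names and numbers are hypothetical.

```python
import math

def mnl_probabilities(utilities):
    """Softmax over one observation's utilities (max-shifted for stability)."""
    top = max(utilities)
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def predicted_shares(base_utilities, ascs):
    """Average choice probability per alternative across observations."""
    totals = [0.0] * len(ascs)
    for row in base_utilities:
        probs = mnl_probabilities([u + a for u, a in zip(row, ascs)])
        for j, p in enumerate(probs):
            totals[j] += p
    return [t / len(base_utilities) for t in totals]

def calibrate_ascs(base_utilities, observed_shares, n_iter=200, tol=1e-8):
    """Shift ASCs by log(observed / predicted) until the shares match."""
    ascs = [0.0] * len(observed_shares)
    for _ in range(n_iter):
        shares = predicted_shares(base_utilities, ascs)
        if max(abs(s - o) for s, o in zip(shares, observed_shares)) < tol:
            break
        ascs = [a + math.log(o / s) for a, o, s in zip(ascs, observed_shares, shares)]
        # Normalize so the first alternative is the reference (ASC = 0).
        ascs = [a - ascs[0] for a in ascs]
    return ascs

# Toy utilities (excluding ASCs) for four travellers and three modes.
base_utilities = [
    [-1.0, -0.5, -2.0],
    [-0.8, -1.2, -1.5],
    [-1.5, -0.7, -1.0],
    [-0.6, -0.9, -2.2],
]
observed = [0.5, 0.3, 0.2]

ascs = calibrate_ascs(base_utilities, observed)
print("Calibrated ASCs:", [round(a, 3) for a in ascs])
print("Predicted shares:", [round(s, 3) for s in predicted_shares(base_utilities, ascs)])
```

Only differences between ASCs affect the choice probabilities, which is why one constant can be fixed at zero without changing the predicted shares.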
### Visualization and Reporting
- **Market Share Analysis**: Compare predicted vs. observed mode shares
- **Performance Plots**: Visualize how different models perform across multiple datasets
- **Parameter Distributions**: Plot distributions of random parameters and derived metrics like VOT
- **Sensitivity Curves**: Show how predicted mode shares change with variations in key variables
## Supported Datasets
- **Swissmetro** (`swissmetro_dataset`): Swiss inter-city travel mode choice
- **London Transport** (`ltds_dataset`): London Travel Demand Survey with urban mode choices
- **ModeCanada** (`modecanada_dataset`): Canadian inter-city travel dataset
Datasets are automatically downloaded from the [dcmbench-datasets](https://github.com/carlosguirado/dcmbench-datasets) repository on first use and cached locally:
```python
from dcmbench.datasets import fetch_data
# Use default cache location (~/.dcmbench/datasets)
data = fetch_data("swissmetro_dataset")
# Specify custom cache location
data = fetch_data("swissmetro_dataset", local_cache_dir="/path/to/cache")
# Get features and target separately
X, y = fetch_data("swissmetro_dataset", return_X_y=True)
```
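The download-on-first-use behaviour can be pictured with the sketch below. It is not the actual implementation of `fetch_data`: the `cached_fetch` helper, the `.csv` file naming, and the injected `download` callable are all assumptions made for illustration. Only the default cache directory is taken from the description above.

```python
import tempfile
from pathlib import Path

def cached_fetch(name, download, cache_dir=None):
    """Return the path to a cached dataset file, downloading it only if missing.

    name: dataset identifier, used as the file stem (hypothetical naming scheme).
    download: callable that takes `name` and returns the file contents as bytes.
    cache_dir: directory for cached files; defaults to ~/.dcmbench/datasets.
    """
    cache_dir = Path(cache_dir) if cache_dir else Path.home() / ".dcmbench" / "datasets"
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / f"{name}.csv"
    if not target.exists():
        target.write_bytes(download(name))
    return target

# Demonstration with a fake downloader, so no network access is needed.
calls = []

def fake_download(name):
    calls.append(name)
    return b"ID,CHOICE\n1,2\n"

with tempfile.TemporaryDirectory() as tmp:
    first = cached_fetch("swissmetro_dataset", fake_download, cache_dir=tmp)
    second = cached_fetch("swissmetro_dataset", fake_download, cache_dir=tmp)
    same_path = first == second
    cached_text = second.read_text()

print("Downloads performed:", len(calls))
```

The second call finds the file already in the cache directory and skips the download.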
## Example Applications
### Benchmarking Multiple Models
The package includes tools to benchmark multiple model types across different datasets:
```bash
# Run benchmark_all_models.py to compare models across datasets
python benchmark_all_models.py
```
This generates comparative visualizations showing how different model types perform across datasets, plotting metrics like choice accuracy and market share accuracy against model fit.
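To make the two accuracy metrics concrete, here is a minimal sketch in plain Python. Choice accuracy is taken as the fraction of observations whose highest-probability alternative was actually chosen. The market share accuracy formula (one minus half the summed absolute share error) is an assumed definition for illustration. The package may define it differently, and the function names are hypothetical.

```python
def choice_accuracy(probabilities, choices):
    """Share of observations whose highest-probability alternative was chosen."""
    hits = sum(1 for p, c in zip(probabilities, choices) if p.index(max(p)) == c)
    return hits / len(choices)

def market_shares(probabilities, choices):
    """Return (predicted, observed) shares per alternative."""
    n_obs, n_alt = len(choices), len(probabilities[0])
    predicted = [sum(p[j] for p in probabilities) / n_obs for j in range(n_alt)]
    observed = [sum(1 for c in choices if c == j) / n_obs for j in range(n_alt)]
    return predicted, observed

def market_share_accuracy(predicted, observed):
    """One minus half the summed absolute share error (an assumed definition)."""
    return 1.0 - 0.5 * sum(abs(p - o) for p, o in zip(predicted, observed))

# Toy predicted probabilities for four travellers and three modes.
probs = [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1], [0.2, 0.3, 0.5], [0.3, 0.6, 0.1]]
chosen = [0, 1, 2, 1]

predicted, observed = market_shares(probs, chosen)
print("Choice accuracy:", choice_accuracy(probs, chosen))
print("Predicted shares:", [round(s, 3) for s in predicted])
print("Observed shares:", observed)
print("Market share accuracy:", round(market_share_accuracy(predicted, observed), 3))
```

The two metrics can disagree: a model may reproduce aggregate shares well while still misclassifying many individual choices, which is why it helps to report both.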
### Sensitivity Analysis
Analyze how changes in key variables affect predicted mode shares:
```bash
# Run sensitivity analysis on ModeCanada models
python sensitivity_analysis.py
```
This creates plots showing the evolution of mode shares as you modify variables like:
- Travel costs for different modes
- Travel times
- Service frequencies
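The idea behind such a sensitivity curve can be sketched without the package: scale one attribute, recompute the choice probabilities, and average them into mode shares. The example below uses a simple MNL with made-up coefficients and toy traveller data. None of the names or numbers come from `sensitivity_analysis.py`.

```python
import math

# Illustrative (not estimated) coefficients: one ASC per mode, plus cost and time.
ASCS = {"car": 0.0, "train": -0.3, "bus": -0.8}
BETA_COST = -0.05  # per currency unit
BETA_TIME = -0.02  # per minute

# Each traveller: {mode: (cost, time_in_minutes)}; toy values for illustration.
TRAVELLERS = [
    {"car": (30.0, 60.0), "train": (25.0, 70.0), "bus": (15.0, 100.0)},
    {"car": (45.0, 90.0), "train": (40.0, 85.0), "bus": (22.0, 140.0)},
    {"car": (20.0, 40.0), "train": (18.0, 55.0), "bus": (10.0, 75.0)},
]

def mode_shares(travellers, mode, cost_scale):
    """Average MNL choice probabilities after scaling one mode's cost."""
    totals = {m: 0.0 for m in ASCS}
    for attrs in travellers:
        utilities = {}
        for m, (cost, time) in attrs.items():
            if m == mode:
                cost *= cost_scale
            utilities[m] = ASCS[m] + BETA_COST * cost + BETA_TIME * time
        # Max-shifted softmax for numerical stability.
        top = max(utilities.values())
        exps = {m: math.exp(u - top) for m, u in utilities.items()}
        denom = sum(exps.values())
        for m in exps:
            totals[m] += exps[m] / denom
    return {m: t / len(travellers) for m, t in totals.items()}

# Sweep the train cost from 50% to 150% of its base value.
scales = [0.5, 0.75, 1.0, 1.25, 1.5]
curve = [mode_shares(TRAVELLERS, "train", s) for s in scales]
for s, shares in zip(scales, curve):
    row = ", ".join(f"{m}={p:.3f}" for m, p in shares.items())
    print(f"train cost x{s:.2f}: {row}")
```

Plotting each mode's share against the scale factor gives the kind of curve described above. With a negative cost coefficient, the train share falls as its cost rises.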
### Individual Parameter Analysis
For Mixed Logit models, analyze individual-level parameters and VOT:
```bash
# Generate individual parameter distributions for ModeCanada
python plot_individual_parameters_canada.py
```
This calculates individual-specific parameters using Bayesian conditioning and produces:
- Distributions of time and cost parameters
- Value of Time (VOT) distributions
- Summary statistics for preference heterogeneity
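Bayesian conditioning can be sketched as follows: draw coefficients from the estimated population distribution, weight each draw by the likelihood of a person's observed choices, and take the weighted average. The example below is a generic simulation of that idea with a normally distributed time coefficient and a fixed cost coefficient. All values and function names are hypothetical, and it is not the code in `plot_individual_parameters_canada.py`.

```python
import math
import random

BETA_COST = -0.10                 # fixed cost coefficient (per currency unit)
TIME_MEAN, TIME_SD = -0.04, 0.02  # assumed population distribution of the time coefficient

def sequence_likelihood(beta_time, situations, choices):
    """Probability of one person's observed choice sequence for a given beta_time."""
    likelihood = 1.0
    for alternatives, chosen in zip(situations, choices):
        utilities = [BETA_COST * cost + beta_time * time for cost, time in alternatives]
        top = max(utilities)
        exps = [math.exp(u - top) for u in utilities]
        likelihood *= exps[chosen] / sum(exps)
    return likelihood

def conditional_mean_beta_time(situations, choices, n_draws=2000, seed=0):
    """Likelihood-weighted average of population draws (simulated conditional mean)."""
    rng = random.Random(seed)
    draws = [rng.gauss(TIME_MEAN, TIME_SD) for _ in range(n_draws)]
    weights = [sequence_likelihood(b, situations, choices) for b in draws]
    return sum(b * w for b, w in zip(draws, weights)) / sum(weights)

# Two alternatives per choice situation, each given as (cost, time in minutes).
situations = [
    [(10.0, 60.0), (20.0, 30.0)],
    [(8.0, 90.0), (18.0, 45.0)],
    [(12.0, 50.0), (25.0, 25.0)],
]
fast_chooser = [1, 1, 1]   # always picks the faster, more expensive option
cheap_chooser = [0, 0, 0]  # always picks the cheaper, slower option

for label, choices in [("fast", fast_chooser), ("cheap", cheap_chooser)]:
    beta_time = conditional_mean_beta_time(situations, choices)
    vot_per_hour = 60.0 * beta_time / BETA_COST
    print(f"{label}: beta_time = {beta_time:.4f}, VOT = {vot_per_hour:.2f} per hour")
```

Because the cost coefficient is fixed here, the individual VOT is simply the ratio of the conditional time coefficient to the cost coefficient. The person who always pays for speed ends up with a more negative time coefficient, and therefore a higher VOT, than the person who always chooses the cheaper option.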
## Requirements
- Python >=3.8
- NumPy >=2.0.0
- Pandas >=2.0.0
- Biogeme >=3.2.14
- Matplotlib >=3.0.0
- Requests >=2.25.0
- SciPy (for statistical functions)
- Seaborn (for advanced visualizations)
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Contributing
We welcome contributions! Please see our contributing guidelines for details.
### Adding New Datasets
To add a new dataset to the DCMBench ecosystem:
1. Fork the [dcmbench-datasets](https://github.com/carlosguirado/dcmbench-datasets) repository
2. Add your dataset following the structure guidelines in the repository's CONTRIBUTING.md file
3. Submit a pull request to the dcmbench-datasets repository
4. Update the metadata.json file in the main DCMBench package to include your dataset information
This design allows the package to remain lightweight while providing access to a growing collection of transportation mode choice datasets.
Raw data
{
"_id": null,
"home_page": "https://github.com/carlosguirado/dcmbench",
"name": "dcmbench",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "Carlos Guirado <your.email@example.com>",
"keywords": "discrete choice, benchmarking, transportation, econometrics, machine learning",
"author": "Carlos Guirado",
"author_email": "Carlos Guirado <your.email@example.com>",
"download_url": "https://files.pythonhosted.org/packages/60/85/c08e972c2dd2f6463abcd1e4ef939a27dc6970339a05070fb1527555d999/dcmbench-0.1.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A comprehensive benchmarking framework for discrete choice models",
"version": "0.1.2",
"project_urls": {
"Bug Tracker": "https://github.com/carlosguirado/dcmbench/issues",
"Documentation": "https://dcmbench.readthedocs.io",
"Homepage": "https://github.com/carlosguirado/dcmbench",
"Repository": "https://github.com/carlosguirado/dcmbench.git"
},
"split_keywords": [
"discrete choice",
" benchmarking",
" transportation",
" econometrics",
" machine learning"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "16b8bae189cbe342e5fb657162fba14e458d9337b5bfe108a84cd2b2bc757d4c",
"md5": "5a413fd24338699b83adf939a83a29f1",
"sha256": "175e91b325a6df5b6af8f4dc8d281764a2791312f229f87a807509dab1dcc102"
},
"downloads": -1,
"filename": "dcmbench-0.1.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "5a413fd24338699b83adf939a83a29f1",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 127458,
"upload_time": "2025-08-29T17:35:20",
"upload_time_iso_8601": "2025-08-29T17:35:20.179133Z",
"url": "https://files.pythonhosted.org/packages/16/b8/bae189cbe342e5fb657162fba14e458d9337b5bfe108a84cd2b2bc757d4c/dcmbench-0.1.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "6085c08e972c2dd2f6463abcd1e4ef939a27dc6970339a05070fb1527555d999",
"md5": "70c7bb6f93a7d3641e69ea7744231619",
"sha256": "64a117495eb6a265b9c53898cf8100c07b8423988eb2ef648c6a94ba3f5be6df"
},
"downloads": -1,
"filename": "dcmbench-0.1.2.tar.gz",
"has_sig": false,
"md5_digest": "70c7bb6f93a7d3641e69ea7744231619",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 112351,
"upload_time": "2025-08-29T17:35:21",
"upload_time_iso_8601": "2025-08-29T17:35:21.531954Z",
"url": "https://files.pythonhosted.org/packages/60/85/c08e972c2dd2f6463abcd1e4ef939a27dc6970339a05070fb1527555d999/dcmbench-0.1.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-29 17:35:21",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "carlosguirado",
"github_project": "dcmbench",
"github_not_found": true,
"lcname": "dcmbench"
}