# Segmentation Forests
[MIT License](https://opensource.org/licenses/MIT)
[Python 3.12+](https://www.python.org/downloads/)
**Unsupervised segment discovery using divergence-based decision trees inspired by Random Forests.**
Automatically discover meaningful segments in your data where metric distributions significantly diverge from the background. Perfect for exploratory data analysis, anomaly detection, and customer segmentation without requiring labeled data.
---
## 🎯 The Problem
You have a dataset with many categorical features and a metric you care about. For example:
- **E-commerce**: Users across countries, devices, age groups → conversion rate
- **Digital advertising**: Impressions across demographics, platforms, times → CTR
- **Healthcare**: Patients across conditions, treatments, demographics → readmission rate
- **Finance**: Transactions across customer segments, times → fraud rate
**The question**: *Which specific combinations of features exhibit unusual behavior?*
With N features and many values per feature, the number of candidate segments grows combinatorially, so exhaustive testing quickly becomes intractable. **Segmentation Forests solves this** by intelligently searching for segments where your metric's distribution differs significantly from the overall population.
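As a back-of-the-envelope illustration (the numbers here are assumed for the example), the search space explodes even for modest datasets:

```python
from math import comb

# Assumed for illustration: 10 categorical features with 5 values each.
n_features, n_values = 10, 5

# Conjunctive segments with k single-value conditions: choose which k
# features to constrain, then pick one value for each.
total = sum(comb(n_features, k) * n_values**k for k in range(1, n_features + 1))
print(f"{total:,}")  # 60,466,175 candidate segments
```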
---
## ✨ Key Features
- 🌳 **Tree-based discovery**: Greedy algorithm efficiently navigates combinatorial feature space
- 🌲 **Forest ensemble**: Bootstrap aggregating for robust, reproducible patterns
- 📊 **Statistical rigor**: KS distance (continuous) & Jensen-Shannon divergence (discrete)
- 📈 **Beautiful visualizations**: Distribution comparisons and quality assessments
- 🔬 **Fully typed**: Complete type hints for excellent IDE support
- ⚡ **Fast & scalable**: Handles datasets with millions of rows
---
## 🚀 Quick Start
### Installation
```bash
pip install segmentation-forests
```
### Basic Example
```python
import pandas as pd
from segmentation_forests import SegmentationTree, SegmentationForest
# Your data: features + metric
data = pd.DataFrame({
    'country': ['US', 'UK', 'US', 'UK', ...],
    'device': ['Mobile', 'Desktop', 'Mobile', ...],
    'gender': ['F', 'M', 'F', ...],
    'impressions': [245, 103, 312, 98, ...]  # Your metric
})

# Discover segments with a single tree
tree = SegmentationTree(max_depth=3, min_samples_split=100)
tree.fit(data, metric_column='impressions')
segments = tree.get_segments(min_divergence=0.1)

# View results
for i, seg in enumerate(segments[:3], 1):
    print(f"{i}. {seg.get_condition_string()}")
    print(f"   Divergence: {seg.divergence:.3f} | Size: {seg.size:,}")
```
**Output:**
```
1. gender == F AND device == Mobile AND country == UK
   Divergence: 0.948 | Size: 523

2. time_of_day == Evening AND country == US AND device == Desktop
   Divergence: 0.856 | Size: 412

3. country == DE AND time_of_day == Morning
   Divergence: 0.824 | Size: 289
```
---
## 🌲 Using the Forest (Recommended)
For more robust results, use the ensemble approach:
```python
from segmentation_forests import SegmentationForest
# Create forest with bootstrap sampling and random features
forest = SegmentationForest(
    n_trees=10,
    max_depth=3,
    max_features=2,  # Random feature selection
    min_samples_split=100,
    min_samples_leaf=50
)

# Fit and get segments found by multiple trees
forest.fit(data, metric_column='impressions')
robust_segments = forest.get_segments(min_support=3, min_divergence=0.1)

# View results
for seg in robust_segments:
    cond_str = " AND ".join(f"{c[0]} {c[1]} {c[2]}" for c in seg['conditions'])
    print(cond_str)
    print(f"  Support: {seg['support']}/10 trees ({seg['support_rate']*100:.0f}%)")
    print(f"  Avg Divergence: {seg['avg_divergence']:.3f}")
    print()
```
---
## 📊 Visualization
Beautiful distribution comparison plots:
```python
from segmentation_forests.visualization import plot_segment_comparison
# Compare segment distribution vs background
fig = plot_segment_comparison(
    data=data,
    segment_conditions=[('country', '==', 'UK'), ('device', '==', 'Mobile')],
    metric_column='impressions',
    title='UK Mobile Users vs Background'
)
fig.savefig('segment_comparison.png', dpi=150)
```
**Example output:**
The plot shows:
- **Left**: Overlapping histograms (background in blue, segment in coral)
- **Right**: Box plots comparing distributions
- **Clear separation**: Strong segments show minimal overlap
---
## 🧠 How It Works
### Algorithm Overview
1. **Compute Background Distribution**: Calculate the distribution of your metric across all data
2. **Greedy Tree Building** (see the sketch after this list):
- At each node, evaluate all feature-value splits
- Choose the split that maximizes divergence from background
- Recursively build left (matching condition) and right (not matching) subtrees
3. **Collect High-Divergence Leaves**: Return segments that diverge significantly
4. **Ensemble Aggregation** (Forest only): Vote across trees to find robust patterns
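A minimal sketch of the greedy split search in step 2, assuming a hypothetical `divergence(sample, background)` callable (the library's internal helpers may differ):

```python
import pandas as pd

def best_split(data: pd.DataFrame, metric_column: str,
               background: pd.Series, divergence) -> tuple:
    """Find the (column, value) split whose matching rows diverge most
    from the background metric distribution. `divergence` is a stand-in
    for a KS- or Jensen-Shannon-style score in [0, 1]."""
    best_col, best_val, best_score = None, None, 0.0
    for col in data.columns.drop(metric_column):
        for val in data[col].unique():
            subset = data.loc[data[col] == val, metric_column]
            score = divergence(subset, background)
            if score > best_score:
                best_col, best_val, best_score = col, val, score
    return best_col, best_val, best_score
```

The tree then recurses on the matching and non-matching partitions until `max_depth` or the `min_samples_*` limits are reached.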
### Divergence Measures
The algorithm automatically selects the appropriate measure:
| Metric Type | Measure | Range | Description |
|------------|---------|-------|-------------|
| **Continuous** | Kolmogorov-Smirnov | [0, 1] | Max distance between CDFs |
| **Discrete** | Jensen-Shannon | [0, 1] | Symmetrized, smoothed KL divergence |
**Decision threshold**: ≤20 unique values → discrete, >20 → continuous
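For intuition, both scores are available off the shelf in SciPy; a sketch with assumed sample data (the library's internal computation may differ, e.g. in how it bins values):

```python
import numpy as np
from scipy.stats import ks_2samp
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(42)
background = rng.normal(100, 10, size=5000)
segment = rng.normal(130, 10, size=500)

# Continuous metric: KS statistic = max vertical gap between empirical CDFs.
ks_stat, _ = ks_2samp(segment, background)

# Discrete metric: Jensen-Shannon between value frequencies. SciPy returns
# the JS *distance* (sqrt of the divergence); base=2 keeps it in [0, 1].
p = np.array([0.5, 0.3, 0.2])  # background frequencies (assumed)
q = np.array([0.1, 0.2, 0.7])  # segment frequencies (assumed)
js = jensenshannon(p, q, base=2)
```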
### Quality Guidelines
Interpret divergence scores:
- **≥ 0.5**: 🎯 **Excellent** - Strong, highly actionable pattern
- **0.3-0.5**: ✓ **Good** - Meaningful difference worth investigating
- **0.1-0.3**: ⚠️ **Weak** - Marginal effect, could be noise
- **< 0.1**: ❌ **Very weak** - Likely statistical noise
---
## 📖 API Reference
### `SegmentationTree`
```python
SegmentationTree(
    max_depth: int = 5,
    min_samples_split: int = 50,
    min_samples_leaf: int = 20,
    divergence_threshold: float = 0.01,
    random_features: Optional[int] = None
)
```
**Parameters:**
- `max_depth`: Maximum tree depth (controls segment complexity)
- `min_samples_split`: Minimum samples required to split a node
- `min_samples_leaf`: Minimum samples required in each child
- `divergence_threshold`: Minimum divergence to keep a segment
- `random_features`: Number of random features per split (None = use all)
**Methods:**
- `fit(data: pd.DataFrame, metric_column: str) -> Self`: Fit tree to data
- `get_segments(min_divergence: float = 0.0) -> List[SegmentationNode]`: Get segments
---
### `SegmentationForest`
```python
SegmentationForest(
    n_trees: int = 10,
    max_depth: int = 5,
    min_samples_split: int = 50,
    min_samples_leaf: int = 20,
    divergence_threshold: float = 0.01,
    max_features: Optional[int] = None
)
```
**Parameters:**
- `n_trees`: Number of trees in the forest
- All other parameters are the same as for `SegmentationTree` (`max_features` plays the role of the tree's `random_features`)
**Methods:**
- `fit(data: pd.DataFrame, metric_column: str) -> Self`: Fit forest
- `get_segments(min_support: int = 2, min_divergence: float = 0.0) -> List[Dict]`: Get robust segments
**Returns:** List of dicts with keys:
- `conditions`: List of (column, operator, value) tuples
- `support`: Number of trees that found this segment
- `avg_divergence`: Average divergence across trees
- `avg_size`: Average segment size
- `support_rate`: Fraction of trees (support / n_trees)
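Because each entry is a plain dict, the output drops straight into a DataFrame for sorting and inspection (continuing the forest example above):

```python
import pandas as pd

# Tabulate the forest output for quick inspection.
results = pd.DataFrame(robust_segments)
print(results.sort_values('avg_divergence', ascending=False).head(10))
```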
---
### `SegmentationNode`
Represents a discovered segment.
**Attributes:**
- `conditions`: List of (column, operator, value) tuples
- `divergence`: Divergence score
- `size`: Number of data points
- `depth`: Depth in tree
- `data_indices`: Indices of data points in this segment
**Methods:**
- `get_condition_string() -> str`: Human-readable condition string
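The `conditions` tuples make it straightforward to pull a segment's rows back out of the original frame. A minimal sketch, assuming the `==` operator used throughout this README:

```python
import pandas as pd

def segment_rows(data: pd.DataFrame, conditions) -> pd.DataFrame:
    """Select rows matching a list of (column, operator, value) tuples."""
    mask = pd.Series(True, index=data.index)
    for col, op, val in conditions:
        if op == '==':
            mask &= data[col] == val
        else:
            raise NotImplementedError(f"operator not handled here: {op}")
    return data[mask]

uk_mobile = segment_rows(data, [('country', '==', 'UK'), ('device', '==', 'Mobile')])
```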
---
## 🎨 Visualization Functions
### `plot_segment_comparison`
```python
plot_segment_comparison(
    data: pd.DataFrame,
    segment_conditions: List[Tuple],
    metric_column: str,
    title: Optional[str] = None,
    figsize: Tuple = (14, 5)
) -> plt.Figure
```
Creates side-by-side histogram and box plot comparison.
---
## 💡 Usage Tips
### Choosing Parameters
**For max_depth:**
- `depth=2`: Simple 2-condition segments (e.g., "Country=UK AND Device=Mobile")
- `depth=3-4`: **Recommended** - Balanced complexity
- `depth=5+`: Complex segments, risk of overfitting
**For min_divergence:**
- Start with `0.1` to see all interesting patterns
- Increase to `0.3+` to focus only on strong effects
- Use forest `min_support` to filter noise instead
**For forest:**
- `n_trees=10`: Good default
- `n_trees=20+`: More robust but slower
- `max_features=sqrt(n_features)`: Good for high-dimensional data
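For example, computing `max_features` from that heuristic (a sketch; the square-root rule is a convention borrowed from Random Forests, not something the library enforces):

```python
import numpy as np

feature_cols = [c for c in data.columns if c != 'impressions']
max_features = max(1, round(np.sqrt(len(feature_cols))))  # e.g. 4 features -> 2

forest = SegmentationForest(n_trees=20, max_depth=3, max_features=max_features)
```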
### Interpreting Results
1. **Always visualize top segments** to verify they make sense
2. **Check segment size** - very small segments may be spurious
3. **Use forest support** - patterns found by 5 or more of 10 trees are highly reliable
4. **Domain validation** - do discovered segments align with business intuition?
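A quick validation pass combining these checks, reusing the forest output and the plotting helper shown earlier (the thresholds are assumptions to adjust for your data):

```python
MIN_SUPPORT, MIN_SIZE = 5, 100  # assumed cutoffs

# Keep only well-supported, reasonably sized segments, then eyeball each one.
for seg in robust_segments:
    if seg['support'] >= MIN_SUPPORT and seg['avg_size'] >= MIN_SIZE:
        fig = plot_segment_comparison(
            data=data,
            segment_conditions=seg['conditions'],
            metric_column='impressions',
        )
```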
---
## 🔬 Example: Advertising Dataset
```python
from segmentation_forests import SegmentationForest
from segmentation_forests.visualization import plot_segment_comparison
import pandas as pd
import numpy as np
# Create synthetic advertising data
np.random.seed(42)
n = 10000
data = pd.DataFrame({
    'country': np.random.choice(['US', 'UK', 'CA', 'DE', 'FR'], n),
    'device': np.random.choice(['Mobile', 'Desktop', 'Tablet'], n),
    'gender': np.random.choice(['M', 'F'], n),
    'time_of_day': np.random.choice(['Morning', 'Afternoon', 'Evening', 'Night'], n),
    'impressions': np.random.poisson(100, n)  # Base: ~100 impressions
})

# Add hidden pattern: UK females on mobile get 3x impressions
mask = (data['gender'] == 'F') & (data['country'] == 'UK') & (data['device'] == 'Mobile')
data.loc[mask, 'impressions'] = np.random.poisson(300, mask.sum())
# Discover the pattern
forest = SegmentationForest(n_trees=10, max_depth=3, max_features=2)
forest.fit(data, 'impressions')
segments = forest.get_segments(min_support=3, min_divergence=0.3)
# Result: Discovers the hidden pattern!
# Output: "gender == F AND country == UK AND device == Mobile"
# Divergence: 0.948, Support: 7/10 trees
```
See `examples/advertising_example.py` for the complete example.
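To close the loop, the recovered segment can be plotted against the background with the helper imported in the example:

```python
fig = plot_segment_comparison(
    data=data,
    segment_conditions=[('gender', '==', 'F'), ('country', '==', 'UK'),
                        ('device', '==', 'Mobile')],
    metric_column='impressions',
    title='Recovered segment vs background'
)
fig.savefig('advertising_segment.png', dpi=150)
```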
---
## 🛠️ Development
### Setup
```bash
# Clone repository
git clone https://github.com/davidgeorgewilliams/segmentation-forests.git
cd segmentation-forests
# Install with dev dependencies
pip install -e ".[dev]"
# Install pre-commit hooks
pre-commit install
```
### Running Tests
```bash
# Run all tests
pytest
# With coverage
pytest --cov=segmentation_forests --cov-report=html
# Run specific test
pytest tests/test_tree.py -v
```
### Code Quality
```bash
# Format code
black src/ tests/
isort src/ tests/
# Lint
ruff check src/ tests/
# Type check
mypy src/
```
---
## 🤝 Contributing
Contributions are welcome! Please:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes and add tests
4. Ensure all tests pass and code is formatted
5. Submit a pull request
---
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
---
## 📚 Citation
If you use Segmentation Forests in your research or project, please cite:
```bibtex
@software{segmentation_forests,
  author = {Williams, David},
  title  = {Segmentation Forests: Unsupervised Segment Discovery using Divergence-based Decision Trees},
  year   = {2025},
  url    = {https://github.com/davidgeorgewilliams/segmentation-forests}
}
```
---
## 🙏 Acknowledgments
- Algorithm inspired by Random Forests (Breiman, 2001)
- Divergence measures from information theory (Kullback-Leibler, Jensen-Shannon)
- Built with NumPy, pandas, SciPy, matplotlib, and seaborn
---
## 📞 Contact
**David Williams** - [david@davidgeorgewilliams.com](mailto:david@davidgeorgewilliams.com)
Project Link: [https://github.com/davidgeorgewilliams/segmentation-forests](https://github.com/davidgeorgewilliams/segmentation-forests)
---
**Happy Discovering! 🎯🌲**