dataspot

Name: dataspot
Version: 0.4.6
Summary: Find data concentration patterns and dataspots. Built for fraud detection and risk analysis.
Author: Elio Rincón <elio@frauddi.com>
Upload time: 2025-07-24 01:11:36
Requires Python: >=3.9
Keywords: anomaly-detection, data-analysis, dataspots, fraud-detection, pattern-mining
# Dataspot 🔥

> **Find data concentration patterns and dataspots in your datasets**

[![PyPI version](https://img.shields.io/pypi/v/dataspot.svg)](https://pypi.org/project/dataspot/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Maintained by Frauddi](https://img.shields.io/badge/Maintained%20by-Frauddi-blue.svg)](https://frauddi.com)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)

Dataspot automatically discovers **where your data concentrates**, helping you identify patterns, anomalies, and insights in datasets. Originally developed for fraud detection at Frauddi, now available as open source.

## ✨ Why Dataspot?

- 🎯 **Purpose-built** for finding data concentrations, not just clustering
- πŸ” **Fraud detection ready** - spot suspicious behavior patterns
- ⚑ **Simple API** - get insights in 3 lines of code
- πŸ“Š **Hierarchical analysis** - understand data at multiple levels
- πŸ”§ **Flexible filtering** - customize analysis with powerful options
- πŸ“ˆ **Field-tested** - validated in real fraud detection systems

## πŸš€ Quick Start

```bash
pip install dataspot
```

```python
from dataspot import Dataspot
from dataspot.models.finder import FindInput, FindOptions

# Sample transaction data
data = [
    {"country": "US", "device": "mobile", "amount": "high", "user_type": "premium"},
    {"country": "US", "device": "mobile", "amount": "medium", "user_type": "premium"},
    {"country": "EU", "device": "desktop", "amount": "low", "user_type": "free"},
    {"country": "US", "device": "mobile", "amount": "high", "user_type": "premium"},
]

# Find concentration patterns
dataspot = Dataspot()
result = dataspot.find(
    FindInput(data=data, fields=["country", "device", "user_type"]),
    FindOptions(min_percentage=10.0, limit=5)
)

# Results show where data concentrates
for pattern in result.patterns:
    print(f"{pattern.path} β†’ {pattern.percentage}% ({pattern.count} records)")

# Output:
# country=US > device=mobile > user_type=premium β†’ 75.0% (3 records)
# country=US > device=mobile β†’ 75.0% (3 records)
# device=mobile β†’ 75.0% (3 records)
```

## 🎯 Real-World Use Cases

### 🚨 Fraud Detection

```python
from dataspot.models.finder import FindInput, FindOptions

# Find suspicious transaction patterns (`dataspot` is the Dataspot() instance from the Quick Start)
result = dataspot.find(
    FindInput(
        data=transactions,
        fields=["country", "payment_method", "time_of_day"]
    ),
    FindOptions(min_percentage=15.0, contains="crypto")
)

# Spot unusual concentrations that might indicate fraud
for pattern in result.patterns:
    if pattern.percentage > 30:
        print(f"⚠️ High concentration: {pattern.path}")
```

### πŸ“Š Business Intelligence

```python
from dataspot.models.analyzer import AnalyzeInput, AnalyzeOptions

# Discover customer behavior patterns
insights = dataspot.analyze(
    AnalyzeInput(
        data=customer_data,
        fields=["region", "device", "product_category", "tier"]
    ),
    AnalyzeOptions(min_percentage=10.0)
)

print(f"πŸ“ˆ Found {len(insights.patterns)} concentration patterns")
print(f"🎯 Top opportunity: {insights.patterns[0].path}")
```

### πŸ” Temporal Analysis

```python
from dataspot.models.compare import CompareInput, CompareOptions

# Compare patterns between time periods
comparison = dataspot.compare(
    CompareInput(
        current_data=this_month_data,
        baseline_data=last_month_data,
        fields=["country", "payment_method"]
    ),
    CompareOptions(
        change_threshold=0.20,
        statistical_significance=True
    )
)

print(f"πŸ“Š Changes detected: {len(comparison.changes)}")
print(f"πŸ†• New patterns: {len(comparison.new_patterns)}")
```
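
Change detection of this kind can be sketched with the standard library alone. The snippet below is an illustration of the idea, not Dataspot's internals, and the `concentrations` helper is hypothetical: compute each value's share of records in two periods and flag relative changes beyond the threshold.

```python
from collections import Counter

def concentrations(data, field):
    """Share of records per value of `field` (illustrative helper)."""
    counts = Counter(record[field] for record in data)
    return {value: count / len(data) for value, count in counts.items()}

baseline = [{"payment_method": m} for m in ["card"] * 8 + ["crypto"] * 2]
current = [{"payment_method": m} for m in ["card"] * 6 + ["crypto"] * 4]

before = concentrations(baseline, "payment_method")
after = concentrations(current, "payment_method")

# Flag values whose share moved by more than 20% relative to baseline
for value in before:
    change = (after.get(value, 0) - before[value]) / before[value]
    if abs(change) > 0.20:
        print(f"{value}: {before[value]:.0%} -> {after[value]:.0%} ({change:+.0%})")
```

Here `crypto` jumps from a 20% to a 40% share, well past a 0.20 relative threshold, so it gets flagged.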

### 🌳 Hierarchical Visualization

```python
from dataspot.models.tree import TreeInput, TreeOptions

# Build hierarchical tree for data exploration
tree = dataspot.tree(
    TreeInput(
        data=sales_data,
        fields=["region", "product_category", "sales_channel"]
    ),
    TreeOptions(min_value=10, max_depth=3, sort_by="value")
)

print(f"🌳 Total records: {tree.value}")
print(f"πŸ“Š Main branches: {len(tree.children)}")

# Navigate the hierarchy
for region in tree.children:
    print(f"  πŸ“ {region.name}: {region.value} records")
    for product in region.children:
        print(f"    πŸ“¦ {product.name}: {product.value} records")
```

### πŸ€– Auto Discovery

```python
from dataspot.models.discovery import DiscoverInput, DiscoverOptions

# Automatically discover important patterns
discovery = dataspot.discover(
    DiscoverInput(data=transaction_data),
    DiscoverOptions(max_fields=3, min_percentage=15.0)
)

print(f"🎯 Top patterns discovered: {len(discovery.top_patterns)}")
for field_ranking in discovery.field_ranking[:3]:
    print(f"πŸ“ˆ {field_ranking.field}: {field_ranking.score:.2f}")
```
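
A ranking like `discovery.field_ranking` can be approximated with a toy score. The metric below is hypothetical, chosen only for illustration (Dataspot's actual scoring may differ): rank each field by the largest share any single one of its values holds.

```python
from collections import Counter

def field_score(data, field):
    """Largest share held by any single value of `field`.
    Illustrative score only, not Dataspot's actual metric."""
    counts = Counter(record[field] for record in data)
    return counts.most_common(1)[0][1] / len(data)

data = [
    {"country": "US", "device": "mobile"},
    {"country": "US", "device": "desktop"},
    {"country": "US", "device": "tablet"},
    {"country": "EU", "device": "mobile"},
]

# Rank fields by how strongly their values concentrate
ranking = sorted(data[0], key=lambda f: field_score(data, f), reverse=True)
print(ranking)  # country (score 0.75) ranks above device (score 0.50)
```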

## πŸ› οΈ Core Methods

| Method | Purpose | Input Model | Options Model | Output Model |
|--------|---------|-------------|---------------|--------------|
| `find()` | Find concentration patterns | `FindInput` | `FindOptions` | `FindOutput` |
| `analyze()` | Statistical analysis | `AnalyzeInput` | `AnalyzeOptions` | `AnalyzeOutput` |
| `compare()` | Temporal comparison | `CompareInput` | `CompareOptions` | `CompareOutput` |
| `discover()` | Auto pattern discovery | `DiscoverInput` | `DiscoverOptions` | `DiscoverOutput` |
| `tree()` | Hierarchical visualization | `TreeInput` | `TreeOptions` | `TreeOutput` |

### Advanced Filtering Options

```python
# Complex analysis with multiple criteria
result = dataspot.find(
    FindInput(
        data=data,
        fields=["country", "device", "payment"],
        query={"country": ["US", "EU"]}  # Pre-filter data
    ),
    FindOptions(
        min_percentage=10.0,      # Only patterns with >10% concentration
        max_depth=3,             # Limit hierarchy depth
        contains="mobile",       # Must contain "mobile" in pattern
        min_count=50,           # At least 50 records
        sort_by="percentage",   # Sort by concentration strength
        limit=20                # Top 20 patterns
    )
)
```

## ⚑ Performance

Dataspot delivers consistent, predictable performance, with near-constant memory usage and near-linear scaling in dataset size.

### πŸš€ Real-World Performance

| Dataset Size | Processing Time | Memory Usage | Patterns Found |
|--------------|-----------------|---------------|----------------|
| 1,000 records | **~5ms** | **~1.4MB** | 12 patterns |
| 10,000 records | **~43ms** | **~2.8MB** | 12 patterns |
| 100,000 records | **~375ms** | **~2.9MB** | 20 patterns |
| 1,000,000 records | **~3.7s** | **~3.0MB** | 20 patterns |

> **Benchmark Methodology**: Times averaged over 5 iterations per dataset size on a MacBook Pro (M-series). Test data specifications:
>
> - **Record Size**: ~164 bytes per record serialized as JSON (~0.16 KB each)
> - **Record Structure**: 8 keys per record (`country`, `device`, `payment_method`, `amount`, `user_type`, `channel`, `status`, `id`)
> - **Analysis Scope**: 4 fields analyzed simultaneously (`country`, `device`, `payment_method`, `user_type`)
> - **Configuration**: `min_percentage=5.0`, `limit=50` patterns
> - **Results**: 12 concentration patterns at the smaller dataset sizes, up to 20 at the larger ones (see table above)
> - **Variance**: Minimal timing variance (±1-6ms), demonstrating algorithmic stability
> - **Memory Efficiency**: Near-constant memory usage regardless of dataset size
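
Test data matching that spec can be generated with a small sketch like the one below. Only the key set and target size come from the spec; the value pools are illustrative guesses, not the actual benchmark fixtures.

```python
import json
import random

def make_record(i):
    """One synthetic record with the 8 benchmark keys.
    Value pools are illustrative; only the key set follows the spec."""
    return {
        "country": random.choice(["US", "EU", "UK", "BR", "MX"]),
        "device": random.choice(["mobile", "desktop", "tablet"]),
        "payment_method": random.choice(["card", "crypto", "bank_transfer"]),
        "amount": random.choice(["low", "medium", "high"]),
        "user_type": random.choice(["free", "premium"]),
        "channel": random.choice(["web", "app", "api"]),
        "status": random.choice(["approved", "declined"]),
        "id": f"txn-{i:07d}",
    }

data = [make_record(i) for i in range(1_000)]
avg_bytes = sum(len(json.dumps(r)) for r in data) / len(data)
print(f"{len(data)} records, ~{avg_bytes:.0f} bytes each as JSON")
```

Scaling `range(1_000)` up reproduces the other dataset sizes in the table.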

### πŸ’‘ Performance Tips

```python
# Optimize for speed
result = dataspot.find(
    FindInput(data=large_dataset, fields=fields),
    FindOptions(
        min_percentage=10.0,    # Skip low-concentration patterns
        max_depth=3,           # Limit hierarchy depth
        limit=100             # Cap results
    )
)

# Memory efficient processing
from dataspot.models.tree import TreeInput, TreeOptions

tree = dataspot.tree(
    TreeInput(data=data, fields=["country", "device"]),
    TreeOptions(min_value=10, top=5)  # Simplified tree
)
```

## πŸ“ˆ What Makes Dataspot Different?

| **Traditional Clustering** | **Dataspot Analysis** |
|---------------------------|---------------------|
| Groups similar data points | **Finds concentration patterns** |
| Equal-sized clusters | **Identifies where data accumulates** |
| Distance-based | **Percentage and count based** |
| Hard to interpret | **Business-friendly hierarchy** |
| Generic approach | **Built for real-world analysis** |
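
The "percentage and count based" idea can be sketched with the standard library alone. This is an illustration of the concept using the Quick Start sample data, not Dataspot's implementation: count how many records share each hierarchical prefix of field values and divide by the total.

```python
from collections import Counter

# The Quick Start sample data, reduced to the three analyzed fields
data = [
    {"country": "US", "device": "mobile", "user_type": "premium"},
    {"country": "US", "device": "mobile", "user_type": "premium"},
    {"country": "EU", "device": "desktop", "user_type": "free"},
    {"country": "US", "device": "mobile", "user_type": "premium"},
]
fields = ["country", "device", "user_type"]

# Count every hierarchical prefix:
# country, country > device, country > device > user_type
counts = Counter()
for record in data:
    path = []
    for field in fields:
        path.append(f"{field}={record[field]}")
        counts[" > ".join(path)] += 1

for path, count in counts.most_common(3):
    print(f"{path} -> {count / len(data):.0%} ({count} records)")
```

The three US/mobile/premium records dominate every prefix along that chain, which is exactly the 75% concentration the Quick Start output reports.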

## 🎬 Dataspot in Action

[View the algorithm](https://frauddi.github.io/dataspot/algorithm-dataspot.html)
![Dataspot in action - Finding data concentration patterns](algorithm-dataspot.gif)

See Dataspot discover concentration patterns and dataspots in real-time with hierarchical analysis and statistical insights.

## πŸ“Š API Structure

### Input Models

- `FindInput` - Data and fields for pattern finding
- `AnalyzeInput` - Statistical analysis configuration
- `CompareInput` - Current vs baseline data comparison
- `DiscoverInput` - Automatic pattern discovery
- `TreeInput` - Hierarchical tree visualization

### Options Models

- `FindOptions` - Filtering and sorting for patterns
- `AnalyzeOptions` - Statistical analysis parameters
- `CompareOptions` - Change detection thresholds
- `DiscoverOptions` - Auto-discovery constraints
- `TreeOptions` - Tree structure customization

### Output Models

- `FindOutput` - Pattern discovery results with statistics
- `AnalyzeOutput` - Enhanced analysis with insights and confidence scores
- `CompareOutput` - Change detection results with significance tests
- `DiscoverOutput` - Auto-discovery findings with field rankings
- `TreeOutput` - Hierarchical tree structure with navigation

## πŸ”§ Installation & Requirements

```bash
# Install from PyPI
pip install dataspot

# Development installation
git clone https://github.com/frauddi/dataspot.git
cd dataspot
pip install -e ".[dev]"
```

**Requirements:**

- Python 3.9+
- No heavy dependencies (just standard library + optional speedups)

## πŸ› οΈ Development Commands

| Command | Description |
|---------|-------------|
| `make lint` | Check code for style and quality issues |
| `make lint-fix` | Automatically fix linting issues where possible |
| `make tests` | Run all tests with coverage reporting |
| `make check` | Run both linting and tests |
| `make clean` | Remove cache files, build artifacts, and temporary files |
| `make install` | Create virtual environment and install dependencies |

## πŸ“š Documentation & Examples

- πŸ“– [User Guide](docs/user-guide.md) - Complete usage documentation
- πŸ’‘ [Examples](examples/) - Real-world usage examples:
  - `01_basic_query_filtering.py` - Query and filtering basics
  - `02_pattern_filtering_basic.py` - Pattern-based filtering
  - `06_real_world_scenarios.py` - Business use cases
  - `08_auto_discovery.py` - Automatic pattern discovery
  - `09_temporal_comparison.py` - A/B testing and change detection
  - `10_stats.py` - Statistical analysis
- 🀝 [Contributing](docs/CONTRIBUTING.md) - How to contribute

## 🌟 Why Open Source?

Dataspot was born from real-world fraud detection needs at Frauddi. We believe powerful pattern analysis shouldn't be locked behind closed doors. By open-sourcing Dataspot, we hope to:

- 🎯 **Advance fraud detection** across the industry
- 🀝 **Enable collaboration** on pattern analysis techniques
- πŸ” **Help companies** spot issues in their data
- πŸ“ˆ **Improve data quality** everywhere

## 🀝 Contributing

We welcome contributions! Whether you're:

- πŸ› Reporting bugs
- πŸ’‘ Suggesting features
- πŸ“ Improving documentation
- πŸ”§ Adding new analysis methods

See our [Contributing Guide](docs/CONTRIBUTING.md) for details.

## πŸ“„ License

MIT License - see [LICENSE](LICENSE) file for details.

## πŸ™ Acknowledgments

- **Created by [@eliosf27](https://github.com/eliosf27)** - Original algorithm and implementation
- **Sponsored by [Frauddi](https://frauddi.com)** - Field testing and open source support
- **Inspired by real fraud detection challenges** - Built to solve actual problems

## πŸ”— Links

- 🏠 [Homepage](https://github.com/frauddi/dataspot)
- πŸ“¦ [PyPI Package](https://pypi.org/project/dataspot/)
- πŸ› [Issue Tracker](https://github.com/frauddi/dataspot/issues)

---

**Find your data's dataspots. Discover what others miss.**
Built with ❀️ by [Frauddi](https://frauddi.com)
