# GitFlow Analytics

| Field | Value |
|-------|-------|
| **Name** | gitflow-analytics |
| **Version** | 3.12.6 |
| **Summary** | Analyze Git repositories for developer productivity insights |
| **Upload time** | 2025-11-06 23:37:19 |
| **Requires Python** | >=3.9 |
| **License** | MIT |
| **Keywords** | git, analytics, productivity, metrics, development |

[![PyPI version](https://badge.fury.io/py/gitflow-analytics.svg)](https://badge.fury.io/py/gitflow-analytics)
[![Python Support](https://img.shields.io/pypi/pyversions/gitflow-analytics.svg)](https://pypi.org/project/gitflow-analytics/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Documentation](https://img.shields.io/badge/docs-latest-brightgreen.svg)](https://github.com/bobmatnyc/gitflow-analytics/tree/main/docs)
[![Tests](https://github.com/bobmatnyc/gitflow-analytics/workflows/Tests/badge.svg)](https://github.com/bobmatnyc/gitflow-analytics/actions)

GitFlow Analytics is a Python package that analyzes Git repositories to generate developer productivity insights without requiring external project management tools. It extracts metrics directly from Git history, with ML-enhanced commit categorization, automated developer identity resolution, and Markdown and CSV reporting.

## 🚀 Key Features

- **🔍 No PM Tool Required**: Analyze productivity from Git history alone, without JIRA, Linear, or other PM tools
- **🧠 ML-Powered Intelligence**: Advanced commit categorization with 85-95% accuracy
- **👥 Smart Identity Resolution**: Automatically consolidate developer identities across email addresses
- **🏢 Enterprise Ready**: Organization-wide repository discovery with intelligent caching
- **📊 Professional Reports**: Rich markdown narratives and CSV exports for executive dashboards

## 🎯 Quick Start

Get up and running in 5 minutes:

```bash
# 1. Install GitFlow Analytics
pip install gitflow-analytics

# 2. Install ML dependencies (optional but recommended)
python -m spacy download en_core_web_sm

# 3. Create a simple configuration
echo 'version: "1.0"
github:
  token: "${GITHUB_TOKEN}"
  organization: "your-org"' > config.yaml

# 4. Set your GitHub token
echo 'GITHUB_TOKEN=ghp_your_token_here' > .env

# 5. Run analysis
gitflow-analytics -c config.yaml --weeks 8
```

**What you get:**
- 📈 Weekly metrics CSV with developer productivity trends
- 👥 Developer profiles with project distribution and work styles
- 🔍 Untracked work analysis with ML-powered categorization
- 📋 Executive summary with actionable insights
- 📊 Rich markdown report ready for stakeholders

### Sample Output Preview

```markdown
## Executive Summary
- **Total Commits**: 156 across 3 projects
- **Active Developers**: 5 team members
- **Ticket Coverage**: 73.2% (industry benchmark: 60-80%)
- **Top Contributor**: Sarah Chen (32 commits, FRONTEND focus)

## Key Insights
🎯 **High Productivity**: Team averaged 31 commits/week
📊 **Balanced Workload**: No single developer >40% of total work
✅ **Good Process**: 73% ticket coverage shows strong tracking
```

## ✨ Latest Features (v1.2.x)

- **🚀 Two-Step Processing**: Optimized fetch-then-classify workflow for better performance
- **💰 Cost Tracking**: Monitor LLM API usage with detailed token and cost reporting
- **⚡ Smart Caching**: Intelligent caching reduces analysis time by up to 90%
- **🔄 Automatic Updates**: Repositories automatically fetch latest commits before analysis
- **📊 Weekly Trends**: Track classification pattern changes over time
- **🎯 Enhanced Categorization**: All commits properly categorized with confidence scores

## 🔥 Core Capabilities

**📊 Analysis & Insights**
- Multi-repository analysis with intelligent project grouping
- ML-enhanced commit categorization (85-95% accuracy)
- Developer productivity metrics and work pattern analysis
- Story point extraction from commits and PRs
- Ticket tracking across JIRA, GitHub, ClickUp, and Linear

**🏢 Enterprise Features**
- Organization-wide repository discovery from GitHub
- Automated developer identity resolution and consolidation
- Database-backed caching for sub-second report generation
- Data anonymization for secure external sharing
- Batch processing optimized for large repositories

**📈 Professional Reporting**
- Rich markdown narratives with executive summaries
- Weekly CSV exports with trend analysis
- Customizable output formats and filtering
- Performance benchmarking and team comparisons

## 📚 Documentation

Comprehensive guides for every use case:

| **Getting Started** | **Advanced Usage** | **Integration** |
|-------------------|------------------|---------------|
| [Installation](docs/getting-started/installation.md) | [Complete Configuration](docs/guides/configuration.md) | [CLI Reference](docs/reference/cli-commands.md) |
| [5-Minute Tutorial](docs/getting-started/quickstart.md) | [ML Categorization](docs/guides/ml-categorization.md) | [JSON Export Schema](docs/reference/json-export-schema.md) |
| [First Analysis](docs/getting-started/first-analysis.md) | [Enterprise Setup](docs/examples/enterprise-setup.md) | [CI Integration](docs/examples/ci-integration.md) |

**🎯 Quick Links:**
- 📖 [**Documentation Hub**](docs/README.md) - Complete guide index
- 🚀 [**Quick Start**](docs/getting-started/quickstart.md) - Get running in 5 minutes
- ⚙️ [**Configuration**](docs/guides/configuration.md) - Full reference
- 🤝 [**Contributing**](docs/developer/contributing.md) - Join the project

## ⚡ Installation Options

### Standard Installation
```bash
pip install gitflow-analytics
```

### With ML Enhancement (Recommended)
```bash
pip install gitflow-analytics
python -m spacy download en_core_web_sm
```

### Development Installation
```bash
git clone https://github.com/bobmatnyc/gitflow-analytics.git
cd gitflow-analytics
pip install -e ".[dev]"
python -m spacy download en_core_web_sm
```

## 🔧 Configuration

### Option 1: Organization Analysis (Recommended)
```yaml
# config.yaml
version: "1.0"
github:
  token: "${GITHUB_TOKEN}"
  organization: "your-org"  # Auto-discovers all repositories

analysis:
  ml_categorization:
    enabled: true
    min_confidence: 0.7
```

### Option 2: Specific Repositories
```yaml
# config.yaml  
version: "1.0"
github:
  token: "${GITHUB_TOKEN}"
  
repositories:
  - name: "my-app"
    path: "~/code/my-app"
    github_repo: "myorg/my-app"
    project_key: "APP"
```

### Environment Setup
```bash
# .env (same directory as config.yaml)
GITHUB_TOKEN=ghp_your_token_here
```
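
The `${GITHUB_TOKEN}` placeholder in `config.yaml` is filled in from the `.env` file. As a rough illustration of that kind of substitution (a sketch, not the package's actual loader), the snippet below parses `.env`-style text and expands `${VAR}` references. The helper names `parse_dotenv` and `expand_placeholders` are hypothetical and exist only for this example.

```python
import re

def parse_dotenv(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values

# Matches ${NAME} where NAME is a typical environment variable identifier.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def expand_placeholders(text: str, env: dict) -> str:
    """Replace each ${VAR} with env[VAR]; fail loudly if VAR is missing."""
    def substitute(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]
    return _PLACEHOLDER.sub(substitute, text)

env = parse_dotenv("# .env\nGITHUB_TOKEN=ghp_your_token_here\n")
config_text = 'github:\n  token: "${GITHUB_TOKEN}"\n'
print(expand_placeholders(config_text, env))
```

Raising on a missing variable, rather than silently leaving the placeholder in place, is a design choice of this sketch; it surfaces a forgotten `.env` entry before any API call is made.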

### Run Analysis
```bash
# Analyze last 8 weeks
gitflow-analytics -c config.yaml --weeks 8

# With custom output directory
gitflow-analytics -c config.yaml --weeks 8 --output ./reports
```

> 💡 **Need more configuration options?** See the [Complete Configuration Guide](docs/guides/configuration.md) for advanced features, integrations, and customization.

## 🎯 Excluding Merge Commits from Metrics

GitFlow Analytics can exclude merge commits from filtered line count calculations, following DORA metrics best practices.

### Why Exclude Merge Commits?

Merge commits represent repository management, not original development work:
- **Average merge commit**: 236.6 filtered lines vs 30.8 for regular commits (7.7x higher)
- Merge commits can **skew productivity metrics** and velocity calculations
- **DORA metrics best practice**: Focus on original development work, not repository management

### Configuration

Add this setting to your analysis configuration:

```yaml
analysis:
  # Exclude merge commits from filtered line counts (DORA metrics best practice)
  exclude_merge_commits: true  # Default: false
```

### Impact Example

Real metrics from EWTN dataset analysis:

| Metric | With Merge Commits | Without Merge Commits | Change |
|--------|-------------------|----------------------|--------|
| **Total Filtered Lines** | 138,730 | 54,808 | -60% |
| **Merge Commits** | 355 commits | 355 commits | (excluded from line counts) |
| **Regular Commits** | 1,426 commits | 1,426 commits | (unchanged) |

### What Gets Excluded?

When `exclude_merge_commits: true`:

- ✅ **Filtered Stats**: Merge commits (2+ parents) have `filtered_insertions = 0` and `filtered_deletions = 0`
- ✅ **Raw Stats**: Always preserved for all commits (accurate commit counts)
- ✅ **Reports**: Line count metrics reflect only original development work
- ❌ **Not affected**: Commit counts, developer activity tracking, ticket references
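
The rule above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the package's implementation: the `Commit` dataclass and `apply_filtered_stats` function are hypothetical, and the path-based filtering the tool also applies is ignored here so that only the merge-commit rule is visible.

```python
from dataclasses import dataclass

@dataclass
class Commit:
    sha: str
    parents: int                  # number of parent commits
    insertions: int               # raw stats, never modified
    deletions: int
    filtered_insertions: int = 0
    filtered_deletions: int = 0

def apply_filtered_stats(commits, exclude_merge_commits):
    """Fill filtered_* fields; merge commits (2+ parents) get zeros when excluded."""
    for c in commits:
        is_merge = c.parents >= 2
        if exclude_merge_commits and is_merge:
            c.filtered_insertions = 0
            c.filtered_deletions = 0
        else:
            c.filtered_insertions = c.insertions
            c.filtered_deletions = c.deletions
    return commits

commits = [
    Commit("a1", parents=1, insertions=20, deletions=5),
    Commit("m1", parents=2, insertions=200, deletions=40),  # merge commit
    Commit("b2", parents=1, insertions=10, deletions=0),
]
apply_filtered_stats(commits, exclude_merge_commits=True)
total_filtered = sum(c.filtered_insertions + c.filtered_deletions for c in commits)
print(len(commits), total_filtered)
```

The commit count stays at three while the filtered line total drops to the two regular commits, which mirrors the "raw stats preserved, filtered stats zeroed" behavior described above.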

### When to Use

**✅ Enable when:**
- You want DORA-compliant metrics for productivity tracking
- Your workflow uses merge commits for pull requests
- You need accurate developer velocity without repository overhead
- You're comparing metrics across teams with different merge strategies

**❌ Disable when:**
- You want to track all repository activity including management overhead
- Merge commits represent significant manual conflict resolution in your workflow
- You're analyzing repositories without merge-heavy workflows
- You need to measure total repository churn including merges

### Example Configuration

```yaml
# Full configuration example
analysis:
  weeks_back: 8
  include_weekends: true

  # DORA-compliant metrics: exclude merge commits
  exclude_merge_commits: true

  # Analyze ALL branches to capture feature branch work
  branch_patterns:
    - "*"  # Include all branches (feature, develop, hotfix, etc.)
```

> 💡 **Pro Tip**: Combine `exclude_merge_commits: true` with `branch_patterns: ["*"]` to analyze all development work without merge overhead.

## 📊 Generated Reports

GitFlow Analytics generates comprehensive reports for different audiences:

### 📈 CSV Data Files
- **weekly_metrics.csv** - Developer productivity trends by week
- **weekly_velocity.csv** - Lines-per-story-point velocity analysis
- **developers.csv** - Complete team profiles and statistics  
- **summary.csv** - Project-wide statistics and benchmarks
- **untracked_commits.csv** - ML-categorized analysis of untracked work (commits without ticket references)

### 📋 Executive Reports
- **narrative_summary.md** - Rich markdown report with:
  - Executive summary with key metrics
  - Team composition and work distribution  
  - Project activity breakdown
  - Development patterns and recommendations
  - Weekly trend analysis

### Sample Executive Summary
```markdown
## Executive Summary
- **Total Commits**: 324 commits across 4 projects
- **Active Developers**: 8 team members  
- **Ticket Coverage**: 78.4% (above industry benchmark)
- **Top Areas**: Frontend (45%), API (32%), Infrastructure (23%)

## Key Insights  
✅ **Strong Process Adherence**: 78% ticket coverage
🎯 **Balanced Team**: No developer >35% of total work
📈 **Growth Trend**: +15% productivity vs last quarter
```

## 🛠️ Common Use Cases

**👥 Team Lead Dashboard**
- Track individual developer productivity and growth
- Identify workload distribution and potential burnout
- Monitor code quality trends and technical debt

**📈 Engineering Management**  
- Generate executive reports on team velocity
- Analyze process adherence and ticket coverage
- Benchmark performance across projects and quarters

**🔍 Process Optimization**
- Identify untracked work patterns that should be formalized
- Optimize developer focus and reduce context switching  
- Improve estimation accuracy with historical data

**🏢 Enterprise Analytics**
- Organization-wide repository analysis across dozens of projects
- Automated identity resolution for large, distributed teams
- Cost-effective analysis without expensive PM tool dependencies

## Command Line Interface

### Main Commands

```bash
# Analyze repositories (default command)
gitflow-analytics -c config.yaml --weeks 12 --output ./reports

# Explicit analyze command (backward compatibility)
gitflow-analytics analyze -c config.yaml --weeks 12 --output ./reports

# Show cache statistics
gitflow-analytics cache-stats -c config.yaml

# List known developers
gitflow-analytics list-developers -c config.yaml

# Analyze developer identities
gitflow-analytics identities -c config.yaml

# Merge developer identities
gitflow-analytics merge-identity -c config.yaml dev1_id dev2_id

# Discover story point fields in your PM platform
gitflow-analytics discover-storypoint-fields -c config.yaml
```

### Options

- `--weeks, -w`: Number of weeks to analyze (default: 12)
- `--output, -o`: Output directory for reports (default: ./reports)
- `--anonymize`: Anonymize developer information
- `--no-cache`: Disable caching for fresh analysis
- `--clear-cache`: Clear cache before analysis
- `--validate-only`: Validate configuration without running
- `--skip-identity-analysis`: Skip automatic identity analysis
- `--apply-identity-suggestions`: Apply identity suggestions without prompting

## Complete Configuration Example

Here's a complete example showing `.env` file and corresponding YAML configuration:

### `.env` file
```bash
# GitHub Configuration
GITHUB_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxx
GITHUB_ORG=your-organization

# PM Platform Configuration
JIRA_ACCESS_USER=developer@company.com
JIRA_ACCESS_TOKEN=ATATT3xxxxxxxxxxx
LINEAR_API_KEY=lin_api_xxxxxxxxxxxx
CLICKUP_API_TOKEN=pk_xxxxxxxxxxxx

# Note: GitHub Issues uses GITHUB_TOKEN automatically
```

### `config.yaml` file
```yaml
version: "1.0"

# GitHub configuration with organization discovery
github:
  token: "${GITHUB_TOKEN}"
  organization: "${GITHUB_ORG}"

# Multi-platform PM integration
pm:
  jira:
    access_user: "${JIRA_ACCESS_USER}"
    access_token: "${JIRA_ACCESS_TOKEN}"
    base_url: "https://company.atlassian.net"

  linear:
    api_key: "${LINEAR_API_KEY}"
    team_ids: ["team_123abc"]  # Optional: filter by specific teams

  clickup:
    api_token: "${CLICKUP_API_TOKEN}"
    workspace_url: "https://app.clickup.com/12345/v/"

# JIRA story point integration (optional)
jira_integration:
  enabled: true
  fetch_story_points: true
  story_point_fields:
    - "Story point estimate"     # Your field name
    - "customfield_10016"        # Fallback field ID

# Analysis configuration
analysis:
  # Track tickets from all configured platforms
  ticket_platforms:
    - jira
    - linear
    - clickup
    - github  # GitHub Issues (uses GITHUB_TOKEN)
  
  # Exclude bot commits and boilerplate files
  exclude:
    authors:
      - "dependabot[bot]"
      - "renovate[bot]"
    paths:
      - "**/node_modules/**"
      - "**/*.min.js"
      - "**/package-lock.json"
  
  # Developer identity consolidation
  identity:
    similarity_threshold: 0.85
    manual_mappings:
      - name: "John Doe"
        primary_email: "john.doe@company.com"
        aliases:
          - "jdoe@oldcompany.com"
          - "john@personal.com"

# Output configuration
output:
  directory: "./reports"
  formats:
    - csv
    - markdown
```

## Output Reports

The tool generates comprehensive CSV reports and markdown summaries:

### CSV Reports

1. **Weekly Metrics** (`weekly_metrics_YYYYMMDD.csv`)
   - Week-by-week developer productivity
   - Story points, commits, lines changed
   - Ticket coverage percentages
   - Per-project breakdown

2. **Weekly Velocity** (`weekly_velocity_YYYYMMDD.csv`)
   - Lines of code per story point analysis
   - Efficiency trends and velocity patterns
   - PR-based vs commit-based story points breakdown
   - Team velocity benchmarking and week-over-week trends

3. **Summary Statistics** (`summary_YYYYMMDD.csv`)
   - Overall project statistics
   - Platform-specific ticket counts
   - Top contributors

4. **Developer Report** (`developers_YYYYMMDD.csv`)
   - Complete developer profiles
   - Total contributions
   - Identity aliases

5. **Untracked Commits Report** (`untracked_commits_YYYYMMDD.csv`)
   - Detailed analysis of commits without ticket references
   - Commit categorization (bug_fix, feature, refactor, documentation, maintenance, test, style, build)
   - Enhanced metadata: commit hash, author, timestamp, project, message, file/line changes
   - Configurable file change threshold for filtering significant commits

### Enhanced Untracked Commit Analysis

The untracked commits report provides deep insights into work that bypasses ticket tracking:

**CSV Columns:**
- `commit_hash` / `short_hash`: Full and abbreviated commit identifiers
- `author` / `author_email` / `canonical_id`: Developer identification (with anonymization support)
- `date`: Commit timestamp
- `project`: Project key for multi-repository analysis
- `message`: Commit message (truncated for readability)
- `category`: Automated categorization of work type
- `files_changed` / `lines_added` / `lines_removed` / `lines_changed`: Change metrics
- `is_merge`: Boolean flag for merge commits

**Automatic Categorization:**
- **Feature**: New functionality development (`add`, `new`, `implement`, `create`)
- **Bug Fix**: Error corrections (`fix`, `bug`, `error`, `resolve`, `hotfix`)
- **Refactor**: Code restructuring (`refactor`, `optimize`, `improve`, `cleanup`)
- **Documentation**: Documentation updates (`doc`, `readme`, `comment`, `guide`)
- **Maintenance**: Routine upkeep (`update`, `upgrade`, `dependency`, `config`)
- **Test**: Testing-related changes (`test`, `spec`, `mock`, `fixture`)
- **Style**: Formatting changes (`format`, `lint`, `prettier`, `whitespace`)
- **Build**: Build system changes (`build`, `compile`, `ci`, `docker`)
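
To make the keyword lists above concrete, here is a minimal rule-based sketch. It assumes a simple scoring scheme (count keyword hits per category, highest count wins, ties go to the category listed first); the tool's real precedence rules are not documented here, so treat `categorize` and its `"other"` fallback as hypothetical.

```python
# Keyword lists copied from the category descriptions above.
CATEGORY_KEYWORDS = {
    "feature": ["add", "new", "implement", "create"],
    "bug_fix": ["fix", "bug", "error", "resolve", "hotfix"],
    "refactor": ["refactor", "optimize", "improve", "cleanup"],
    "documentation": ["doc", "readme", "comment", "guide"],
    "maintenance": ["update", "upgrade", "dependency", "config"],
    "test": ["test", "spec", "mock", "fixture"],
    "style": ["format", "lint", "prettier", "whitespace"],
    "build": ["build", "compile", "ci", "docker"],
}

def categorize(message: str) -> str:
    """Return the category with the most keyword hits, or "other" if none match."""
    text = message.lower()
    scores = {
        category: sum(keyword in text for keyword in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)  # first category wins on ties
    return best if scores[best] > 0 else "other"

for msg in [
    "Update dependency versions for security patches",
    "Fix typo in error message",
    "Add JSDoc comments to utility functions",
]:
    print(f"{categorize(msg):<14} {msg}")
```

Plain substring matching keeps the sketch short but is crude: a short keyword such as `ci` also matches inside unrelated words like "specific", so a production matcher would likely need word boundaries or tokenization.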

### Markdown Reports

6. **Narrative Summary** (`narrative_summary_YYYYMMDD.md`)
   - **Executive Summary**: High-level metrics and team overview
   - **Team Composition**: Developer profiles with project percentages and work patterns
   - **Project Activity**: Detailed breakdown by project with contributor percentages and **commit classifications**
   - **Development Patterns**: Key insights from productivity and collaboration analysis
   - **Pull Request Analysis**: PR metrics including size, lifetime, and review activity
   - **Issue Tracking**: Platform usage and coverage analysis with simplified display
   - **Enhanced Untracked Work Analysis**: Comprehensive categorization with dual percentage metrics
   - **PM Platform Integration**: Story point tracking and correlation insights (when available)
   - **Recommendations**: Actionable insights based on analysis patterns
   - **Weekly Trends** (v1.1.0+): Week-over-week changes in classification patterns

7. **Database-Backed Qualitative Report** (`database_qualitative_report_YYYYMMDD.md`) (v1.1.0+)
   - Generated directly from SQLite storage for fast retrieval
   - Includes weekly trend analysis per developer/project
   - Shows classification changes over time (e.g., "Features: +15%, Bug Fixes: -5%")

### Enhanced Narrative Report Sections

The narrative report provides comprehensive insights through multiple detailed sections:

#### Team Composition Section
- **Developer Profiles**: Individual developer statistics with commit counts
- **Project Distribution**: Shows ALL projects each developer works on with precise percentages
- **Work Style Classification**: Categorizes developers as "Focused", "Multi-project", or "Highly Focused"
- **Activity Patterns**: Identifies time patterns like "Standard Hours" or "Extended Hours"

**Example developer profile:**
```markdown
**John Developer**
- Commits: 15
- Projects: FRONTEND (85.0%), SERVICE_TS (15.0%)
- Work Style: Focused
- Active Pattern: Standard Hours
```

#### Project Activity Section
- **Activity by Project**: Commits and percentage of total activity per project
- **Contributor Breakdown**: Shows each developer's contribution percentage within each project
- **Lines Changed**: Quantifies the scale of changes per project

#### Issue Tracking with Simplified Display
- **Platform Usage**: Clean display of ticket platform distribution (JIRA, GitHub, etc.)
- **Coverage Analysis**: Percentage of commits that reference tickets
- **Enhanced Untracked Work Analysis**: Detailed categorization and recommendations

### Interpreting Dual Percentage Metrics

The enhanced untracked work analysis provides two key percentage metrics for better context:

1. **Percentage of Total Untracked Work**: Shows how much each developer contributes to the overall untracked work pool
2. **Percentage of Developer's Individual Work**: Shows what proportion of a specific developer's commits are untracked

**Example interpretation:**
```
- John Doe: 25 commits (40% of untracked, 15% of their work) - maintenance, style
```

This means:
- John contributed 25 untracked commits
- These represent 40% of all untracked commits in the analysis period  
- Only 15% of John's total work was untracked (85% was properly tracked)
- Most untracked work was maintenance and style changes (acceptable categories)

**Process Insights:**
- High "% of untracked" + low "% of their work" = Developer doing most of the acceptable maintenance work
- Low "% of untracked" + high "% of their work" = Developer needs process guidance
- High percentages in feature/bug_fix categories = Process improvement opportunity
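
The two percentages are easy to confuse, so a small worked computation may help. The sketch below uses made-up numbers for two hypothetical developers; `dual_percentages` is an illustrative helper, not a function exported by the package.

```python
def dual_percentages(untracked_by_dev, total_by_dev):
    """Return {dev: (% of all untracked commits, % of that dev's own commits)}."""
    total_untracked = sum(untracked_by_dev.values())
    rows = {}
    for dev, untracked in untracked_by_dev.items():
        pct_of_untracked = 100 * untracked / total_untracked if total_untracked else 0.0
        own_total = total_by_dev[dev]
        pct_of_own_work = 100 * untracked / own_total if own_total else 0.0
        rows[dev] = (round(pct_of_untracked, 1), round(pct_of_own_work, 1))
    return rows

# Illustrative data only: untracked commits and total commits per developer.
untracked = {"alice": 20, "bob": 5}
totals = {"alice": 200, "bob": 10}

rows = dual_percentages(untracked, totals)
for dev, (of_untracked, of_own) in rows.items():
    print(f"{dev}: {of_untracked}% of untracked, {of_own}% of their work")
```

Here the first developer accounts for most of the untracked pool but tracks 90% of their own work, while the second contributes little to the pool yet leaves half of their commits untracked, which is the "needs process guidance" pattern described above.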

### Example Report Outputs

#### Untracked Commits CSV Sample
```csv
commit_hash,short_hash,author,author_email,canonical_id,date,project,message,category,files_changed,lines_added,lines_removed,lines_changed,is_merge
a1b2c3d4e5f6...,a1b2c3d,John Doe,john@company.com,ID0001,2024-01-15 14:30:22,FRONTEND,Update dependency versions for security patches,maintenance,2,45,12,57,false
f6e5d4c3b2a1...,f6e5d4c,Jane Smith,jane@company.com,ID0002,2024-01-15 09:15:10,BACKEND,Fix typo in error message,bug_fix,1,1,1,2,false
9876543210ab...,9876543,Bob Wilson,bob@company.com,ID0003,2024-01-14 16:45:33,FRONTEND,Add JSDoc comments to utility functions,documentation,3,28,0,28,false
```

#### Complete Narrative Report Sample
```markdown
# GitFlow Analytics Report

**Generated**: 2025-08-04 14:27:47
**Analysis Period**: Last 4 weeks

## Executive Summary

- **Total Commits**: 35
- **Active Developers**: 3
- **Lines Changed**: 910
- **Ticket Coverage**: 71.4%
- **Active Projects**: FRONTEND, SERVICE_TS, SERVICES
- **Top Contributor**: John Developer with 15 commits

## Team Composition

### Developer Profiles

**John Developer**
- Commits: 15
- Projects: FRONTEND (85.0%), SERVICE_TS (15.0%)
- Work Style: Focused
- Active Pattern: Standard Hours

**Jane Smith**
- Commits: 12
- Projects: SERVICE_TS (70.0%), FRONTEND (30.0%)
- Work Style: Multi-project
- Active Pattern: Extended Hours

## Project Activity

### Activity by Project

**FRONTEND**
- Commits: 14 (50.0% of total)
- Lines Changed: 450
- Contributors: John Developer (71.4%), Jane Smith (28.6%)

**SERVICE_TS**
- Commits: 8 (28.6% of total)
- Lines Changed: 280
- Contributors: Jane Smith (100.0%)

## Issue Tracking

### Platform Usage

- **Jira**: 15 tickets (60.0%)
- **Github**: 8 tickets (32.0%)
- **Clickup**: 2 tickets (8.0%)

### Untracked Work Analysis

**Summary**: 10 commits (28.6% of total) lack ticket references.

#### Work Categories

- **Maintenance**: 4 commits (40.0%), avg 23 lines *(acceptable untracked)*
- **Bug Fix**: 3 commits (30.0%), avg 15 lines *(should be tracked)*
- **Documentation**: 2 commits (20.0%), avg 12 lines *(acceptable untracked)*

#### Top Contributors (Untracked Work)

- **John Developer**: 1 commits (50.0% of untracked, 6.7% of their work) - *refactor*
- **Jane Smith**: 1 commits (50.0% of untracked, 8.3% of their work) - *style*

#### Recommendations for Untracked Work

🎯 **Excellent tracking**: Less than 20% of commits are untracked - the team shows strong process adherence.

## Recommendations

✅ The team shows healthy development patterns. Continue current practices while monitoring for changes.
```

### Configuration for Enhanced Narrative Reports

The narrative reports automatically include all available sections based on your configuration and data availability:

**Always Generated:**
- Executive Summary, Team Composition, Project Activity, Development Patterns, Issue Tracking, Recommendations

**Conditionally Generated:**
- **Pull Request Analysis**: Requires GitHub integration with PR data
- **PM Platform Integration**: Requires JIRA or other PM platform configuration
- **Qualitative Analysis**: Requires ChatGPT integration setup

**Customizing Report Content:**
```yaml
# config.yaml
output:
  formats:
    - csv
    - markdown  # Enables narrative report generation
  
# Optional: Enhance narrative reports with additional data
jira:
  access_user: "${JIRA_ACCESS_USER}"
  access_token: "${JIRA_ACCESS_TOKEN}"
  base_url: "https://company.atlassian.net"

# Optional: Add qualitative insights
analysis:
  chatgpt:
    enabled: true
    api_key: "${OPENAI_API_KEY}"
```

## Story Point Patterns

Configure custom regex patterns to match your team's story point format:

```yaml
story_point_patterns:
  - "SP: (\\d+)"           # SP: 5
  - "\\[([0-9]+) pts\\]"   # [3 pts]
  - "estimate: (\\d+)"     # estimate: 8
```
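
As a quick way to check patterns before adding them to the configuration, the same regular expressions can be tried in Python. This is a sketch under assumptions: it takes the first pattern that matches and reads the first capture group as the point value, which may differ from how the tool resolves multiple matches. `extract_story_points` is a hypothetical helper.

```python
import re

# The YAML strings above, written as Python raw strings (YAML "\\d" is regex \d).
STORY_POINT_PATTERNS = [
    r"SP: (\d+)",
    r"\[([0-9]+) pts\]",
    r"estimate: (\d+)",
]

def extract_story_points(text):
    """Return the first story point value found in text, or None."""
    for pattern in STORY_POINT_PATTERNS:
        match = re.search(pattern, text)
        if match:
            return int(match.group(1))
    return None

for sample in ["Add login form SP: 5", "[3 pts] Fix cart total", "estimate: 8", "No points here"]:
    print(repr(sample), "->", extract_story_points(sample))
```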

## Ticket Platform Support

GitFlow Analytics automatically detects ticket references from multiple PM platforms in commit messages:
- **JIRA**: `PROJ-123`
- **GitHub Issues**: `#123`, `GH-123`
- **ClickUp**: `CU-abc123`
- **Linear**: `ENG-123`
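
The reference formats above can be approximated with regular expressions. The sketch below is an assumption about how such detection might look, not the tool's actual patterns; note that JIRA and Linear keys share the same `KEY-123` shape, so this example cannot tell them apart and reports both as `jira_or_linear`. It also shows how a ticket coverage percentage can be derived from the matches.

```python
import re

# Hypothetical patterns for the reference formats listed above.
TICKET_PATTERNS = [
    ("github", re.compile(r"(?<![\w-])(?:#|GH-)(\d+)\b")),
    ("clickup", re.compile(r"\bCU-([a-z0-9]+)\b")),
    ("jira_or_linear", re.compile(r"\b(?!GH-|CU-)([A-Z][A-Z0-9]+-\d+)\b")),
]

def find_tickets(message):
    """Return (platform, reference) pairs found in a commit message."""
    found = []
    for platform, pattern in TICKET_PATTERNS:
        for match in pattern.finditer(message):
            found.append((platform, match.group(0)))
    return found

def ticket_coverage(messages):
    """Percentage of commit messages that reference at least one ticket."""
    if not messages:
        return 0.0
    tracked = sum(1 for message in messages if find_tickets(message))
    return round(100 * tracked / len(messages), 1)

messages = ["ENG-1 add search", "no ticket here", "fix crash, closes #2", "typo"]
print(find_tickets("GH-7 and CU-abc123"))
print(ticket_coverage(messages))
```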

### Multi-Platform PM Integration

GitFlow Analytics supports multiple project management platforms simultaneously. You can configure one or more platforms based on your team's workflow:

```yaml
# Configure which platforms to track
analysis:
  ticket_platforms:
    - jira
    - linear
    - clickup
    - github  # GitHub Issues

# Platform-specific configuration
pm:
  jira:
    access_user: "${JIRA_ACCESS_USER}"
    access_token: "${JIRA_ACCESS_TOKEN}"
    base_url: "https://your-company.atlassian.net"

  linear:
    api_key: "${LINEAR_API_KEY}"
    team_ids:  # Optional: filter by team
      - "team_123abc"

  clickup:
    api_token: "${CLICKUP_API_TOKEN}"
    workspace_url: "https://app.clickup.com/12345/v/"

# GitHub Issues uses existing GitHub token automatically
github:
  token: "${GITHUB_TOKEN}"
```

### Platform Setup Guides

#### JIRA Setup
1. **Get API Token**: Go to [Atlassian API Tokens](https://id.atlassian.com/manage-profile/security/api-tokens)
2. **Required Permissions**: Read access to projects and issues
3. **Configuration**:
   ```yaml
   pm:
     jira:
       access_user: "${JIRA_ACCESS_USER}"  # Your Atlassian email
       access_token: "${JIRA_ACCESS_TOKEN}"
       base_url: "https://your-company.atlassian.net"
   ```

#### Linear Setup
1. **Get API Key**: Go to [Linear Settings → API](https://linear.app/settings/api)
2. **Required Permissions**: Read access to issues
3. **Configuration**:
   ```yaml
   pm:
     linear:
       api_key: "${LINEAR_API_KEY}"
       team_ids: ["team_123abc"]  # Optional: specify team IDs
   ```

#### ClickUp Setup
1. **Get API Token**: Go to [ClickUp Settings → Apps](https://app.clickup.com/settings/apps)
2. **Get Workspace URL**: Copy from browser when viewing your workspace
3. **Configuration**:
   ```yaml
   pm:
     clickup:
       api_token: "${CLICKUP_API_TOKEN}"
       workspace_url: "https://app.clickup.com/12345/v/"
   ```

#### GitHub Issues Setup
GitHub Issues is automatically enabled when GitHub integration is configured. No additional setup required:
```yaml
github:
  token: "${GITHUB_TOKEN}"  # Same token for repo access and issues
```

### JIRA Story Point Integration

GitFlow Analytics can fetch story points directly from JIRA tickets:

```yaml
jira_integration:
  enabled: true
  fetch_story_points: true
  story_point_fields:
    - "Story point estimate"  # Your custom field name
    - "customfield_10016"     # Or use field ID
```

To discover your JIRA story point fields:
```bash
gitflow-analytics discover-storypoint-fields -c config.yaml
```

### Environment Variables for Credentials

Store credentials securely in a `.env` file:

```bash
# .env file (keep this secure and don't commit to git!)
GITHUB_TOKEN=ghp_your_token_here

# PM Platform Credentials
JIRA_ACCESS_USER=your.email@company.com
JIRA_ACCESS_TOKEN=ATATT3xxxxxxxxxxx
LINEAR_API_KEY=lin_api_xxxxxxxxxxxx
CLICKUP_API_TOKEN=pk_xxxxxxxxxxxx
```

## Caching

The tool uses SQLite for intelligent caching:
- Commit analysis results
- Developer identity mappings
- Pull request data

The cache is managed automatically, with a configurable TTL.
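
For readers unfamiliar with TTL-based caching on SQLite, the idea can be sketched with the standard library alone. The table layout, the `TTLCache` class, and the key format below are assumptions made for illustration and do not describe the package's real cache schema.

```python
import json
import sqlite3
import time

class TTLCache:
    """Tiny key/value cache on SQLite whose entries expire after ttl_seconds."""

    def __init__(self, path=":memory:", ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "key TEXT PRIMARY KEY, value TEXT, stored_at REAL)"
        )

    def set(self, key, value, now=None):
        now = time.time() if now is None else now
        self.db.execute(
            "INSERT OR REPLACE INTO cache (key, value, stored_at) VALUES (?, ?, ?)",
            (key, json.dumps(value), now),
        )
        self.db.commit()

    def get(self, key, now=None):
        now = time.time() if now is None else now
        row = self.db.execute(
            "SELECT value, stored_at FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is None or now - row[1] > self.ttl:
            return None  # missing or expired
        return json.loads(row[0])

cache = TTLCache(ttl_seconds=60)
cache.set("commit:a1b2c3d", {"category": "maintenance"}, now=1000.0)
fresh = cache.get("commit:a1b2c3d", now=1030.0)    # within the TTL
expired = cache.get("commit:a1b2c3d", now=1100.0)  # past the TTL
print(fresh, expired)
```

The optional `now` argument exists only to make the expiry behavior easy to demonstrate without sleeping.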

## Developer Identity Resolution

GitFlow Analytics intelligently consolidates developer identities across different email addresses and name variations:

### Automatic Identity Analysis (New!)

Identity analysis now runs **automatically by default** when no manual mappings exist. The system will:

1. **Analyze all developer identities** in your commits
2. **Show suggested consolidations** with a clear preview
3. **Prompt for approval** with a simple Y/n
4. **Update your configuration** automatically
5. **Continue analysis** with consolidated identities

Example of the interactive prompt:
```
🔍 Analyzing developer identities...

⚠️  Found 3 potential identity clusters:

📋 Suggested identity mappings:
   john.doe@company.com
     → 123456+johndoe@users.noreply.github.com
     → jdoe@personal.email.com

🤖 Found 2 bot accounts to exclude:
   - dependabot[bot]
   - renovate[bot]

────────────────────────────────────────────────────────────
Apply these identity mappings to your configuration? [Y/n]: 
```

This prompt appears at most once every 7 days. 

To skip automatic identity analysis:
```bash
# Simplified syntax (default)
gitflow-analytics -c config.yaml --skip-identity-analysis

# Explicit analyze command
gitflow-analytics analyze -c config.yaml --skip-identity-analysis
```

To manually run identity analysis:
```bash
gitflow-analytics identities -c config.yaml
```

### Smart Identity Matching

The system automatically detects:
- **GitHub noreply emails** (e.g., `150280367+username@users.noreply.github.com`)
- **Name variations** (e.g., "John Doe" vs "John D" vs "jdoe")
- **Common email patterns** across domains
- **Bot accounts** for automatic exclusion
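
A rough sketch of two of these signals follows: extracting the username from a GitHub noreply address and comparing display names with a similarity score. Using `difflib.SequenceMatcher` and reusing `0.85` (the `similarity_threshold` value from the configuration example) are assumptions for illustration; the tool's actual matching algorithm may differ, and the helper names are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Matches both "12345+user@users.noreply.github.com" and "user@users.noreply.github.com".
NOREPLY = re.compile(r"^(?:\d+\+)?([^@]+)@users\.noreply\.github\.com$", re.IGNORECASE)

def github_username(email):
    """Return the GitHub username from a noreply address, or None."""
    match = NOREPLY.match(email.strip())
    return match.group(1).lower() if match else None

def name_similarity(a, b):
    """Case-insensitive similarity ratio between two display names (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def likely_same(name_a, email_a, name_b, email_b, threshold=0.85):
    """Heuristic: same email, same noreply username, or very similar names."""
    if email_a.lower() == email_b.lower():
        return True
    user_a, user_b = github_username(email_a), github_username(email_b)
    if user_a and user_a == user_b:
        return True
    return name_similarity(name_a, name_b) >= threshold

print(github_username("150280367+username@users.noreply.github.com"))
print(likely_same("John Doe", "john.doe@company.com", "John Doe", "jdoe@personal.email.com"))
```

Name similarity alone can merge distinct people with similar names, which is presumably why the tool shows suggested mappings for approval instead of applying them silently.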

### Manual Configuration

You can also manually configure identity mappings in your YAML:

```yaml
analysis:
  identity:
    manual_mappings:
      - name: "John Doe"  # Optional: preferred display name for reports
        primary_email: john.doe@company.com
        aliases:
          - jdoe@personal.email.com
          - 123456+johndoe@users.noreply.github.com
      - name: "Sarah Smith"
        primary_email: sarah.smith@company.com
        aliases:
          - s.smith@oldcompany.com
```

### Display Name Control

The optional `name` field in manual mappings allows you to control how developer names appear in reports. This is particularly useful for:

- **Standardizing display names** across different email formats
- **Resolving duplicates** when the same person appears with slight name variations
- **Using preferred names** instead of technical email formats

**Example use cases:**
```yaml
analysis:
  identity:
    manual_mappings:
      # Consolidate John Smith identities
      - name: "John Smith"
        primary_email: "john.smith@company.com"
        aliases:
          - "150280367+jsmith@users.noreply.github.com"
          - "jsmith-company@users.noreply.github.com"
      
      # Standardize name variations
      - name: "John Doe"  # Consistent display across all reports
        primary_email: "john.doe@company.com"
        aliases:
          - "johndoe@company.com"
          - "j.doe@company.com"
```

Without the `name` field, the system uses the canonical email's associated name, which might not be ideal for reporting.

### Disabling Automatic Analysis

To disable the automatic identity prompt:
```yaml
analysis:
  identity:
    auto_analysis: false
```

## ML-Enhanced Commit Categorization

GitFlow Analytics includes sophisticated machine learning capabilities for categorizing commits with high accuracy and confidence scoring.

### How It Works

The ML categorization system uses a **hybrid approach** combining:

1. **Semantic Analysis**: Uses spaCy NLP models to understand commit message meaning
2. **File Pattern Recognition**: Analyzes changed files for additional context signals  
3. **Rule-based Fallback**: Falls back to traditional regex patterns when ML confidence is low
4. **Confidence Scoring**: Provides confidence metrics for all categorizations
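
The four steps above can be sketched roughly as follows. This is not the package's implementation: the function `categorize`, its inputs, and the weighted-sum combination are assumptions chosen to match the `semantic_weight`, `file_pattern_weight`, and `hybrid_threshold` settings documented under Configuration.

```python
from typing import Callable, Dict, Tuple


def categorize(
    semantic_scores: Dict[str, float],
    file_scores: Dict[str, float],
    rule_fallback: Callable[[], Tuple[str, float]],
    semantic_weight: float = 0.7,
    file_pattern_weight: float = 0.3,
    hybrid_threshold: float = 0.5,
) -> Tuple[str, float, str]:
    """Return (category, confidence, method) for one commit."""
    # Combine the two signals with a weighted sum (an assumed formula).
    combined = {
        category: semantic_weight * semantic_scores.get(category, 0.0)
        + file_pattern_weight * file_scores.get(category, 0.0)
        for category in set(semantic_scores) | set(file_scores)
    }
    if combined:
        best = max(combined, key=combined.get)
        if combined[best] >= hybrid_threshold:
            return best, combined[best], "ml"
    # ML confidence too low: defer to the rule-based result.
    category, confidence = rule_fallback()
    return category, confidence, "rules"


confident = categorize(
    {"feature": 0.9, "refactor": 0.2},
    {"feature": 0.6},
    rule_fallback=lambda: ("maintenance", 0.5),
)
uncertain = categorize(
    {"maintenance": 0.4},
    {},
    rule_fallback=lambda: ("maintenance", 0.5),
)
print(confident[0], round(confident[1], 2), confident[2])
print(uncertain[0], uncertain[2])
```

In the first call the combined score is 0.7 × 0.9 + 0.3 × 0.6 = 0.81, which clears the threshold. In the second it is 0.28, so the rule-based result is used instead.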

### Categories Detected

The system automatically categorizes commits into:

- **Feature**: New functionality development (`add`, `implement`, `create`)
- **Bug Fix**: Error corrections (`fix`, `resolve`, `correct`)
- **Refactor**: Code restructuring (`refactor`, `optimize`, `improve`) 
- **Documentation**: Documentation updates (`docs`, `readme`, `comment`)
- **Maintenance**: Routine upkeep (`update`, `upgrade`, `dependency`)
- **Test**: Testing-related changes (`test`, `spec`, `coverage`)
- **Style**: Formatting changes (`format`, `lint`, `prettier`)
- **Build**: Build system changes (`build`, `ci`, `docker`)
- **Security**: Security-related fixes (`security`, `vulnerability`)
- **Hotfix**: Urgent production fixes (`hotfix`, `critical`, `emergency`)
- **Config**: Configuration changes (`config`, `settings`, `environment`)
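
For intuition, a keyword-only categorizer built from the list above might look like the sketch below. It is a simplified stand-in for the rule-based fallback, not the package's actual rules. The labels `security`, `hotfix`, `config`, and `uncategorized`, and the first-match-wins ordering, are assumptions.

```python
import re

# Keywords copied from the category list above, checked in order.
CATEGORY_KEYWORDS = [
    ("feature", ["add", "implement", "create"]),
    ("bug_fix", ["fix", "resolve", "correct"]),
    ("refactor", ["refactor", "optimize", "improve"]),
    ("documentation", ["docs", "readme", "comment"]),
    ("maintenance", ["update", "upgrade", "dependency"]),
    ("test", ["test", "spec", "coverage"]),
    ("style", ["format", "lint", "prettier"]),
    ("build", ["build", "ci", "docker"]),
    ("security", ["security", "vulnerability"]),
    ("hotfix", ["hotfix", "critical", "emergency"]),
    ("config", ["config", "settings", "environment"]),
]


def rule_category(message: str) -> str:
    """Return the first category whose keyword appears as a whole word."""
    for category, keywords in CATEGORY_KEYWORDS:
        pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
        if re.search(pattern, message, re.IGNORECASE):
            return category
    return "uncategorized"  # hypothetical label for unmatched messages


for msg in [
    "Add user authentication system",
    "Fix memory leak in cache cleanup",
    "Update dependency versions",
]:
    print(rule_category(msg), "<-", msg)
```

Whole-word matching keeps `fix` from matching inside `hotfix`, but it also misses inflected forms such as "Added". That gap is one reason a semantic model is useful alongside rules.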

### Configuration

```yaml
analysis:
  ml_categorization:
    # Enable/disable ML categorization (default: true)
    enabled: true
    
    # Minimum confidence for ML predictions (0.0-1.0, default: 0.6)
    min_confidence: 0.6
    
    # Semantic vs file pattern weighting (default: 0.7 vs 0.3)
    semantic_weight: 0.7
    file_pattern_weight: 0.3
    
    # Confidence threshold for ML vs rule-based (default: 0.5)
    hybrid_threshold: 0.5
    
    # Caching for performance
    enable_caching: true
    cache_duration_days: 30
    
    # Processing settings
    batch_size: 100
```
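
If you generate or template this section, a small sanity check can catch out-of-range values before a run. The sketch below mirrors the keys shown above in a dataclass. The class name `MLCategorizationConfig` is hypothetical, the caching and batch values are simply the ones from the example, and the 0.0-1.0 range is documented only for `min_confidence` (it is assumed for the other three).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MLCategorizationConfig:
    enabled: bool = True
    min_confidence: float = 0.6
    semantic_weight: float = 0.7
    file_pattern_weight: float = 0.3
    hybrid_threshold: float = 0.5
    enable_caching: bool = True
    cache_duration_days: int = 30
    batch_size: int = 100

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means the values look sane."""
        issues = []
        for name in (
            "min_confidence",
            "semantic_weight",
            "file_pattern_weight",
            "hybrid_threshold",
        ):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                issues.append(f"{name} must be between 0.0 and 1.0, got {value}")
        if self.cache_duration_days < 0:
            issues.append("cache_duration_days must not be negative")
        if self.batch_size < 1:
            issues.append("batch_size must be at least 1")
        return issues


print(MLCategorizationConfig().problems())
print(MLCategorizationConfig(min_confidence=1.5).problems())
```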

### Installation Requirements

For ML categorization, install the spaCy English model:

```bash
python -m spacy download en_core_web_sm
```

**Alternative models** (if the default is unavailable):
```bash
# Medium model (more accurate, larger)
python -m spacy download en_core_web_md

# Large model (most accurate, largest)
python -m spacy download en_core_web_lg
```
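
To see which of these models are already installed, you can check whether the model package is importable. spaCy models installed with `spacy download` are regular Python packages, so the standard library is enough here. The helper name `spacy_model_installed` is just for this example.

```python
import importlib.util


def spacy_model_installed(model_name: str = "en_core_web_sm") -> bool:
    """Return True if the named spaCy model package can be imported."""
    return importlib.util.find_spec(model_name) is not None


for model in ("en_core_web_sm", "en_core_web_md", "en_core_web_lg"):
    status = "installed" if spacy_model_installed(model) else "missing"
    print(f"{model}: {status}")
```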

### Performance Expectations

- **Accuracy**: 85-95% on typical commit messages
- **Speed**: ~50-100 commits/second with caching enabled
- **Fallback**: Gracefully disables qualitative analysis if the spaCy model is unavailable (provides helpful error messages)
- **Memory**: ~200MB additional memory usage for spaCy models

### Enhanced Reports

With ML categorization enabled, reports include:

- **Confidence scores** for each categorization
- **Method indicators** (ML, rules, or cached)
- **Alternative predictions** for uncertain cases
- **ML performance statistics** in analysis summaries

### Example Enhanced Output

```csv
commit_hash,category,ml_confidence,ml_method,message
a1b2c3d,feature,0.89,ml,"Add user authentication system"
f6e5d4c,bug_fix,0.92,ml,"Fix memory leak in cache cleanup"
9876543,maintenance,0.74,rules,"Update dependency versions"
```
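
One way to use these columns is to pull out low-confidence rows for manual review. The sketch below parses the sample above with the standard `csv` module. The 0.8 review threshold and the function name `low_confidence_rows` are arbitrary choices for illustration.

```python
import csv
import io

SAMPLE = """commit_hash,category,ml_confidence,ml_method,message
a1b2c3d,feature,0.89,ml,"Add user authentication system"
f6e5d4c,bug_fix,0.92,ml,"Fix memory leak in cache cleanup"
9876543,maintenance,0.74,rules,"Update dependency versions"
"""


def low_confidence_rows(csv_text: str, threshold: float = 0.8) -> list:
    """Return rows whose ml_confidence is below the threshold."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if float(row["ml_confidence"]) < threshold]


for row in low_confidence_rows(SAMPLE):
    print(row["commit_hash"], row["category"], row["ml_method"], row["message"])
```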

## Troubleshooting

### YAML Configuration Errors

GitFlow Analytics provides helpful error messages when YAML configuration issues are encountered. Here are common errors and their solutions:

#### Tab Characters Not Allowed
```
❌ YAML configuration error at line 3, column 1:
🚫 Tab characters are not allowed in YAML files!
```
**Fix**: Replace all tabs with spaces (use 2 or 4 spaces for indentation)
- Most editors can show whitespace characters and convert tabs to spaces
- In VS Code: View → Render Whitespace, then Edit → Convert Indentation to Spaces
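
If you prefer to check from a script, a few lines of Python can locate tab characters before you run the tool. This is an independent helper, not something GitFlow Analytics ships. `find_tabs` is a name made up for this sketch.

```python
from typing import List, Tuple


def find_tabs(text: str) -> List[Tuple[int, int]]:
    """Return (line, column) of the first tab on each line, both 1-based."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        column = line.find("\t")
        if column != -1:
            hits.append((line_no, column + 1))
    return hits


config_text = "version: \"1.0\"\ngithub:\n\ttoken: \"${GITHUB_TOKEN}\"\n"
for line_no, column in find_tabs(config_text):
    print(f"tab at line {line_no}, column {column}")
```

To check a real file, pass in its contents, for example `find_tabs(open("config.yaml", encoding="utf-8").read())`.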

#### Missing Colons
```
❌ YAML configuration error at line 5, column 10:
🚫 Missing colon (:) after a key name!
```
**Fix**: Add a colon and space after each key name
```yaml
# Correct:
repositories:
  - name: my-repo
    
# Incorrect:
repositories
  - name my-repo
```

#### Unclosed Quotes
```
❌ YAML configuration error at line 8, column 15:
🚫 Unclosed quoted string!
```
**Fix**: Ensure all quotes are properly closed
```yaml
# Correct:
token: "my-token-value"

# Incorrect:
token: "my-token-value
```

#### Invalid Indentation
```
❌ YAML configuration error:
🚫 Indentation error or invalid structure!
```
**Fix**: Use consistent indentation (either 2 or 4 spaces)
```yaml
# Correct:
analysis:
  exclude:
    paths:
      - "vendor/**"
      
# Incorrect:
analysis:
  exclude:
     paths:  # 3 spaces - inconsistent!
      - "vendor/**"
```

### Tips for Valid YAML

1. **Use a YAML validator**: Check your configuration with a YAML validator before running an analysis
2. **Enable whitespace display**: Make tabs and spaces visible in your editor
3. **Use quotes for special characters**: Wrap values containing `:`, `#`, `@`, etc. in quotes
4. **Consistent indentation**: Pick 2 or 4 spaces and stick to it throughout the file
5. **Check the sample config**: Reference `config-sample.yaml` for proper structure

### Configuration Validation

Beyond YAML syntax, GitFlow Analytics validates:
- Required fields (each entry in `repositories` must have `name` and `path`)
- Environment variable resolution
- File path existence
- Valid configuration structure
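
To make the first two checks concrete, here is a minimal sketch of what such validation could look like on an already-parsed configuration. It is not the tool's validator: `validate_config`, the error wording, and the choice to inspect only `github.token` for `${VAR}` placeholders are assumptions for illustration.

```python
import re
from typing import List, Mapping

ENV_PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def validate_config(config: dict, env: Mapping[str, str]) -> List[str]:
    """Return a list of problems found in a parsed configuration dict."""
    errors = []
    # Each repository entry needs both a name and a path.
    for i, repo in enumerate(config.get("repositories", [])):
        for field in ("name", "path"):
            if not repo.get(field):
                errors.append(f"repositories[{i}] is missing required field '{field}'")
    # Any ${VAR} placeholder in the token must be resolvable from the environment.
    token = str(config.get("github", {}).get("token", ""))
    for var in ENV_PLACEHOLDER.findall(token):
        if var not in env:
            errors.append(f"environment variable {var} is not set")
    return errors


example = {
    "github": {"token": "${GITHUB_TOKEN}"},
    "repositories": [{"name": "my-repo"}],
}
for problem in validate_config(example, env={}):
    print(problem)
```

In real use you would pass `os.environ` as `env`.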

If you encounter persistent issues, run with `--debug` for detailed error information:
```bash
# Simplified syntax (default)
gitflow-analytics -c config.yaml --debug

# Explicit analyze command
gitflow-analytics analyze -c config.yaml --debug
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the MIT License - see the LICENSE file for details.

            

        primary_email: sarah.smith@company.com\n        aliases:\n          - s.smith@oldcompany.com\n```\n\n### Display Name Control\n\nThe optional `name` field in manual mappings allows you to control how developer names appear in reports. This is particularly useful for:\n\n- **Standardizing display names** across different email formats\n- **Resolving duplicates** when the same person appears with slight name variations\n- **Using preferred names** instead of technical email formats\n\n**Example use cases:**\n```yaml\nanalysis:\n  identity:\n    manual_mappings:\n      # Consolidate John Smith identities\n      - name: \"John Smith\"\n        primary_email: \"john.smith@company.com\"\n        aliases:\n          - \"150280367+jsmith@users.noreply.github.com\"\n          - \"jsmith-company@users.noreply.github.com\"\n      \n      # Standardize name variations\n      - name: \"John Doe\"  # Consistent display across all reports\n        primary_email: \"john.doe@company.com\"\n        aliases:\n          - \"johndoe@company.com\"\n          - \"j.doe@company.com\"\n```\n\nWithout the `name` field, the system uses the canonical email's associated name, which might not be ideal for reporting.\n\n### Disabling Automatic Analysis\n\nTo disable the automatic identity prompt:\n```yaml\nanalysis:\n  identity:\n    auto_analysis: false\n```\n\n## ML-Enhanced Commit Categorization\n\nGitFlow Analytics includes sophisticated machine learning capabilities for categorizing commits with high accuracy and confidence scoring.\n\n### How It Works\n\nThe ML categorization system uses a **hybrid approach** combining:\n\n1. **Semantic Analysis**: Uses spaCy NLP models to understand commit message meaning\n2. **File Pattern Recognition**: Analyzes changed files for additional context signals  \n3. **Rule-based Fallback**: Falls back to traditional regex patterns when ML confidence is low\n4. 
**Confidence Scoring**: Provides confidence metrics for all categorizations\n\n### Categories Detected\n\nThe system automatically categorizes commits into:\n\n- **Feature**: New functionality development (`add`, `implement`, `create`)\n- **Bug Fix**: Error corrections (`fix`, `resolve`, `correct`)\n- **Refactor**: Code restructuring (`refactor`, `optimize`, `improve`) \n- **Documentation**: Documentation updates (`docs`, `readme`, `comment`)\n- **Maintenance**: Routine upkeep (`update`, `upgrade`, `dependency`)\n- **Test**: Testing-related changes (`test`, `spec`, `coverage`)\n- **Style**: Formatting changes (`format`, `lint`, `prettier`)\n- **Build**: Build system changes (`build`, `ci`, `docker`)\n- **Security**: Security-related fixes (`security`, `vulnerability`)\n- **Hotfix**: Urgent production fixes (`hotfix`, `critical`, `emergency`)\n- **Config**: Configuration changes (`config`, `settings`, `environment`)\n\n### Configuration\n\n```yaml\nanalysis:\n  ml_categorization:\n    # Enable/disable ML categorization (default: true)\n    enabled: true\n    \n    # Minimum confidence for ML predictions (0.0-1.0, default: 0.6)\n    min_confidence: 0.6\n    \n    # Semantic vs file pattern weighting (default: 0.7 vs 0.3)\n    semantic_weight: 0.7\n    file_pattern_weight: 0.3\n    \n    # Confidence threshold for ML vs rule-based (default: 0.5)\n    hybrid_threshold: 0.5\n    \n    # Caching for performance\n    enable_caching: true\n    cache_duration_days: 30\n    \n    # Processing settings\n    batch_size: 100\n```\n\n### Installation Requirements\n\nFor ML categorization, install the spaCy English model:\n\n```bash\npython -m spacy download en_core_web_sm\n```\n\n**Alternative models** (if the default is unavailable):\n```bash\n# Medium model (more accurate, larger)\npython -m spacy download en_core_web_md\n\n# Large model (most accurate, largest)\npython -m spacy download en_core_web_lg\n```\n\n### Performance Expectations\n\n- **Accuracy**: 85-95% accuracy on 
typical commit messages\n- **Speed**: ~50-100 commits/second with caching enabled\n- **Fallback**: Gracefully disables qualitative analysis if spaCy model unavailable (provides helpful error messages)\n- **Memory**: ~200MB additional memory usage for spaCy models\n\n### Enhanced Reports\n\nWith ML categorization enabled, reports include:\n\n- **Confidence scores** for each categorization\n- **Method indicators** (ML, rules, or cached)\n- **Alternative predictions** for uncertain cases\n- **ML performance statistics** in analysis summaries\n\n### Example Enhanced Output\n\n```csv\ncommit_hash,category,ml_confidence,ml_method,message\na1b2c3d,feature,0.89,ml,\"Add user authentication system\"  \nf6e5d4c,bug_fix,0.92,ml,\"Fix memory leak in cache cleanup\"\n9876543,maintenance,0.74,rules,\"Update dependency versions\"\n```\n\n## Troubleshooting\n\n### YAML Configuration Errors\n\nGitFlow Analytics provides helpful error messages when YAML configuration issues are encountered. Here are common errors and their solutions:\n\n#### Tab Characters Not Allowed\n```\n\u274c YAML configuration error at line 3, column 1:\n\ud83d\udeab Tab characters are not allowed in YAML files!\n```\n**Fix**: Replace all tabs with spaces (use 2 or 4 spaces for indentation)\n- Most editors can show whitespace characters and convert tabs to spaces\n- In VS Code: View \u2192 Render Whitespace, then Edit \u2192 Convert Indentation to Spaces\n\n#### Missing Colons\n```\n\u274c YAML configuration error at line 5, column 10:\n\ud83d\udeab Missing colon (:) after a key name!\n```\n**Fix**: Add a colon and space after each key name\n```yaml\n# Correct:\nrepositories:\n  - name: my-repo\n    \n# Incorrect:\nrepositories\n  - name my-repo\n```\n\n#### Unclosed Quotes\n```\n\u274c YAML configuration error at line 8, column 15:\n\ud83d\udeab Unclosed quoted string!\n```\n**Fix**: Ensure all quotes are properly closed\n```yaml\n# Correct:\ntoken: \"my-token-value\"\n\n# Incorrect:\ntoken: 
\"my-token-value\n```\n\n#### Invalid Indentation\n```\n\u274c YAML configuration error:\n\ud83d\udeab Indentation error or invalid structure!\n```\n**Fix**: Use consistent indentation (either 2 or 4 spaces)\n```yaml\n# Correct:\nanalysis:\n  exclude:\n    paths:\n      - \"vendor/**\"\n      \n# Incorrect:\nanalysis:\n  exclude:\n     paths:  # 3 spaces - inconsistent!\n      - \"vendor/**\"\n```\n\n### Tips for Valid YAML\n\n1. **Use a YAML validator**: Check your configuration with online YAML validators before using\n2. **Enable whitespace display**: Make tabs and spaces visible in your editor\n3. **Use quotes for special characters**: Wrap values containing `:`, `#`, `@`, etc. in quotes\n4. **Consistent indentation**: Pick 2 or 4 spaces and stick to it throughout the file\n5. **Check the sample config**: Reference `config-sample.yaml` for proper structure\n\n### Configuration Validation\n\nBeyond YAML syntax, GitFlow Analytics validates:\n- Required fields (`repositories` must have `name` and `path`)\n- Environment variable resolution\n- File path existence\n- Valid configuration structure\n\nIf you encounter persistent issues, run with `--debug` for detailed error information:\n```bash\n# Simplified syntax (default)\ngitflow-analytics -c config.yaml --debug\n\n# Explicit analyze command\ngitflow-analytics analyze -c config.yaml --debug\n```\n\n## Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request.\n\n## License\n\nThis project is licensed under the MIT License - see the LICENSE file for details.\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Analyze Git repositories for developer productivity insights",
    "version": "3.12.6",
    "project_urls": {
        "Documentation": "https://github.com/bobmatnyc/gitflow-analytics/blob/main/README.md",
        "Homepage": "https://github.com/bobmatnyc/gitflow-analytics",
        "Issues": "https://github.com/bobmatnyc/gitflow-analytics/issues",
        "Repository": "https://github.com/bobmatnyc/gitflow-analytics"
    },
    "split_keywords": [
        "git",
        " analytics",
        " productivity",
        " metrics",
        " development"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "404eb5c2ec732a1b3477cc8b38c59c72b10464c13bf752a87be982a1eeea76fd",
                "md5": "b4accbab1384e82256b06d8b5917bfcf",
                "sha256": "7c35f27cd7e057affb4f070b768408990d82a1b2649a5729c7517cb8f2b52b2a"
            },
            "downloads": -1,
            "filename": "gitflow_analytics-3.12.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b4accbab1384e82256b06d8b5917bfcf",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 625857,
            "upload_time": "2025-11-06T23:37:17",
            "upload_time_iso_8601": "2025-11-06T23:37:17.593813Z",
            "url": "https://files.pythonhosted.org/packages/40/4e/b5c2ec732a1b3477cc8b38c59c72b10464c13bf752a87be982a1eeea76fd/gitflow_analytics-3.12.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "9f9024d6d8c31ae61439b9e258445f7c548f7b7c9591b04e919e086d2e3765f6",
                "md5": "12af64c4e47595ab0bfa425216e943e2",
                "sha256": "d2d1958907912ed1564bbe2072d03857600faa5e41d7d8ff53e3670e19ef4342"
            },
            "downloads": -1,
            "filename": "gitflow_analytics-3.12.6.tar.gz",
            "has_sig": false,
            "md5_digest": "12af64c4e47595ab0bfa425216e943e2",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 825631,
            "upload_time": "2025-11-06T23:37:19",
            "upload_time_iso_8601": "2025-11-06T23:37:19.571437Z",
            "url": "https://files.pythonhosted.org/packages/9f/90/24d6d8c31ae61439b9e258445f7c548f7b7c9591b04e919e086d2e3765f6/gitflow_analytics-3.12.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-11-06 23:37:19",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "bobmatnyc",
    "github_project": "gitflow-analytics",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "gitflow-analytics"
}
        