# GitFlow Analytics
A comprehensive Python package for analyzing Git repositories to generate developer productivity insights without requiring external project management tools. Extract actionable metrics directly from Git history with ML-enhanced commit categorization, automated developer identity resolution, and professional reporting.
## 🚀 Key Features
- **🔍 Zero Dependencies**: Analyze productivity without requiring JIRA, Linear, or other PM tools
- **🧠 ML-Powered Intelligence**: Advanced commit categorization with 85-95% accuracy
- **👥 Smart Identity Resolution**: Automatically consolidate developer identities across email addresses
- **🏢 Enterprise Ready**: Organization-wide repository discovery with intelligent caching
- **📊 Professional Reports**: Rich markdown narratives and CSV exports for executive dashboards
## 🎯 Quick Start
Get up and running in 5 minutes:
```bash
# 1. Install GitFlow Analytics
pip install gitflow-analytics
# 2. Install ML dependencies (optional but recommended)
python -m spacy download en_core_web_sm
# 3. Create a simple configuration
echo 'version: "1.0"
github:
  token: "${GITHUB_TOKEN}"
  organization: "your-org"' > config.yaml
# 4. Set your GitHub token
echo 'GITHUB_TOKEN=ghp_your_token_here' > .env
# 5. Run analysis
gitflow-analytics -c config.yaml --weeks 8
```
**What you get:**
- 📈 Weekly metrics CSV with developer productivity trends
- 👥 Developer profiles with project distribution and work styles
- 🔍 Untracked work analysis with ML-powered categorization
- 📋 Executive summary with actionable insights
- 📊 Rich markdown report ready for stakeholders
### Sample Output Preview
```markdown
## Executive Summary
- **Total Commits**: 156 across 3 projects
- **Active Developers**: 5 team members
- **Ticket Coverage**: 73.2% (industry benchmark: 60-80%)
- **Top Contributor**: Sarah Chen (32 commits, FRONTEND focus)
## Key Insights
🎯 **High Productivity**: Team averaged 31 commits/week
📊 **Balanced Workload**: No single developer >40% of total work
✅ **Good Process**: 73% ticket coverage shows strong tracking
```
## ✨ Latest Features (v1.2.x)
- **🚀 Two-Step Processing**: Optimized fetch-then-classify workflow for better performance
- **💰 Cost Tracking**: Monitor LLM API usage with detailed token and cost reporting
- **⚡ Smart Caching**: Intelligent caching reduces analysis time by up to 90%
- **🔄 Automatic Updates**: Repositories automatically fetch latest commits before analysis
- **📊 Weekly Trends**: Track classification pattern changes over time
- **🎯 Enhanced Categorization**: All commits properly categorized with confidence scores
## 🔥 Core Capabilities
**📊 Analysis & Insights**
- Multi-repository analysis with intelligent project grouping
- ML-enhanced commit categorization (85-95% accuracy)
- Developer productivity metrics and work pattern analysis
- Story point extraction from commits and PRs
- Ticket tracking across JIRA, GitHub, ClickUp, and Linear
**🏢 Enterprise Features**
- Organization-wide repository discovery from GitHub
- Automated developer identity resolution and consolidation
- Database-backed caching for sub-second report generation
- Data anonymization for secure external sharing
- Batch processing optimized for large repositories
**📈 Professional Reporting**
- Rich markdown narratives with executive summaries
- Weekly CSV exports with trend analysis
- Customizable output formats and filtering
- Performance benchmarking and team comparisons
## 📚 Documentation
Comprehensive guides for every use case:
| **Getting Started** | **Advanced Usage** | **Integration** |
|-------------------|------------------|---------------|
| [Installation](docs/getting-started/installation.md) | [Complete Configuration](docs/guides/configuration.md) | [CLI Reference](docs/reference/cli-commands.md) |
| [5-Minute Tutorial](docs/getting-started/quickstart.md) | [ML Categorization](docs/guides/ml-categorization.md) | [JSON Export Schema](docs/reference/json-export-schema.md) |
| [First Analysis](docs/getting-started/first-analysis.md) | [Enterprise Setup](docs/examples/enterprise-setup.md) | [CI Integration](docs/examples/ci-integration.md) |
**🎯 Quick Links:**
- 📖 [**Documentation Hub**](docs/README.md) - Complete guide index
- 🚀 [**Quick Start**](docs/getting-started/quickstart.md) - Get running in 5 minutes
- ⚙️ [**Configuration**](docs/guides/configuration.md) - Full reference
- 🤝 [**Contributing**](docs/developer/contributing.md) - Join the project
## ⚡ Installation Options
### Standard Installation
```bash
pip install gitflow-analytics
```
### With ML Enhancement (Recommended)
```bash
pip install gitflow-analytics
python -m spacy download en_core_web_sm
```
### Development Installation
```bash
git clone https://github.com/bobmatnyc/gitflow-analytics.git
cd gitflow-analytics
pip install -e ".[dev]"
python -m spacy download en_core_web_sm
```
## 🔧 Configuration
### Option 1: Organization Analysis (Recommended)
```yaml
# config.yaml
version: "1.0"
github:
  token: "${GITHUB_TOKEN}"
  organization: "your-org" # Auto-discovers all repositories
analysis:
  ml_categorization:
    enabled: true
    min_confidence: 0.7
```
### Option 2: Specific Repositories
```yaml
# config.yaml
version: "1.0"
github:
  token: "${GITHUB_TOKEN}"
repositories:
  - name: "my-app"
    path: "~/code/my-app"
    github_repo: "myorg/my-app"
    project_key: "APP"
```
### Environment Setup
```bash
# .env (same directory as config.yaml)
GITHUB_TOKEN=ghp_your_token_here
```
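The `${GITHUB_TOKEN}` placeholder in `config.yaml` is filled from the environment. As a rough illustration of that substitution (not the package's own loader), the sketch below expands `${NAME}` placeholders from a dictionary of variables. The function name `expand_placeholders` and the choice to raise on a missing variable are assumptions made for this example.

```python
import re
from typing import Mapping

# Matches ${NAME} where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_placeholders(text: str, env: Mapping[str, str]) -> str:
    """Replace each ${NAME} in text with env[NAME] (hypothetical helper)."""

    def _substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in env:
            # Assumption: fail loudly instead of leaving the placeholder in place.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(_substitute, text)


config_line = 'token: "${GITHUB_TOKEN}"'
expanded = expand_placeholders(config_line, {"GITHUB_TOKEN": "ghp_your_token_here"})
print(expanded)
```

In practice you would pass `os.environ` after loading the `.env` file; the explicit dictionary keeps the example self-contained.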
### Run Analysis
```bash
# Analyze last 8 weeks
gitflow-analytics -c config.yaml --weeks 8
# With custom output directory
gitflow-analytics -c config.yaml --weeks 8 --output ./reports
```
> 💡 **Need more configuration options?** See the [Complete Configuration Guide](docs/guides/configuration.md) for advanced features, integrations, and customization.
## 🎯 Excluding Merge Commits from Metrics
GitFlow Analytics can exclude merge commits from filtered line count calculations, following DORA metrics best practices.
### Why Exclude Merge Commits?
Merge commits represent repository management, not original development work:
- **Average merge commit**: 236.6 filtered lines vs 30.8 for regular commits (7.7x higher)
- Merge commits can **skew productivity metrics** and velocity calculations
- **DORA metrics best practice**: Focus on original development work, not repository management
### Configuration
Add this setting to your analysis configuration:
```yaml
analysis:
  # Exclude merge commits from filtered line counts (DORA metrics best practice)
  exclude_merge_commits: true # Default: false
```
### Impact Example
Real metrics from EWTN dataset analysis:
| Metric | With Merge Commits | Without Merge Commits | Change |
|--------|-------------------|----------------------|--------|
| **Total Filtered Lines** | 138,730 | 54,808 | -60% |
| **Merge Commits** | 355 commits | 355 commits | (excluded from line counts) |
| **Regular Commits** | 1,426 commits | 1,426 commits | (unchanged) |
### What Gets Excluded?
When `exclude_merge_commits: true`:
✅ **Filtered Stats**: Merge commits (2+ parents) have `filtered_insertions = 0` and `filtered_deletions = 0`
✅ **Raw Stats**: Always preserved for all commits (accurate commit counts)
✅ **Reports**: Line count metrics reflect only original development work
❌ **Not affected**: Commit counts, developer activity tracking, ticket references
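
The rule above can be sketched in Python as follows. This is an illustration only, not the package's implementation: the `Commit` dataclass and `apply_merge_exclusion` are hypothetical names, and only `filtered_insertions`, `filtered_deletions`, and `exclude_merge_commits` come from the description above.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Commit:
    """Hypothetical commit record used only for this example."""
    sha: str
    parents: int              # number of parent commits
    insertions: int           # raw stats, always preserved
    deletions: int
    filtered_insertions: int  # stats used for line-count metrics
    filtered_deletions: int


def apply_merge_exclusion(commits: List[Commit], exclude_merge_commits: bool) -> List[Commit]:
    """Zero the filtered line counts of merge commits (2+ parents) when enabled."""
    if not exclude_merge_commits:
        return list(commits)
    return [
        replace(c, filtered_insertions=0, filtered_deletions=0) if c.parents >= 2 else c
        for c in commits
    ]


commits = [
    Commit("a1b2c3d", 1, 30, 5, 30, 5),
    Commit("f6e5d4c", 2, 200, 40, 200, 40),  # merge commit
]
result = apply_merge_exclusion(commits, exclude_merge_commits=True)
filtered_total = sum(c.filtered_insertions + c.filtered_deletions for c in result)
print(len(result), filtered_total)  # commit count is unchanged; merge lines are excluded
```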
### When to Use
**✅ Enable when:**
- You want DORA-compliant metrics for productivity tracking
- Your workflow uses merge commits for pull requests
- You need accurate developer velocity without repository overhead
- You're comparing metrics across teams with different merge strategies
**❌ Disable when:**
- You want to track all repository activity including management overhead
- Merge commits represent significant manual conflict resolution in your workflow
- You're analyzing repositories without merge-heavy workflows
- You need to measure total repository churn including merges
### Example Configuration
```yaml
# Full configuration example
analysis:
  weeks_back: 8
  include_weekends: true
  # DORA-compliant metrics: exclude merge commits
  exclude_merge_commits: true
  # Analyze ALL branches to capture feature branch work
  branch_patterns:
    - "*" # Include all branches (feature, develop, hotfix, etc.)
```
> 💡 **Pro Tip**: Combine `exclude_merge_commits: true` with `branch_patterns: ["*"]` to analyze all development work without merge overhead.
## 📊 Generated Reports
GitFlow Analytics generates comprehensive reports for different audiences:
### 📈 CSV Data Files
- **weekly_metrics.csv** - Developer productivity trends by week
- **weekly_velocity.csv** - Lines-per-story-point velocity analysis
- **developers.csv** - Complete team profiles and statistics
- **summary.csv** - Project-wide statistics and benchmarks
- **untracked_commits.csv** - ML-categorized analysis of untracked work (commits without ticket references)
### 📋 Executive Reports
- **narrative_summary.md** - Rich markdown report with:
  - Executive summary with key metrics
  - Team composition and work distribution
  - Project activity breakdown
  - Development patterns and recommendations
  - Weekly trend analysis
### Sample Executive Summary
```markdown
## Executive Summary
- **Total Commits**: 324 commits across 4 projects
- **Active Developers**: 8 team members
- **Ticket Coverage**: 78.4% (above industry benchmark)
- **Top Areas**: Frontend (45%), API (32%), Infrastructure (23%)
## Key Insights
✅ **Strong Process Adherence**: 78% ticket coverage
🎯 **Balanced Team**: No developer >35% of total work
📈 **Growth Trend**: +15% productivity vs last quarter
```
## 🛠️ Common Use Cases
**👥 Team Lead Dashboard**
- Track individual developer productivity and growth
- Identify workload distribution and potential burnout
- Monitor code quality trends and technical debt
**📈 Engineering Management**
- Generate executive reports on team velocity
- Analyze process adherence and ticket coverage
- Benchmark performance across projects and quarters
**🔍 Process Optimization**
- Identify untracked work patterns that should be formalized
- Optimize developer focus and reduce context switching
- Improve estimation accuracy with historical data
**🏢 Enterprise Analytics**
- Organization-wide repository analysis across dozens of projects
- Automated identity resolution for large, distributed teams
- Cost-effective analysis without expensive PM tool dependencies
## Command Line Interface
### Main Commands
```bash
# Analyze repositories (default command)
gitflow-analytics -c config.yaml --weeks 12 --output ./reports
# Explicit analyze command (backward compatibility)
gitflow-analytics analyze -c config.yaml --weeks 12 --output ./reports
# Show cache statistics
gitflow-analytics cache-stats -c config.yaml
# List known developers
gitflow-analytics list-developers -c config.yaml
# Analyze developer identities
gitflow-analytics identities -c config.yaml
# Merge developer identities
gitflow-analytics merge-identity -c config.yaml dev1_id dev2_id
# Discover story point fields in your PM platform
gitflow-analytics discover-storypoint-fields -c config.yaml
```
### Options
- `--weeks, -w`: Number of weeks to analyze (default: 12)
- `--output, -o`: Output directory for reports (default: ./reports)
- `--anonymize`: Anonymize developer information
- `--no-cache`: Disable caching for fresh analysis
- `--clear-cache`: Clear cache before analysis
- `--validate-only`: Validate configuration without running
- `--skip-identity-analysis`: Skip automatic identity analysis
- `--apply-identity-suggestions`: Apply identity suggestions without prompting
## Complete Configuration Example
Here's a complete example showing a `.env` file and the corresponding YAML configuration:
### `.env` file
```bash
# GitHub Configuration
GITHUB_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxx
GITHUB_ORG=your-organization
# PM Platform Configuration
JIRA_ACCESS_USER=developer@company.com
JIRA_ACCESS_TOKEN=ATATT3xxxxxxxxxxx
LINEAR_API_KEY=lin_api_xxxxxxxxxxxx
CLICKUP_API_TOKEN=pk_xxxxxxxxxxxx
# Note: GitHub Issues uses GITHUB_TOKEN automatically
```
### `config.yaml` file
```yaml
version: "1.0"
# GitHub configuration with organization discovery
github:
  token: "${GITHUB_TOKEN}"
  organization: "${GITHUB_ORG}"
# Multi-platform PM integration
pm:
  jira:
    access_user: "${JIRA_ACCESS_USER}"
    access_token: "${JIRA_ACCESS_TOKEN}"
    base_url: "https://company.atlassian.net"
  linear:
    api_key: "${LINEAR_API_KEY}"
    team_ids: ["team_123abc"] # Optional: filter by specific teams
  clickup:
    api_token: "${CLICKUP_API_TOKEN}"
    workspace_url: "https://app.clickup.com/12345/v/"
# JIRA story point integration (optional)
jira_integration:
  enabled: true
  fetch_story_points: true
  story_point_fields:
    - "Story point estimate" # Your field name
    - "customfield_10016" # Fallback field ID
# Analysis configuration
analysis:
  # Track tickets from all configured platforms
  ticket_platforms:
    - jira
    - linear
    - clickup
    - github # GitHub Issues (uses GITHUB_TOKEN)
  # Exclude bot commits and boilerplate files
  exclude:
    authors:
      - "dependabot[bot]"
      - "renovate[bot]"
    paths:
      - "**/node_modules/**"
      - "**/*.min.js"
      - "**/package-lock.json"
  # Developer identity consolidation
  identity:
    similarity_threshold: 0.85
    manual_mappings:
      - name: "John Doe"
        primary_email: "john.doe@company.com"
        aliases:
          - "jdoe@oldcompany.com"
          - "john@personal.com"
# Output configuration
output:
  directory: "./reports"
  formats:
    - csv
    - markdown
```
## Output Reports
The tool generates comprehensive CSV reports and markdown summaries:
### CSV Reports
1. **Weekly Metrics** (`weekly_metrics_YYYYMMDD.csv`)
   - Week-by-week developer productivity
   - Story points, commits, lines changed
   - Ticket coverage percentages
   - Per-project breakdown
2. **Weekly Velocity** (`weekly_velocity_YYYYMMDD.csv`)
   - Lines of code per story point analysis
   - Efficiency trends and velocity patterns
   - PR-based vs commit-based story points breakdown
   - Team velocity benchmarking and week-over-week trends
3. **Summary Statistics** (`summary_YYYYMMDD.csv`)
   - Overall project statistics
   - Platform-specific ticket counts
   - Top contributors
4. **Developer Report** (`developers_YYYYMMDD.csv`)
   - Complete developer profiles
   - Total contributions
   - Identity aliases
5. **Untracked Commits Report** (`untracked_commits_YYYYMMDD.csv`)
   - Detailed analysis of commits without ticket references
   - Commit categorization (bug_fix, feature, refactor, documentation, maintenance, test, style, build)
   - Enhanced metadata: commit hash, author, timestamp, project, message, file/line changes
   - Configurable file change threshold for filtering significant commits
### Enhanced Untracked Commit Analysis
The untracked commits report provides deep insights into work that bypasses ticket tracking:
**CSV Columns:**
- `commit_hash` / `short_hash`: Full and abbreviated commit identifiers
- `author` / `author_email` / `canonical_id`: Developer identification (with anonymization support)
- `date`: Commit timestamp
- `project`: Project key for multi-repository analysis
- `message`: Commit message (truncated for readability)
- `category`: Automated categorization of work type
- `files_changed` / `lines_added` / `lines_removed` / `lines_changed`: Change metrics
- `is_merge`: Boolean flag for merge commits
**Automatic Categorization:**
- **Feature**: New functionality development (`add`, `new`, `implement`, `create`)
- **Bug Fix**: Error corrections (`fix`, `bug`, `error`, `resolve`, `hotfix`)
- **Refactor**: Code restructuring (`refactor`, `optimize`, `improve`, `cleanup`)
- **Documentation**: Documentation updates (`doc`, `readme`, `comment`, `guide`)
- **Maintenance**: Routine upkeep (`update`, `upgrade`, `dependency`, `config`)
- **Test**: Testing-related changes (`test`, `spec`, `mock`, `fixture`)
- **Style**: Formatting changes (`format`, `lint`, `prettier`, `whitespace`)
- **Build**: Build system changes (`build`, `compile`, `ci`, `docker`)
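
A minimal keyword-based categorizer built from the list above might look like the sketch below. It is a simplified stand-in, not the tool's actual logic: the exact-word matching, the priority order (the order of the list above), and the `"uncategorized"` fallback label are all assumptions made for illustration.

```python
import re
from typing import Dict, Tuple

# Keywords copied from the list above; the dict order doubles as match priority (an assumption).
CATEGORY_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "feature": ("add", "new", "implement", "create"),
    "bug_fix": ("fix", "bug", "error", "resolve", "hotfix"),
    "refactor": ("refactor", "optimize", "improve", "cleanup"),
    "documentation": ("doc", "readme", "comment", "guide"),
    "maintenance": ("update", "upgrade", "dependency", "config"),
    "test": ("test", "spec", "mock", "fixture"),
    "style": ("format", "lint", "prettier", "whitespace"),
    "build": ("build", "compile", "ci", "docker"),
}


def categorize(message: str, default: str = "uncategorized") -> str:
    """Return the first category whose keyword appears as a whole word in the message."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    for category, keywords in CATEGORY_KEYWORDS.items():
        if words.intersection(keywords):
            return category
    return default


print(categorize("Fix typo in error message"))
print(categorize("Update dependency versions for security patches"))
```

Whole-word matching is deliberately strict: it avoids false hits such as `doc` matching `docker`, at the cost of missing inflected forms like `comments`.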
### Markdown Reports
6. **Narrative Summary** (`narrative_summary_YYYYMMDD.md`)
   - **Executive Summary**: High-level metrics and team overview
   - **Team Composition**: Developer profiles with project percentages and work patterns
   - **Project Activity**: Detailed breakdown by project with contributor percentages and **commit classifications**
   - **Development Patterns**: Key insights from productivity and collaboration analysis
   - **Pull Request Analysis**: PR metrics including size, lifetime, and review activity
   - **Issue Tracking**: Platform usage and coverage analysis with simplified display
   - **Enhanced Untracked Work Analysis**: Comprehensive categorization with dual percentage metrics
   - **PM Platform Integration**: Story point tracking and correlation insights (when available)
   - **Recommendations**: Actionable insights based on analysis patterns
   - **Weekly Trends** (v1.1.0+): Week-over-week changes in classification patterns
7. **Database-Backed Qualitative Report** (`database_qualitative_report_YYYYMMDD.md`) (v1.1.0+)
   - Generated directly from SQLite storage for fast retrieval
   - Includes weekly trend analysis per developer/project
   - Shows classification changes over time (e.g., "Features: +15%, Bug Fixes: -5%")
### Enhanced Narrative Report Sections
The narrative report provides comprehensive insights through multiple detailed sections:
#### Team Composition Section
- **Developer Profiles**: Individual developer statistics with commit counts
- **Project Distribution**: Shows ALL projects each developer works on with precise percentages
- **Work Style Classification**: Categorizes developers as "Focused", "Multi-project", or "Highly Focused"
- **Activity Patterns**: Identifies time patterns like "Standard Hours" or "Extended Hours"
**Example developer profile:**
```markdown
**John Developer**
- Commits: 15
- Projects: FRONTEND (85.0%), SERVICE_TS (15.0%)
- Work Style: Focused
- Active Pattern: Standard Hours
```
#### Project Activity Section
- **Activity by Project**: Commits and percentage of total activity per project
- **Contributor Breakdown**: Shows each developer's contribution percentage within each project
- **Lines Changed**: Quantifies the scale of changes per project
#### Issue Tracking with Simplified Display
- **Platform Usage**: Clean display of ticket platform distribution (JIRA, GitHub, etc.)
- **Coverage Analysis**: Percentage of commits that reference tickets
- **Enhanced Untracked Work Analysis**: Detailed categorization and recommendations
### Interpreting Dual Percentage Metrics
The enhanced untracked work analysis provides two key percentage metrics for better context:
1. **Percentage of Total Untracked Work**: Shows how much each developer contributes to the overall untracked work pool
2. **Percentage of Developer's Individual Work**: Shows what proportion of a specific developer's commits are untracked
**Example interpretation:**
```
- John Doe: 25 commits (40% of untracked, 15% of their work) - maintenance, style
```
This means:
- John contributed 25 untracked commits
- These represent 40% of all untracked commits in the analysis period
- Only 15% of John's total work was untracked (85% was properly tracked)
- Most untracked work was maintenance and style changes (acceptable categories)
**Process Insights:**
- High "% of untracked" + low "% of their work" = Developer doing most of the acceptable maintenance work
- Low "% of untracked" + high "% of their work" = Developer needs process guidance
- High percentages in feature/bug_fix categories = Process improvement opportunity
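
The two percentages are simple ratios over different denominators. The sketch below shows the arithmetic with made-up numbers; `dual_percentages` and the sample counts are hypothetical and not part of the package.

```python
from typing import Dict, Tuple


def dual_percentages(
    untracked: Dict[str, int], total_commits: Dict[str, int]
) -> Dict[str, Tuple[float, float]]:
    """Return (% of all untracked commits, % of the developer's own commits) per developer."""
    total_untracked = sum(untracked.values())
    result: Dict[str, Tuple[float, float]] = {}
    for developer, count in untracked.items():
        share_of_untracked = 100.0 * count / total_untracked if total_untracked else 0.0
        own_total = total_commits.get(developer, 0)
        share_of_own_work = 100.0 * count / own_total if own_total else 0.0
        result[developer] = (round(share_of_untracked, 1), round(share_of_own_work, 1))
    return result


# Made-up counts: untracked commits per developer, then total commits per developer.
metrics = dual_percentages({"John": 4, "Jane": 6}, {"John": 40, "Jane": 10})
for developer, (of_untracked, of_own) in metrics.items():
    print(f"{developer}: {of_untracked}% of untracked, {of_own}% of their work")
```

Here John owns 40% of the untracked pool but only 10% of his own work is untracked, while 60% of Jane's commits are untracked, which is the pattern that suggests process guidance.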
### Example Report Outputs
#### Untracked Commits CSV Sample
```csv
commit_hash,short_hash,author,author_email,canonical_id,date,project,message,category,files_changed,lines_added,lines_removed,lines_changed,is_merge
a1b2c3d4e5f6...,a1b2c3d,John Doe,john@company.com,ID0001,2024-01-15 14:30:22,FRONTEND,Update dependency versions for security patches,maintenance,2,45,12,57,false
f6e5d4c3b2a1...,f6e5d4c,Jane Smith,jane@company.com,ID0002,2024-01-15 09:15:10,BACKEND,Fix typo in error message,bug_fix,1,1,1,2,false
9876543210ab...,9876543,Bob Wilson,bob@company.com,ID0003,2024-01-14 16:45:33,FRONTEND,Add JSDoc comments to utility functions,documentation,3,28,0,28,false
```
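Because the report is plain CSV, it can be summarized with the Python standard library alone. The sketch below parses the sample rows shown above and tallies commits per category; it assumes the column names from the header row and is not part of the package.

```python
import csv
import io
from collections import Counter

# Sample rows copied from the CSV excerpt above.
SAMPLE = """commit_hash,short_hash,author,author_email,canonical_id,date,project,message,category,files_changed,lines_added,lines_removed,lines_changed,is_merge
a1b2c3d4e5f6...,a1b2c3d,John Doe,john@company.com,ID0001,2024-01-15 14:30:22,FRONTEND,Update dependency versions for security patches,maintenance,2,45,12,57,false
f6e5d4c3b2a1...,f6e5d4c,Jane Smith,jane@company.com,ID0002,2024-01-15 09:15:10,BACKEND,Fix typo in error message,bug_fix,1,1,1,2,false
9876543210ab...,9876543,Bob Wilson,bob@company.com,ID0003,2024-01-14 16:45:33,FRONTEND,Add JSDoc comments to utility functions,documentation,3,28,0,28,false
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
by_category = Counter(row["category"] for row in rows)
total_lines_changed = sum(int(row["lines_changed"]) for row in rows)
non_merge_rows = [row for row in rows if row["is_merge"].lower() != "true"]

print(dict(by_category))
print(total_lines_changed, len(non_merge_rows))
```

To run this against a real report, replace `io.StringIO(SAMPLE)` with an open file handle for the generated `untracked_commits_YYYYMMDD.csv`.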
#### Complete Narrative Report Sample
```markdown
# GitFlow Analytics Report
**Generated**: 2025-08-04 14:27:47
**Analysis Period**: Last 4 weeks
## Executive Summary
- **Total Commits**: 35
- **Active Developers**: 3
- **Lines Changed**: 910
- **Ticket Coverage**: 71.4%
- **Active Projects**: FRONTEND, SERVICE_TS, SERVICES
- **Top Contributor**: John Developer with 15 commits
## Team Composition
### Developer Profiles
**John Developer**
- Commits: 15
- Projects: FRONTEND (85.0%), SERVICE_TS (15.0%)
- Work Style: Focused
- Active Pattern: Standard Hours
**Jane Smith**
- Commits: 12
- Projects: SERVICE_TS (70.0%), FRONTEND (30.0%)
- Work Style: Multi-project
- Active Pattern: Extended Hours
## Project Activity
### Activity by Project
**FRONTEND**
- Commits: 14 (50.0% of total)
- Lines Changed: 450
- Contributors: John Developer (71.4%), Jane Smith (28.6%)
**SERVICE_TS**
- Commits: 8 (28.6% of total)
- Lines Changed: 280
- Contributors: Jane Smith (100.0%)
## Issue Tracking
### Platform Usage
- **Jira**: 15 tickets (60.0%)
- **Github**: 8 tickets (32.0%)
- **Clickup**: 2 tickets (8.0%)
### Untracked Work Analysis
**Summary**: 10 commits (28.6% of total) lack ticket references.
#### Work Categories
- **Maintenance**: 4 commits (40.0%), avg 23 lines *(acceptable untracked)*
- **Bug Fix**: 3 commits (30.0%), avg 15 lines *(should be tracked)*
- **Documentation**: 2 commits (20.0%), avg 12 lines *(acceptable untracked)*
#### Top Contributors (Untracked Work)
- **John Developer**: 1 commits (50.0% of untracked, 6.7% of their work) - *refactor*
- **Jane Smith**: 1 commits (50.0% of untracked, 8.3% of their work) - *style*
#### Recommendations for Untracked Work
🎯 **Excellent tracking**: Less than 20% of commits are untracked - the team shows strong process adherence.
## Recommendations
✅ The team shows healthy development patterns. Continue current practices while monitoring for changes.
```
### Configuration for Enhanced Narrative Reports
The narrative reports automatically include all available sections based on your configuration and data availability:
**Always Generated:**
- Executive Summary, Team Composition, Project Activity, Development Patterns, Issue Tracking, Recommendations
**Conditionally Generated:**
- **Pull Request Analysis**: Requires GitHub integration with PR data
- **PM Platform Integration**: Requires JIRA or other PM platform configuration
- **Qualitative Analysis**: Requires ChatGPT integration setup
**Customizing Report Content:**
```yaml
# config.yaml
output:
  formats:
    - csv
    - markdown # Enables narrative report generation
# Optional: Enhance narrative reports with additional data
jira:
  access_user: "${JIRA_ACCESS_USER}"
  access_token: "${JIRA_ACCESS_TOKEN}"
  base_url: "https://company.atlassian.net"
# Optional: Add qualitative insights
analysis:
  chatgpt:
    enabled: true
    api_key: "${OPENAI_API_KEY}"
```
## Story Point Patterns
Configure custom regex patterns to match your team's story point format:
```yaml
story_point_patterns:
- "SP: (\\d+)" # SP: 5
- "\\[([0-9]+) pts\\]" # [3 pts]
- "estimate: (\\d+)" # estimate: 8
```
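To check a pattern before adding it to the configuration, you can try it against sample commit messages. The sketch below applies the three patterns above with Python's `re` module. The first capture group is taken as the point value, and both `extract_story_points` and the case-insensitive matching are assumptions made for this example, not the package's documented behavior.

```python
import re
from typing import Optional, Sequence

# The same patterns as in the YAML above, written as Python raw strings.
STORY_POINT_PATTERNS = [
    r"SP: (\d+)",
    r"\[([0-9]+) pts\]",
    r"estimate: (\d+)",
]


def extract_story_points(
    text: str, patterns: Sequence[str] = STORY_POINT_PATTERNS
) -> Optional[int]:
    """Return the value captured by the first matching pattern, or None."""
    for pattern in patterns:
        match = re.search(pattern, text, flags=re.IGNORECASE)  # case-insensitivity is an assumption
        if match:
            return int(match.group(1))
    return None


print(extract_story_points("Add login form SP: 5"))
print(extract_story_points("Fix cache eviction [3 pts]"))
print(extract_story_points("Refactor parser"))
```

Note that YAML double-quoted strings need `\\d` where a Python raw string uses `\d`.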
## Ticket Platform Support
Automatically detects and tracks tickets from multiple PM platforms:
- **JIRA**: `PROJ-123`
- **GitHub Issues**: `#123`, `GH-123`
- **ClickUp**: `CU-abc123`
- **Linear**: `ENG-123`
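
The reference formats above can be recognized with regular expressions, as in the sketch below. The patterns are assumptions written for this illustration, not the tool's own. Because `PROJ-123` (JIRA) and `ENG-123` (Linear) share the same shape, the sketch reports them under a single `jira_or_linear` label; telling them apart requires knowing which project keys belong to which platform.

```python
import re
from typing import List, Tuple

# Hypothetical patterns; more specific prefixes are checked before the generic KEY-123 form.
TICKET_PATTERNS = [
    ("github", re.compile(r"(?<![\w-])(?:#|GH-)(\d+)\b")),
    ("clickup", re.compile(r"\bCU-([a-z0-9]+)\b", re.IGNORECASE)),
    ("jira_or_linear", re.compile(r"\b(?!GH-|CU-)([A-Z][A-Z0-9]+-\d+)\b")),
]


def find_ticket_references(message: str) -> List[Tuple[str, str]]:
    """Return (platform, ticket id) pairs found in a commit message, grouped by pattern."""
    found: List[Tuple[str, str]] = []
    for platform, pattern in TICKET_PATTERNS:
        found.extend((platform, ticket_id) for ticket_id in pattern.findall(message))
    return found


references = find_ticket_references("Fix login PROJ-123, closes #45 and GH-7; see CU-abc123")
print(references)
```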
### Multi-Platform PM Integration
GitFlow Analytics supports multiple project management platforms simultaneously. You can configure one or more platforms based on your team's workflow:
```yaml
# Configure which platforms to track
analysis:
  ticket_platforms:
    - jira
    - linear
    - clickup
    - github # GitHub Issues
# Platform-specific configuration
pm:
  jira:
    access_user: "${JIRA_ACCESS_USER}"
    access_token: "${JIRA_ACCESS_TOKEN}"
    base_url: "https://your-company.atlassian.net"
  linear:
    api_key: "${LINEAR_API_KEY}"
    team_ids: # Optional: filter by team
      - "team_123abc"
  clickup:
    api_token: "${CLICKUP_API_TOKEN}"
    workspace_url: "https://app.clickup.com/12345/v/"
# GitHub Issues uses existing GitHub token automatically
github:
  token: "${GITHUB_TOKEN}"
```
### Platform Setup Guides
#### JIRA Setup
1. **Get API Token**: Go to [Atlassian API Tokens](https://id.atlassian.com/manage-profile/security/api-tokens)
2. **Required Permissions**: Read access to projects and issues
3. **Configuration**:
```yaml
pm:
  jira:
    access_user: "${JIRA_ACCESS_USER}" # Your Atlassian email
    access_token: "${JIRA_ACCESS_TOKEN}"
    base_url: "https://your-company.atlassian.net"
```
#### Linear Setup
1. **Get API Key**: Go to [Linear Settings → API](https://linear.app/settings/api)
2. **Required Permissions**: Read access to issues
3. **Configuration**:
```yaml
pm:
  linear:
    api_key: "${LINEAR_API_KEY}"
    team_ids: ["team_123abc"] # Optional: specify team IDs
```
#### ClickUp Setup
1. **Get API Token**: Go to [ClickUp Settings → Apps](https://app.clickup.com/settings/apps)
2. **Get Workspace URL**: Copy from browser when viewing your workspace
3. **Configuration**:
```yaml
pm:
  clickup:
    api_token: "${CLICKUP_API_TOKEN}"
    workspace_url: "https://app.clickup.com/12345/v/"
```
#### GitHub Issues Setup
GitHub Issues is enabled automatically when GitHub integration is configured. No additional setup is required:
```yaml
github:
  token: "${GITHUB_TOKEN}" # Same token for repo access and issues
```
### JIRA Story Point Integration
GitFlow Analytics can fetch story points directly from JIRA tickets:
```yaml
jira_integration:
  enabled: true
  fetch_story_points: true
  story_point_fields:
    - "Story point estimate" # Your custom field name
    - "customfield_10016" # Or use field ID
```
To discover your JIRA story point fields:
```bash
gitflow-analytics discover-storypoint-fields -c config.yaml
```
### Environment Variables for Credentials
Store credentials securely in a `.env` file:
```bash
# .env file (keep this secure and don't commit to git!)
GITHUB_TOKEN=ghp_your_token_here
# PM Platform Credentials
JIRA_ACCESS_USER=your.email@company.com
JIRA_ACCESS_TOKEN=ATATT3xxxxxxxxxxx
LINEAR_API_KEY=lin_api_xxxxxxxxxxxx
CLICKUP_API_TOKEN=pk_xxxxxxxxxxxx
```
## Caching
The tool uses SQLite for intelligent caching:
- Commit analysis results
- Developer identity mappings
- Pull request data
Cache is automatically managed with configurable TTL.
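
To illustrate the idea of a SQLite-backed cache with a time-to-live, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the package's cache: the `TTLCache` class, the table layout, and the `now` parameter (added so expiry can be demonstrated without waiting) are all assumptions.

```python
import sqlite3
import time
from typing import Optional


class TTLCache:
    """Tiny key/value cache in SQLite where entries expire after ttl_seconds (hypothetical)."""

    def __init__(self, path: str = ":memory:", ttl_seconds: float = 3600.0) -> None:
        self.ttl_seconds = ttl_seconds
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "key TEXT PRIMARY KEY, value TEXT NOT NULL, stored_at REAL NOT NULL)"
        )

    def set(self, key: str, value: str, now: Optional[float] = None) -> None:
        stored_at = time.time() if now is None else now
        self.conn.execute(
            "INSERT OR REPLACE INTO cache (key, value, stored_at) VALUES (?, ?, ?)",
            (key, value, stored_at),
        )
        self.conn.commit()

    def get(self, key: str, now: Optional[float] = None) -> Optional[str]:
        current = time.time() if now is None else now
        row = self.conn.execute(
            "SELECT value, stored_at FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is None or current - row[1] > self.ttl_seconds:
            return None  # missing or expired
        return row[0]


cache = TTLCache(ttl_seconds=60)
cache.set("a1b2c3d", "feature", now=1000.0)
fresh = cache.get("a1b2c3d", now=1030.0)    # within the TTL
expired = cache.get("a1b2c3d", now=1100.0)  # past the TTL
print(fresh, expired)
```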
## Developer Identity Resolution
GitFlow Analytics intelligently consolidates developer identities across different email addresses and name variations:
### Automatic Identity Analysis (New!)
Identity analysis now runs **automatically by default** when no manual mappings exist. The system will:
1. **Analyze all developer identities** in your commits
2. **Show suggested consolidations** with a clear preview
3. **Prompt for approval** with a simple Y/n
4. **Update your configuration** automatically
5. **Continue analysis** with consolidated identities
Example of the interactive prompt:
```
🔍 Analyzing developer identities...
⚠️ Found 3 potential identity clusters:
📋 Suggested identity mappings:
john.doe@company.com
→ 123456+johndoe@users.noreply.github.com
→ jdoe@personal.email.com
🤖 Found 2 bot accounts to exclude:
- dependabot[bot]
- renovate[bot]
────────────────────────────────────────────────────────────
Apply these identity mappings to your configuration? [Y/n]:
```
This prompt appears at most once every 7 days.
To skip automatic identity analysis:
```bash
# Simplified syntax (default)
gitflow-analytics -c config.yaml --skip-identity-analysis
# Explicit analyze command
gitflow-analytics analyze -c config.yaml --skip-identity-analysis
```
To manually run identity analysis:
```bash
gitflow-analytics identities -c config.yaml
```
### Smart Identity Matching
The system automatically detects:
- **GitHub noreply emails** (e.g., `150280367+username@users.noreply.github.com`)
- **Name variations** (e.g., "John Doe" vs "John D" vs "jdoe")
- **Common email patterns** across domains
- **Bot accounts** for automatic exclusion
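
Two of these signals can be approximated with the standard library, as sketched below: extracting the username from a GitHub noreply address, and scoring name similarity against a threshold such as the `similarity_threshold: 0.85` shown in the configuration example. The helper names and the use of `difflib.SequenceMatcher` are assumptions for illustration; the package may use a different algorithm.

```python
import re
from difflib import SequenceMatcher
from typing import Optional

# Matches both "12345+user@users.noreply.github.com" and "user@users.noreply.github.com".
NOREPLY = re.compile(r"^(?:\d+\+)?([^@]+)@users\.noreply\.github\.com$", re.IGNORECASE)


def github_username(email: str) -> Optional[str]:
    """Return the GitHub username embedded in a noreply address, or None."""
    match = NOREPLY.match(email.strip())
    return match.group(1).lower() if match else None


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] after stripping everything except letters."""
    def normalize(value: str) -> str:
        return re.sub(r"[^a-z]", "", value.lower())

    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


print(github_username("150280367+username@users.noreply.github.com"))
print(name_similarity("John Doe", "john.doe") >= 0.85)
print(name_similarity("John Doe", "Sarah Smith") >= 0.85)
```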
### Manual Configuration
You can also manually configure identity mappings in your YAML:
```yaml
analysis:
  identity:
    manual_mappings:
      - name: "John Doe" # Optional: preferred display name for reports
        primary_email: john.doe@company.com
        aliases:
          - jdoe@personal.email.com
          - 123456+johndoe@users.noreply.github.com
      - name: "Sarah Smith"
        primary_email: sarah.smith@company.com
        aliases:
          - s.smith@oldcompany.com
```
### Display Name Control
The optional `name` field in manual mappings allows you to control how developer names appear in reports. This is particularly useful for:
- **Standardizing display names** across different email formats
- **Resolving duplicates** when the same person appears with slight name variations
- **Using preferred names** instead of technical email formats
**Example use cases:**
```yaml
analysis:
  identity:
    manual_mappings:
      # Consolidate John Smith identities
      - name: "John Smith"
        primary_email: "john.smith@company.com"
        aliases:
          - "150280367+jsmith@users.noreply.github.com"
          - "jsmith-company@users.noreply.github.com"
      # Standardize name variations
      - name: "John Doe" # Consistent display across all reports
        primary_email: "john.doe@company.com"
        aliases:
          - "johndoe@company.com"
          - "j.doe@company.com"
```
Without the `name` field, the system uses the canonical email's associated name, which might not be ideal for reporting.
### Disabling Automatic Analysis
To disable the automatic identity prompt:
```yaml
analysis:
  identity:
    auto_analysis: false
```
## ML-Enhanced Commit Categorization
GitFlow Analytics includes sophisticated machine learning capabilities for categorizing commits with high accuracy and confidence scoring.
### How It Works
The ML categorization system uses a **hybrid approach** combining:
1. **Semantic Analysis**: Uses spaCy NLP models to understand commit message meaning
2. **File Pattern Recognition**: Analyzes changed files for additional context signals
3. **Rule-based Fallback**: Falls back to traditional regex patterns when ML confidence is low
4. **Confidence Scoring**: Provides confidence metrics for all categorizations
### Categories Detected
The system automatically categorizes commits into:
- **Feature**: New functionality development (`add`, `implement`, `create`)
- **Bug Fix**: Error corrections (`fix`, `resolve`, `correct`)
- **Refactor**: Code restructuring (`refactor`, `optimize`, `improve`)
- **Documentation**: Documentation updates (`docs`, `readme`, `comment`)
- **Maintenance**: Routine upkeep (`update`, `upgrade`, `dependency`)
- **Test**: Testing-related changes (`test`, `spec`, `coverage`)
- **Style**: Formatting changes (`format`, `lint`, `prettier`)
- **Build**: Build system changes (`build`, `ci`, `docker`)
- **Security**: Security-related fixes (`security`, `vulnerability`)
- **Hotfix**: Urgent production fixes (`hotfix`, `critical`, `emergency`)
- **Config**: Configuration changes (`config`, `settings`, `environment`)
### Configuration
```yaml
analysis:
  ml_categorization:
    # Enable/disable ML categorization (default: true)
    enabled: true
    # Minimum confidence for ML predictions (0.0-1.0, default: 0.6)
    min_confidence: 0.6
    # Semantic vs file pattern weighting (default: 0.7 vs 0.3)
    semantic_weight: 0.7
    file_pattern_weight: 0.3
    # Confidence threshold for ML vs rule-based (default: 0.5)
    hybrid_threshold: 0.5
    # Caching for performance
    enable_caching: true
    cache_duration_days: 30
    # Processing settings
    batch_size: 100
```
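One plausible reading of how `semantic_weight`, `file_pattern_weight`, and `hybrid_threshold` interact is a weighted sum of per-category scores with a rule-based fallback below the threshold. The sketch below shows that reading only; it is an assumption, not the package's actual scoring code, and it does not model `min_confidence`. The function names and input scores are invented for the example.

```python
from typing import Dict, Tuple


def combine_scores(
    semantic: Dict[str, float],
    file_pattern: Dict[str, float],
    semantic_weight: float = 0.7,
    file_pattern_weight: float = 0.3,
) -> Dict[str, float]:
    """Weighted sum of two per-category score maps (a missing category scores 0)."""
    categories = set(semantic) | set(file_pattern)
    return {
        category: semantic_weight * semantic.get(category, 0.0)
        + file_pattern_weight * file_pattern.get(category, 0.0)
        for category in categories
    }


def categorize_hybrid(
    semantic: Dict[str, float],
    file_pattern: Dict[str, float],
    rule_based_category: str,
    hybrid_threshold: float = 0.5,
) -> Tuple[str, float, str]:
    """Return (category, confidence, method); fall back to rules when confidence is low."""
    combined = combine_scores(semantic, file_pattern)
    if not combined:
        return rule_based_category, 0.0, "rules"
    best = max(combined, key=combined.get)
    confidence = round(combined[best], 2)
    if confidence >= hybrid_threshold:
        return best, confidence, "ml"
    return rule_based_category, confidence, "rules"


confident = categorize_hybrid({"feature": 0.9, "bug_fix": 0.1}, {"feature": 0.8}, "maintenance")
uncertain = categorize_hybrid({"maintenance": 0.4}, {}, "maintenance")
print(confident, uncertain)
```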
### Installation Requirements
For ML categorization, install the spaCy English model:
```bash
python -m spacy download en_core_web_sm
```
**Alternative models** (if the default is unavailable):
```bash
# Medium model (more accurate, larger)
python -m spacy download en_core_web_md
# Large model (most accurate, largest)
python -m spacy download en_core_web_lg
```
### Performance Expectations
- **Accuracy**: 85-95% on typical commit messages
- **Speed**: ~50-100 commits/second with caching enabled
- **Fallback**: If the spaCy model is unavailable, qualitative analysis is disabled gracefully and a helpful error message is shown
- **Memory**: ~200MB additional memory usage for spaCy models
### Enhanced Reports
With ML categorization enabled, reports include:
- **Confidence scores** for each categorization
- **Method indicators** (ML, rules, or cached)
- **Alternative predictions** for uncertain cases
- **ML performance statistics** in analysis summaries
### Example Enhanced Output
```csv
commit_hash,category,ml_confidence,ml_method,message
a1b2c3d,feature,0.89,ml,"Add user authentication system"
f6e5d4c,bug_fix,0.92,ml,"Fix memory leak in cache cleanup"
9876543,maintenance,0.74,rules,"Update dependency versions"
```
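Output in this shape is easy to post-process. As a minimal sketch, the snippet below parses the sample rows with the standard `csv` module and picks out low-confidence categorizations for manual review; the 0.8 review threshold and the helper names are assumptions for illustration.

```python
import csv
import io
from collections import Counter

# Sample rows copied from the example output above.
SAMPLE = """commit_hash,category,ml_confidence,ml_method,message
a1b2c3d,feature,0.89,ml,"Add user authentication system"
f6e5d4c,bug_fix,0.92,ml,"Fix memory leak in cache cleanup"
9876543,maintenance,0.74,rules,"Update dependency versions"
"""


def read_rows(csv_text: str) -> list:
    """Parse the CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def low_confidence_rows(rows: list, threshold: float = 0.8) -> list:
    """Rows whose confidence is below the (hypothetical) review threshold."""
    return [row for row in rows if float(row["ml_confidence"]) < threshold]


def method_counts(rows: list) -> Counter:
    """How many commits were categorized by each method."""
    return Counter(row["ml_method"] for row in rows)
```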
## Troubleshooting
### YAML Configuration Errors
GitFlow Analytics reports YAML configuration problems with descriptive error messages. Common errors and their fixes:
#### Tab Characters Not Allowed
```
❌ YAML configuration error at line 3, column 1:
🚫 Tab characters are not allowed in YAML files!
```
**Fix**: Replace all tabs with spaces (use 2 or 4 spaces for indentation)
- Most editors can show whitespace characters and convert tabs to spaces
- In VS Code: View → Appearance → Render Whitespace, then run "Convert Indentation to Spaces" from the Command Palette
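If an editor is not at hand, tab-indented lines can also be located with a few lines of Python. This is a minimal sketch; `find_tab_lines` is a hypothetical helper, not a GitFlow Analytics function.

```python
def find_tab_lines(text: str) -> list:
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Leading whitespace is everything removed by lstrip().
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            hits.append(number)
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python find_tabs.py config.yaml
    with open(sys.argv[1], encoding="utf-8") as handle:
        for number in find_tab_lines(handle.read()):
            print(f"line {number}: tab in indentation")
```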
#### Missing Colons
```
❌ YAML configuration error at line 5, column 10:
🚫 Missing colon (:) after a key name!
```
**Fix**: Add a colon and space after each key name
```yaml
# Correct:
repositories:
  - name: my-repo
# Incorrect:
repositories
  - name my-repo
```
#### Unclosed Quotes
```
❌ YAML configuration error at line 8, column 15:
🚫 Unclosed quoted string!
```
**Fix**: Ensure all quotes are properly closed
```yaml
# Correct:
token: "my-token-value"
# Incorrect:
token: "my-token-value
```
#### Invalid Indentation
```
❌ YAML configuration error:
🚫 Indentation error or invalid structure!
```
**Fix**: Use consistent indentation (either 2 or 4 spaces)
```yaml
# Correct:
analysis:
  exclude:
    paths:
      - "vendor/**"
# Incorrect:
analysis:
  exclude:
     paths:  # 3 spaces - inconsistent!
      - "vendor/**"
```
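A quick heuristic for spotting this kind of problem is to flag lines whose indentation is not a multiple of the chosen indent width. The sketch below is stricter than YAML itself (it may flag multi-line scalars that are technically valid), and `find_odd_indents` is a hypothetical helper rather than part of the tool.

```python
def find_odd_indents(text: str, unit: int = 2) -> list:
    """Return 1-based line numbers indented by a non-multiple of `unit` spaces."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.lstrip(" ")
        # Blank lines and comment-only lines are ignored.
        if not stripped or stripped.startswith("#"):
            continue
        indent = len(line) - len(stripped)
        if indent % unit != 0:
            hits.append(number)
    return hits
```

Run against the incorrect example above with `unit=2`, this would report the `paths:` line, since it is indented by five spaces.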
### Tips for Valid YAML
1. **Use a YAML validator**: Check your configuration with a YAML validator before running an analysis
2. **Enable whitespace display**: Make tabs and spaces visible in your editor
3. **Use quotes for special characters**: Wrap values containing `:`, `#`, `@`, etc. in quotes
4. **Consistent indentation**: Pick 2 or 4 spaces and stick to it throughout the file
5. **Check the sample config**: Reference `config-sample.yaml` for proper structure
### Configuration Validation
Beyond YAML syntax, GitFlow Analytics validates:
- Required fields (each entry in `repositories` must have `name` and `path`)
- Environment variable resolution
- File path existence
- Valid configuration structure
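The first two checks can be illustrated on an already-parsed configuration dictionary. This is a sketch under stated assumptions, not the tool's validator: the function names, the error wording, and the `${VAR}` pattern handling are hypothetical and only mirror the `${GITHUB_TOKEN}` style used in the configuration examples.

```python
import os
import re

# Matches ${NAME} references such as ${GITHUB_TOKEN}.
ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env(value: str, env=None) -> str:
    """Replace ${NAME} references with values from `env` (default: os.environ)."""
    env = os.environ if env is None else env

    def replace(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return ENV_REF.sub(replace, value)


def missing_repo_fields(config: dict) -> list:
    """List every repository entry that lacks a required `name` or `path`."""
    problems = []
    for index, repo in enumerate(config.get("repositories", [])):
        for field in ("name", "path"):
            if not repo.get(field):
                problems.append(f"repositories[{index}] is missing '{field}'")
    return problems
```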
If you encounter persistent issues, run with `--debug` for detailed error information:
```bash
# Simplified syntax (default)
gitflow-analytics -c config.yaml --debug
# Explicit analyze command
gitflow-analytics analyze -c config.yaml --debug
```
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
This project is licensed under the MIT License - see the LICENSE file for details.
Raw data
{
"_id": null,
"home_page": null,
"name": "gitflow-analytics",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "git, analytics, productivity, metrics, development",
"author": null,
"author_email": "Bob Matyas <bobmatnyc@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/9f/90/24d6d8c31ae61439b9e258445f7c548f7b7c9591b04e919e086d2e3765f6/gitflow_analytics-3.12.6.tar.gz",
"platform": null,
"description": "# GitFlow Analytics\n\n[](https://badge.fury.io/py/gitflow-analytics)\n[](https://pypi.org/project/gitflow-analytics/)\n[](https://opensource.org/licenses/MIT)\n[](https://github.com/bobmatnyc/gitflow-analytics/tree/main/docs)\n[](https://github.com/bobmatnyc/gitflow-analytics/actions)\n\nA comprehensive Python package for analyzing Git repositories to generate developer productivity insights without requiring external project management tools. Extract actionable metrics directly from Git history with ML-enhanced commit categorization, automated developer identity resolution, and professional reporting.\n\n## \ud83d\ude80 Key Features\n\n- **\ud83d\udd0d Zero Dependencies**: Analyze productivity without requiring JIRA, Linear, or other PM tools\n- **\ud83e\udde0 ML-Powered Intelligence**: Advanced commit categorization with 85-95% accuracy\n- **\ud83d\udc65 Smart Identity Resolution**: Automatically consolidate developer identities across email addresses\n- **\ud83c\udfe2 Enterprise Ready**: Organization-wide repository discovery with intelligent caching\n- **\ud83d\udcca Professional Reports**: Rich markdown narratives and CSV exports for executive dashboards\n\n## \ud83c\udfaf Quick Start\n\nGet up and running in 5 minutes:\n\n```bash\n# 1. Install GitFlow Analytics\npip install gitflow-analytics\n\n# 2. Install ML dependencies (optional but recommended)\npython -m spacy download en_core_web_sm\n\n# 3. Create a simple configuration\necho 'version: \"1.0\"\ngithub:\n token: \"${GITHUB_TOKEN}\"\n organization: \"your-org\"' > config.yaml\n\n# 4. Set your GitHub token\necho 'GITHUB_TOKEN=ghp_your_token_here' > .env\n\n# 5. 
Run analysis\ngitflow-analytics -c config.yaml --weeks 8\n```\n\n**What you get:**\n- \ud83d\udcc8 Weekly metrics CSV with developer productivity trends\n- \ud83d\udc65 Developer profiles with project distribution and work styles\n- \ud83d\udd0d Untracked work analysis with ML-powered categorization\n- \ud83d\udccb Executive summary with actionable insights\n- \ud83d\udcca Rich markdown report ready for stakeholders\n\n### Sample Output Preview\n\n```markdown\n## Executive Summary\n- **Total Commits**: 156 across 3 projects\n- **Active Developers**: 5 team members\n- **Ticket Coverage**: 73.2% (industry benchmark: 60-80%)\n- **Top Contributor**: Sarah Chen (32 commits, FRONTEND focus)\n\n## Key Insights\n\ud83c\udfaf **High Productivity**: Team averaged 31 commits/week\n\ud83d\udcca **Balanced Workload**: No single developer >40% of total work\n\u2705 **Good Process**: 73% ticket coverage shows strong tracking\n```\n\n## \u2728 Latest Features (v1.2.x)\n\n- **\ud83d\ude80 Two-Step Processing**: Optimized fetch-then-classify workflow for better performance\n- **\ud83d\udcb0 Cost Tracking**: Monitor LLM API usage with detailed token and cost reporting\n- **\u26a1 Smart Caching**: Intelligent caching reduces analysis time by up to 90%\n- **\ud83d\udd04 Automatic Updates**: Repositories automatically fetch latest commits before analysis\n- **\ud83d\udcca Weekly Trends**: Track classification pattern changes over time\n- **\ud83c\udfaf Enhanced Categorization**: All commits properly categorized with confidence scores\n\n## \ud83d\udd25 Core Capabilities\n\n**\ud83d\udcca Analysis & Insights**\n- Multi-repository analysis with intelligent project grouping\n- ML-enhanced commit categorization (85-95% accuracy)\n- Developer productivity metrics and work pattern analysis\n- Story point extraction from commits and PRs\n- Ticket tracking across JIRA, GitHub, ClickUp, and Linear\n\n**\ud83c\udfe2 Enterprise Features**\n- Organization-wide repository discovery from GitHub\n- 
Automated developer identity resolution and consolidation\n- Database-backed caching for sub-second report generation\n- Data anonymization for secure external sharing\n- Batch processing optimized for large repositories\n\n**\ud83d\udcc8 Professional Reporting**\n- Rich markdown narratives with executive summaries\n- Weekly CSV exports with trend analysis\n- Customizable output formats and filtering\n- Performance benchmarking and team comparisons\n\n## \ud83d\udcda Documentation\n\nComprehensive guides for every use case:\n\n| **Getting Started** | **Advanced Usage** | **Integration** |\n|-------------------|------------------|---------------|\n| [Installation](docs/getting-started/installation.md) | [Complete Configuration](docs/guides/configuration.md) | [CLI Reference](docs/reference/cli-commands.md) |\n| [5-Minute Tutorial](docs/getting-started/quickstart.md) | [ML Categorization](docs/guides/ml-categorization.md) | [JSON Export Schema](docs/reference/json-export-schema.md) |\n| [First Analysis](docs/getting-started/first-analysis.md) | [Enterprise Setup](docs/examples/enterprise-setup.md) | [CI Integration](docs/examples/ci-integration.md) |\n\n**\ud83c\udfaf Quick Links:**\n- \ud83d\udcd6 [**Documentation Hub**](docs/README.md) - Complete guide index\n- \ud83d\ude80 [**Quick Start**](docs/getting-started/quickstart.md) - Get running in 5 minutes\n- \u2699\ufe0f [**Configuration**](docs/guides/configuration.md) - Full reference\n- \ud83e\udd1d [**Contributing**](docs/developer/contributing.md) - Join the project\n\n## \u26a1 Installation Options\n\n### Standard Installation\n```bash\npip install gitflow-analytics\n```\n\n### With ML Enhancement (Recommended)\n```bash\npip install gitflow-analytics\npython -m spacy download en_core_web_sm\n```\n\n### Development Installation\n```bash\ngit clone https://github.com/bobmatnyc/gitflow-analytics.git\ncd gitflow-analytics\npip install -e \".[dev]\"\npython -m spacy download en_core_web_sm\n```\n\n## \ud83d\udd27 
Configuration\n\n### Option 1: Organization Analysis (Recommended)\n```yaml\n# config.yaml\nversion: \"1.0\"\ngithub:\n token: \"${GITHUB_TOKEN}\"\n organization: \"your-org\" # Auto-discovers all repositories\n\nanalysis:\n ml_categorization:\n enabled: true\n min_confidence: 0.7\n```\n\n### Option 2: Specific Repositories\n```yaml\n# config.yaml \nversion: \"1.0\"\ngithub:\n token: \"${GITHUB_TOKEN}\"\n \nrepositories:\n - name: \"my-app\"\n path: \"~/code/my-app\"\n github_repo: \"myorg/my-app\"\n project_key: \"APP\"\n```\n\n### Environment Setup\n```bash\n# .env (same directory as config.yaml)\nGITHUB_TOKEN=ghp_your_token_here\n```\n\n### Run Analysis\n```bash\n# Analyze last 8 weeks\ngitflow-analytics -c config.yaml --weeks 8\n\n# With custom output directory\ngitflow-analytics -c config.yaml --weeks 8 --output ./reports\n```\n\n> \ud83d\udca1 **Need more configuration options?** See the [Complete Configuration Guide](docs/guides/configuration.md) for advanced features, integrations, and customization.\n\n## \ud83c\udfaf Excluding Merge Commits from Metrics\n\nGitFlow Analytics can exclude merge commits from filtered line count calculations, following DORA metrics best practices.\n\n### Why Exclude Merge Commits?\n\nMerge commits represent repository management, not original development work:\n- **Average merge commit**: 236.6 filtered lines vs 30.8 for regular commits (7.7x higher)\n- Merge commits can **skew productivity metrics** and velocity calculations\n- **DORA metrics best practice**: Focus on original development work, not repository management\n\n### Configuration\n\nAdd this setting to your analysis configuration:\n\n```yaml\nanalysis:\n # Exclude merge commits from filtered line counts (DORA metrics best practice)\n exclude_merge_commits: true # Default: false\n```\n\n### Impact Example\n\nReal metrics from EWTN dataset analysis:\n\n| Metric | With Merge Commits | Without Merge Commits | Change 
|\n|--------|-------------------|----------------------|--------|\n| **Total Filtered Lines** | 138,730 | 54,808 | -60% |\n| **Merge Commits** | 355 commits | 355 commits | (excluded from line counts) |\n| **Regular Commits** | 1,426 commits | 1,426 commits | (unchanged) |\n\n### What Gets Excluded?\n\nWhen `exclude_merge_commits: true`:\n\n\u2705 **Filtered Stats**: Merge commits (2+ parents) have `filtered_insertions = 0` and `filtered_deletions = 0`\n\u2705 **Raw Stats**: Always preserved for all commits (accurate commit counts)\n\u2705 **Reports**: Line count metrics reflect only original development work\n\n\u274c **Not affected**: Commit counts, developer activity tracking, ticket references\n\n### When to Use\n\n**\u2705 Enable when:**\n- You want DORA-compliant metrics for productivity tracking\n- Your workflow uses merge commits for pull requests\n- You need accurate developer velocity without repository overhead\n- You're comparing metrics across teams with different merge strategies\n\n**\u274c Disable when:**\n- You want to track all repository activity including management overhead\n- Merge commits represent significant manual conflict resolution in your workflow\n- You're analyzing repositories without merge-heavy workflows\n- You need to measure total repository churn including merges\n\n### Example Configuration\n\n```yaml\n# Full configuration example\nanalysis:\n weeks_back: 8\n include_weekends: true\n\n # DORA-compliant metrics: exclude merge commits\n exclude_merge_commits: true\n\n # Analyze ALL branches to capture feature branch work\n branch_patterns:\n - \"*\" # Include all branches (feature, develop, hotfix, etc.)\n```\n\n> \ud83d\udca1 **Pro Tip**: Combine `exclude_merge_commits: true` with `branch_patterns: [\"*\"]` to analyze all development work without merge overhead.\n\n## \ud83d\udcca Generated Reports\n\nGitFlow Analytics generates comprehensive reports for different audiences:\n\n### \ud83d\udcc8 CSV Data Files\n- 
**weekly_metrics.csv** - Developer productivity trends by week\n- **weekly_velocity.csv** - Lines-per-story-point velocity analysis\n- **developers.csv** - Complete team profiles and statistics \n- **summary.csv** - Project-wide statistics and benchmarks\n- **untracked_commits.csv** - ML-categorized uncommitted work analysis\n\n### \ud83d\udccb Executive Reports\n- **narrative_summary.md** - Rich markdown report with:\n - Executive summary with key metrics\n - Team composition and work distribution \n - Project activity breakdown\n - Development patterns and recommendations\n - Weekly trend analysis\n\n### Sample Executive Summary\n```markdown\n## Executive Summary\n- **Total Commits**: 324 commits across 4 projects\n- **Active Developers**: 8 team members \n- **Ticket Coverage**: 78.4% (above industry benchmark)\n- **Top Areas**: Frontend (45%), API (32%), Infrastructure (23%)\n\n## Key Insights \n\u2705 **Strong Process Adherence**: 78% ticket coverage\n\ud83c\udfaf **Balanced Team**: No developer >35% of total work\n\ud83d\udcc8 **Growth Trend**: +15% productivity vs last quarter\n```\n\n## \ud83d\udee0\ufe0f Common Use Cases\n\n**\ud83d\udc65 Team Lead Dashboard**\n- Track individual developer productivity and growth\n- Identify workload distribution and potential burnout\n- Monitor code quality trends and technical debt\n\n**\ud83d\udcc8 Engineering Management** \n- Generate executive reports on team velocity\n- Analyze process adherence and ticket coverage\n- Benchmark performance across projects and quarters\n\n**\ud83d\udd0d Process Optimization**\n- Identify untracked work patterns that should be formalized\n- Optimize developer focus and reduce context switching \n- Improve estimation accuracy with historical data\n\n**\ud83c\udfe2 Enterprise Analytics**\n- Organization-wide repository analysis across dozens of projects\n- Automated identity resolution for large, distributed teams\n- Cost-effective analysis without expensive PM tool dependencies\n\n## 
Command Line Interface\n\n### Main Commands\n\n```bash\n# Analyze repositories (default command)\ngitflow-analytics -c config.yaml --weeks 12 --output ./reports\n\n# Explicit analyze command (backward compatibility)\ngitflow-analytics analyze -c config.yaml --weeks 12 --output ./reports\n\n# Show cache statistics\ngitflow-analytics cache-stats -c config.yaml\n\n# List known developers\ngitflow-analytics list-developers -c config.yaml\n\n# Analyze developer identities\ngitflow-analytics identities -c config.yaml\n\n# Merge developer identities\ngitflow-analytics merge-identity -c config.yaml dev1_id dev2_id\n\n# Discover story point fields in your PM platform\ngitflow-analytics discover-storypoint-fields -c config.yaml\n```\n\n### Options\n\n- `--weeks, -w`: Number of weeks to analyze (default: 12)\n- `--output, -o`: Output directory for reports (default: ./reports)\n- `--anonymize`: Anonymize developer information\n- `--no-cache`: Disable caching for fresh analysis\n- `--clear-cache`: Clear cache before analysis\n- `--validate-only`: Validate configuration without running\n- `--skip-identity-analysis`: Skip automatic identity analysis\n- `--apply-identity-suggestions`: Apply identity suggestions without prompting\n\n## Complete Configuration Example\n\nHere's a complete example showing `.env` file and corresponding YAML configuration:\n\n### `.env` file\n```bash\n# GitHub Configuration\nGITHUB_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxx\nGITHUB_ORG=your-organization\n\n# PM Platform Configuration\nJIRA_ACCESS_USER=developer@company.com\nJIRA_ACCESS_TOKEN=ATATT3xxxxxxxxxxx\nLINEAR_API_KEY=lin_api_xxxxxxxxxxxx\nCLICKUP_API_TOKEN=pk_xxxxxxxxxxxx\n\n# Note: GitHub Issues uses GITHUB_TOKEN automatically\n```\n\n### `config.yaml` file\n```yaml\nversion: \"1.0\"\n\n# GitHub configuration with organization discovery\ngithub:\n token: \"${GITHUB_TOKEN}\"\n organization: \"${GITHUB_ORG}\"\n\n# Multi-platform PM integration\npm:\n jira:\n access_user: \"${JIRA_ACCESS_USER}\"\n 
access_token: \"${JIRA_ACCESS_TOKEN}\"\n base_url: \"https://company.atlassian.net\"\n\n linear:\n api_key: \"${LINEAR_API_KEY}\"\n team_ids: [\"team_123abc\"] # Optional: filter by specific teams\n\n clickup:\n api_token: \"${CLICKUP_API_TOKEN}\"\n workspace_url: \"https://app.clickup.com/12345/v/\"\n\n# JIRA story point integration (optional)\njira_integration:\n enabled: true\n fetch_story_points: true\n story_point_fields:\n - \"Story point estimate\" # Your field name\n - \"customfield_10016\" # Fallback field ID\n\n# Analysis configuration\nanalysis:\n # Track tickets from all configured platforms\n ticket_platforms:\n - jira\n - linear\n - clickup\n - github # GitHub Issues (uses GITHUB_TOKEN)\n \n # Exclude bot commits and boilerplate files\n exclude:\n authors:\n - \"dependabot[bot]\"\n - \"renovate[bot]\"\n paths:\n - \"**/node_modules/**\"\n - \"**/*.min.js\"\n - \"**/package-lock.json\"\n \n # Developer identity consolidation\n identity:\n similarity_threshold: 0.85\n manual_mappings:\n - name: \"John Doe\"\n primary_email: \"john.doe@company.com\"\n aliases:\n - \"jdoe@oldcompany.com\"\n - \"john@personal.com\"\n\n# Output configuration\noutput:\n directory: \"./reports\"\n formats:\n - csv\n - markdown\n```\n\n## Output Reports\n\nThe tool generates comprehensive CSV reports and markdown summaries:\n\n### CSV Reports\n\n1. **Weekly Metrics** (`weekly_metrics_YYYYMMDD.csv`)\n - Week-by-week developer productivity\n - Story points, commits, lines changed\n - Ticket coverage percentages\n - Per-project breakdown\n\n2. **Weekly Velocity** (`weekly_velocity_YYYYMMDD.csv`)\n - Lines of code per story point analysis\n - Efficiency trends and velocity patterns\n - PR-based vs commit-based story points breakdown\n - Team velocity benchmarking and week-over-week trends\n\n3. **Summary Statistics** (`summary_YYYYMMDD.csv`)\n - Overall project statistics\n - Platform-specific ticket counts\n - Top contributors\n\n4. 
**Developer Report** (`developers_YYYYMMDD.csv`)\n - Complete developer profiles\n - Total contributions\n - Identity aliases\n\n5. **Untracked Commits Report** (`untracked_commits_YYYYMMDD.csv`)\n - Detailed analysis of commits without ticket references\n - Commit categorization (bug_fix, feature, refactor, documentation, maintenance, test, style, build)\n - Enhanced metadata: commit hash, author, timestamp, project, message, file/line changes\n - Configurable file change threshold for filtering significant commits\n\n### Enhanced Untracked Commit Analysis\n\nThe untracked commits report provides deep insights into work that bypasses ticket tracking:\n\n**CSV Columns:**\n- `commit_hash` / `short_hash`: Full and abbreviated commit identifiers\n- `author` / `author_email` / `canonical_id`: Developer identification (with anonymization support)\n- `date`: Commit timestamp\n- `project`: Project key for multi-repository analysis\n- `message`: Commit message (truncated for readability)\n- `category`: Automated categorization of work type\n- `files_changed` / `lines_added` / `lines_removed` / `lines_changed`: Change metrics\n- `is_merge`: Boolean flag for merge commits\n\n**Automatic Categorization:**\n- **Feature**: New functionality development (`add`, `new`, `implement`, `create`)\n- **Bug Fix**: Error corrections (`fix`, `bug`, `error`, `resolve`, `hotfix`)\n- **Refactor**: Code restructuring (`refactor`, `optimize`, `improve`, `cleanup`)\n- **Documentation**: Documentation updates (`doc`, `readme`, `comment`, `guide`)\n- **Maintenance**: Routine upkeep (`update`, `upgrade`, `dependency`, `config`)\n- **Test**: Testing-related changes (`test`, `spec`, `mock`, `fixture`)\n- **Style**: Formatting changes (`format`, `lint`, `prettier`, `whitespace`)\n- **Build**: Build system changes (`build`, `compile`, `ci`, `docker`)\n\n### Markdown Reports\n\n5. 
**Narrative Summary** (`narrative_summary_YYYYMMDD.md`)\n - **Executive Summary**: High-level metrics and team overview\n - **Team Composition**: Developer profiles with project percentages and work patterns\n - **Project Activity**: Detailed breakdown by project with contributor percentages and **commit classifications**\n - **Development Patterns**: Key insights from productivity and collaboration analysis\n - **Pull Request Analysis**: PR metrics including size, lifetime, and review activity\n - **Weekly Trends** (v1.1.0+): Week-over-week changes in classification patterns\n\n6. **Database-Backed Qualitative Report** (`database_qualitative_report_YYYYMMDD.md`) (v1.1.0+)\n - Generated directly from SQLite storage for fast retrieval\n - Includes weekly trend analysis per developer/project\n - Shows classification changes over time (e.g., \"Features: +15%, Bug Fixes: -5%\")\n - **Issue Tracking**: Platform usage and coverage analysis with simplified display\n - **Enhanced Untracked Work Analysis**: Comprehensive categorization with dual percentage metrics\n - **PM Platform Integration**: Story point tracking and correlation insights (when available)\n - **Recommendations**: Actionable insights based on analysis patterns\n\n### Enhanced Narrative Report Sections\n\nThe narrative report provides comprehensive insights through multiple detailed sections:\n\n#### Team Composition Section\n- **Developer Profiles**: Individual developer statistics with commit counts\n- **Project Distribution**: Shows ALL projects each developer works on with precise percentages\n- **Work Style Classification**: Categorizes developers as \"Focused\", \"Multi-project\", or \"Highly Focused\"\n- **Activity Patterns**: Identifies time patterns like \"Standard Hours\" or \"Extended Hours\"\n\n**Example developer profile:**\n```markdown\n**John Developer**\n- Commits: 15\n- Projects: FRONTEND (85.0%), SERVICE_TS (15.0%)\n- Work Style: Focused\n- Active Pattern: Standard Hours\n```\n\n#### 
Project Activity Section\n- **Activity by Project**: Commits and percentage of total activity per project\n- **Contributor Breakdown**: Shows each developer's contribution percentage within each project\n- **Lines Changed**: Quantifies the scale of changes per project\n\n#### Issue Tracking with Simplified Display\n- **Platform Usage**: Clean display of ticket platform distribution (JIRA, GitHub, etc.)\n- **Coverage Analysis**: Percentage of commits that reference tickets\n- **Enhanced Untracked Work Analysis**: Detailed categorization and recommendations\n\n### Interpreting Dual Percentage Metrics\n\nThe enhanced untracked work analysis provides two key percentage metrics for better context:\n\n1. **Percentage of Total Untracked Work**: Shows how much each developer contributes to the overall untracked work pool\n2. **Percentage of Developer's Individual Work**: Shows what proportion of a specific developer's commits are untracked\n\n**Example interpretation:**\n```\n- John Doe: 25 commits (40% of untracked, 15% of their work) - maintenance, style\n```\n\nThis means:\n- John contributed 25 untracked commits\n- These represent 40% of all untracked commits in the analysis period \n- Only 15% of John's total work was untracked (85% was properly tracked)\n- Most untracked work was maintenance and style changes (acceptable categories)\n\n**Process Insights:**\n- High \"% of untracked\" + low \"% of their work\" = Developer doing most of the acceptable maintenance work\n- Low \"% of untracked\" + high \"% of their work\" = Developer needs process guidance\n- High percentages in feature/bug_fix categories = Process improvement opportunity\n\n### Example Report Outputs\n\n#### Untracked Commits CSV Sample\n```csv\ncommit_hash,short_hash,author,author_email,canonical_id,date,project,message,category,files_changed,lines_added,lines_removed,lines_changed,is_merge\na1b2c3d4e5f6...,a1b2c3d,John Doe,john@company.com,ID0001,2024-01-15 14:30:22,FRONTEND,Update dependency versions 
for security patches,maintenance,2,45,12,57,false\nf6e5d4c3b2a1...,f6e5d4c,Jane Smith,jane@company.com,ID0002,2024-01-15 09:15:10,BACKEND,Fix typo in error message,bug_fix,1,1,1,2,false\n9876543210ab...,9876543,Bob Wilson,bob@company.com,ID0003,2024-01-14 16:45:33,FRONTEND,Add JSDoc comments to utility functions,documentation,3,28,0,28,false\n```\n\n#### Complete Narrative Report Sample\n```markdown\n# GitFlow Analytics Report\n\n**Generated**: 2025-08-04 14:27:47\n**Analysis Period**: Last 4 weeks\n\n## Executive Summary\n\n- **Total Commits**: 35\n- **Active Developers**: 3\n- **Lines Changed**: 910\n- **Ticket Coverage**: 71.4%\n- **Active Projects**: FRONTEND, SERVICE_TS, SERVICES\n- **Top Contributor**: John Developer with 15 commits\n\n## Team Composition\n\n### Developer Profiles\n\n**John Developer**\n- Commits: 15\n- Projects: FRONTEND (85.0%), SERVICE_TS (15.0%)\n- Work Style: Focused\n- Active Pattern: Standard Hours\n\n**Jane Smith**\n- Commits: 12\n- Projects: SERVICE_TS (70.0%), FRONTEND (30.0%)\n- Work Style: Multi-project\n- Active Pattern: Extended Hours\n\n## Project Activity\n\n### Activity by Project\n\n**FRONTEND**\n- Commits: 14 (50.0% of total)\n- Lines Changed: 450\n- Contributors: John Developer (71.4%), Jane Smith (28.6%)\n\n**SERVICE_TS**\n- Commits: 8 (28.6% of total)\n- Lines Changed: 280\n- Contributors: Jane Smith (100.0%)\n\n## Issue Tracking\n\n### Platform Usage\n\n- **Jira**: 15 tickets (60.0%)\n- **Github**: 8 tickets (32.0%)\n- **Clickup**: 2 tickets (8.0%)\n\n### Untracked Work Analysis\n\n**Summary**: 10 commits (28.6% of total) lack ticket references.\n\n#### Work Categories\n\n- **Maintenance**: 4 commits (40.0%), avg 23 lines *(acceptable untracked)*\n- **Bug Fix**: 3 commits (30.0%), avg 15 lines *(should be tracked)*\n- **Documentation**: 2 commits (20.0%), avg 12 lines *(acceptable untracked)*\n\n#### Top Contributors (Untracked Work)\n\n- **John Developer**: 1 commits (50.0% of untracked, 6.7% of their work) - 
*refactor*\n- **Jane Smith**: 1 commits (50.0% of untracked, 8.3% of their work) - *style*\n\n#### Recommendations for Untracked Work\n\n\ud83c\udfaf **Excellent tracking**: Less than 20% of commits are untracked - the team shows strong process adherence.\n\n## Recommendations\n\n\u2705 The team shows healthy development patterns. Continue current practices while monitoring for changes.\n```\n\n### Configuration for Enhanced Narrative Reports\n\nThe narrative reports automatically include all available sections based on your configuration and data availability:\n\n**Always Generated:**\n- Executive Summary, Team Composition, Project Activity, Development Patterns, Issue Tracking, Recommendations\n\n**Conditionally Generated:**\n- **Pull Request Analysis**: Requires GitHub integration with PR data\n- **PM Platform Integration**: Requires JIRA or other PM platform configuration\n- **Qualitative Analysis**: Requires ChatGPT integration setup\n\n**Customizing Report Content:**\n```yaml\n# config.yaml\noutput:\n formats:\n - csv\n - markdown # Enables narrative report generation\n \n# Optional: Enhance narrative reports with additional data\njira:\n access_user: \"${JIRA_ACCESS_USER}\"\n access_token: \"${JIRA_ACCESS_TOKEN}\"\n base_url: \"https://company.atlassian.net\"\n\n# Optional: Add qualitative insights\nanalysis:\n chatgpt:\n enabled: true\n api_key: \"${OPENAI_API_KEY}\"\n```\n\n## Story Point Patterns\n\nConfigure custom regex patterns to match your team's story point format:\n\n```yaml\nstory_point_patterns:\n - \"SP: (\\\\d+)\" # SP: 5\n - \"\\\\[([0-9]+) pts\\\\]\" # [3 pts]\n - \"estimate: (\\\\d+)\" # estimate: 8\n```\n\n## Ticket Platform Support\n\nAutomatically detects and tracks tickets from multiple PM platforms:\n- **JIRA**: `PROJ-123`\n- **GitHub Issues**: `#123`, `GH-123`\n- **ClickUp**: `CU-abc123`\n- **Linear**: `ENG-123`\n\n### Multi-Platform PM Integration\n\nGitFlow Analytics supports multiple project management platforms simultaneously. 
You can configure one or more platforms based on your team's workflow:\n\n```yaml\n# Configure which platforms to track\nanalysis:\n ticket_platforms:\n - jira\n - linear\n - clickup\n - github # GitHub Issues\n\n# Platform-specific configuration\npm:\n jira:\n access_user: \"${JIRA_ACCESS_USER}\"\n access_token: \"${JIRA_ACCESS_TOKEN}\"\n base_url: \"https://your-company.atlassian.net\"\n\n linear:\n api_key: \"${LINEAR_API_KEY}\"\n team_ids: # Optional: filter by team\n - \"team_123abc\"\n\n clickup:\n api_token: \"${CLICKUP_API_TOKEN}\"\n workspace_url: \"https://app.clickup.com/12345/v/\"\n\n# GitHub Issues uses existing GitHub token automatically\ngithub:\n token: \"${GITHUB_TOKEN}\"\n```\n\n### Platform Setup Guides\n\n#### JIRA Setup\n1. **Get API Token**: Go to [Atlassian API Tokens](https://id.atlassian.com/manage-profile/security/api-tokens)\n2. **Required Permissions**: Read access to projects and issues\n3. **Configuration**:\n ```yaml\n pm:\n jira:\n access_user: \"${JIRA_ACCESS_USER}\" # Your Atlassian email\n access_token: \"${JIRA_ACCESS_TOKEN}\"\n base_url: \"https://your-company.atlassian.net\"\n ```\n\n#### Linear Setup\n1. **Get API Key**: Go to [Linear Settings \u2192 API](https://linear.app/settings/api)\n2. **Required Permissions**: Read access to issues\n3. **Configuration**:\n ```yaml\n pm:\n linear:\n api_key: \"${LINEAR_API_KEY}\"\n team_ids: [\"team_123abc\"] # Optional: specify team IDs\n ```\n\n#### ClickUp Setup\n1. **Get API Token**: Go to [ClickUp Settings \u2192 Apps](https://app.clickup.com/settings/apps)\n2. **Get Workspace URL**: Copy from browser when viewing your workspace\n3. **Configuration**:\n ```yaml\n pm:\n clickup:\n api_token: \"${CLICKUP_API_TOKEN}\"\n workspace_url: \"https://app.clickup.com/12345/v/\"\n ```\n\n#### GitHub Issues Setup\nGitHub Issues is automatically enabled when GitHub integration is configured. 
No additional setup required:\n```yaml\ngithub:\n token: \"${GITHUB_TOKEN}\" # Same token for repo access and issues\n```\n\n### JIRA Story Point Integration\n\nGitFlow Analytics can fetch story points directly from JIRA tickets:\n\n```yaml\njira_integration:\n enabled: true\n fetch_story_points: true\n story_point_fields:\n - \"Story point estimate\" # Your custom field name\n - \"customfield_10016\" # Or use field ID\n```\n\nTo discover your JIRA story point fields:\n```bash\ngitflow-analytics discover-storypoint-fields -c config.yaml\n```\n\n### Environment Variables for Credentials\n\nStore credentials securely in a `.env` file:\n\n```bash\n# .env file (keep this secure and don't commit to git!)\nGITHUB_TOKEN=ghp_your_token_here\n\n# PM Platform Credentials\nJIRA_ACCESS_USER=your.email@company.com\nJIRA_ACCESS_TOKEN=ATATT3xxxxxxxxxxx\nLINEAR_API_KEY=lin_api_xxxxxxxxxxxx\nCLICKUP_API_TOKEN=pk_xxxxxxxxxxxx\n```\n\n## Caching\n\nThe tool uses SQLite for intelligent caching:\n- Commit analysis results\n- Developer identity mappings\n- Pull request data\n\nCache is automatically managed with configurable TTL.\n\n## Developer Identity Resolution\n\nGitFlow Analytics intelligently consolidates developer identities across different email addresses and name variations:\n\n### Automatic Identity Analysis (New!)\n\nIdentity analysis now runs **automatically by default** when no manual mappings exist. The system will:\n\n1. **Analyze all developer identities** in your commits\n2. **Show suggested consolidations** with a clear preview\n3. **Prompt for approval** with a simple Y/n\n4. **Update your configuration** automatically\n5. 
**Continue analysis** with consolidated identities\n\nExample of the interactive prompt:\n```\n\ud83d\udd0d Analyzing developer identities...\n\n\u26a0\ufe0f Found 3 potential identity clusters:\n\n\ud83d\udccb Suggested identity mappings:\n john.doe@company.com\n \u2192 123456+johndoe@users.noreply.github.com\n \u2192 jdoe@personal.email.com\n\n\ud83e\udd16 Found 2 bot accounts to exclude:\n - dependabot[bot]\n - renovate[bot]\n\n\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\nApply these identity mappings to your configuration? [Y/n]: \n```\n\nThis prompt appears at most once every 7 days. \n\nTo skip automatic identity analysis:\n```bash\n# Simplified syntax (default)\ngitflow-analytics -c config.yaml --skip-identity-analysis\n\n# Explicit analyze command\ngitflow-analytics analyze -c config.yaml --skip-identity-analysis\n```\n\nTo manually run identity analysis:\n```bash\ngitflow-analytics identities -c config.yaml\n```\n\n### Smart Identity Matching\n\nThe system automatically detects:\n- **GitHub noreply emails** (e.g., `150280367+username@users.noreply.github.com`)\n- **Name variations** (e.g., \"John Doe\" vs \"John D\" vs \"jdoe\")\n- **Common email patterns** across domains\n- **Bot accounts** for automatic exclusion\n\n### Manual Configuration\n\nYou can also manually configure identity mappings in your YAML:\n\n```yaml\nanalysis:\n identity:\n manual_mappings:\n - name: \"John Doe\" # Optional: preferred display name for reports\n primary_email: john.doe@company.com\n aliases:\n - jdoe@personal.email.com\n - 123456+johndoe@users.noreply.github.com\n - name: \"Sarah Smith\"\n primary_email: sarah.smith@company.com\n aliases:\n - 
s.smith@oldcompany.com\n```\n\n### Display Name Control\n\nThe optional `name` field in manual mappings controls how developer names appear in reports. This is particularly useful for:\n\n- **Standardizing display names** across different email formats\n- **Resolving duplicates** when the same person appears with slight name variations\n- **Using preferred names** instead of technical email formats\n\n**Example use cases:**\n```yaml\nanalysis:\n  identity:\n    manual_mappings:\n      # Consolidate John Smith identities\n      - name: \"John Smith\"\n        primary_email: \"john.smith@company.com\"\n        aliases:\n          - \"150280367+jsmith@users.noreply.github.com\"\n          - \"jsmith-company@users.noreply.github.com\"\n\n      # Standardize name variations\n      - name: \"John Doe\" # Consistent display across all reports\n        primary_email: \"john.doe@company.com\"\n        aliases:\n          - \"johndoe@company.com\"\n          - \"j.doe@company.com\"\n```\n\nWithout the `name` field, the system uses the name associated with the canonical email, which might not be ideal for reporting.\n\n### Disabling Automatic Analysis\n\nTo disable the automatic identity prompt:\n```yaml\nanalysis:\n  identity:\n    auto_analysis: false\n```\n\n## ML-Enhanced Commit Categorization\n\nGitFlow Analytics includes machine learning capabilities for categorizing commits, with a confidence score attached to each categorization.\n\n### How It Works\n\nThe ML categorization system uses a **hybrid approach** combining:\n\n1. **Semantic Analysis**: Uses spaCy NLP models to understand commit message meaning\n2. **File Pattern Recognition**: Analyzes changed files for additional context signals\n3. **Rule-based Fallback**: Falls back to traditional regex patterns when ML confidence is low\n4. 
**Confidence Scoring**: Provides confidence metrics for all categorizations\n\n### Categories Detected\n\nThe system automatically categorizes commits into:\n\n- **Feature**: New functionality development (`add`, `implement`, `create`)\n- **Bug Fix**: Error corrections (`fix`, `resolve`, `correct`)\n- **Refactor**: Code restructuring (`refactor`, `optimize`, `improve`)\n- **Documentation**: Documentation updates (`docs`, `readme`, `comment`)\n- **Maintenance**: Routine upkeep (`update`, `upgrade`, `dependency`)\n- **Test**: Testing-related changes (`test`, `spec`, `coverage`)\n- **Style**: Formatting changes (`format`, `lint`, `prettier`)\n- **Build**: Build system changes (`build`, `ci`, `docker`)\n- **Security**: Security-related fixes (`security`, `vulnerability`)\n- **Hotfix**: Urgent production fixes (`hotfix`, `critical`, `emergency`)\n- **Config**: Configuration changes (`config`, `settings`, `environment`)\n\n### Configuration\n\n```yaml\nanalysis:\n  ml_categorization:\n    # Enable/disable ML categorization (default: true)\n    enabled: true\n\n    # Minimum confidence for ML predictions (0.0-1.0, default: 0.6)\n    min_confidence: 0.6\n\n    # Semantic vs file pattern weighting (default: 0.7 vs 0.3)\n    semantic_weight: 0.7\n    file_pattern_weight: 0.3\n\n    # Confidence threshold for ML vs rule-based (default: 0.5)\n    hybrid_threshold: 0.5\n\n    # Caching for performance\n    enable_caching: true\n    cache_duration_days: 30\n\n    # Processing settings\n    batch_size: 100\n```\n\n### Installation Requirements\n\nFor ML categorization, install the spaCy English model:\n\n```bash\npython -m spacy download en_core_web_sm\n```\n\n**Alternative models** (if the default is unavailable):\n```bash\n# Medium model (more accurate, larger)\npython -m spacy download en_core_web_md\n\n# Large model (most accurate, largest)\npython -m spacy download en_core_web_lg\n```\n\n### Performance Expectations\n\n- **Accuracy**: 85-95% accuracy on typical commit messages\n- **Speed**: ~50-100 
commits/second with caching enabled\n- **Fallback**: Gracefully disables qualitative analysis if the spaCy model is unavailable (and reports a helpful error message)\n- **Memory**: ~200MB additional memory usage for spaCy models\n\n### Enhanced Reports\n\nWith ML categorization enabled, reports include:\n\n- **Confidence scores** for each categorization\n- **Method indicators** (ML, rules, or cached)\n- **Alternative predictions** for uncertain cases\n- **ML performance statistics** in analysis summaries\n\n### Example Enhanced Output\n\n```csv\ncommit_hash,category,ml_confidence,ml_method,message\na1b2c3d,feature,0.89,ml,\"Add user authentication system\"\nf6e5d4c,bug_fix,0.92,ml,\"Fix memory leak in cache cleanup\"\n9876543,maintenance,0.74,rules,\"Update dependency versions\"\n```\n\n## Troubleshooting\n\n### YAML Configuration Errors\n\nGitFlow Analytics reports YAML configuration problems with descriptive error messages. Here are common errors and their solutions:\n\n#### Tab Characters Not Allowed\n```\n\u274c YAML configuration error at line 3, column 1:\n\ud83d\udeab Tab characters are not allowed in YAML files!\n```\n**Fix**: Replace all tabs with spaces (use 2 or 4 spaces for indentation)\n- Most editors can show whitespace characters and convert tabs to spaces\n- In VS Code: View \u2192 Render Whitespace, then Edit \u2192 Convert Indentation to Spaces\n\n#### Missing Colons\n```\n\u274c YAML configuration error at line 5, column 10:\n\ud83d\udeab Missing colon (:) after a key name!\n```\n**Fix**: Add a colon and space after each key name\n```yaml\n# Correct:\nrepositories:\n  - name: my-repo\n\n# Incorrect:\nrepositories\n  - name my-repo\n```\n\n#### Unclosed Quotes\n```\n\u274c YAML configuration error at line 8, column 15:\n\ud83d\udeab Unclosed quoted string!\n```\n**Fix**: Ensure all quotes are properly closed\n```yaml\n# Correct:\ntoken: \"my-token-value\"\n\n# Incorrect:\ntoken: \"my-token-value\n```\n\n#### Invalid Indentation\n```\n\u274c YAML 
configuration error:\n\ud83d\udeab Indentation error or invalid structure!\n```\n**Fix**: Use consistent indentation (either 2 or 4 spaces)\n```yaml\n# Correct:\nanalysis:\n  exclude:\n    paths:\n      - \"vendor/**\"\n\n# Incorrect:\nanalysis:\n  exclude:\n     paths: # 3 spaces - inconsistent!\n      - \"vendor/**\"\n```\n\n### Tips for Valid YAML\n\n1. **Use a YAML validator**: Check your configuration with online YAML validators before using\n2. **Enable whitespace display**: Make tabs and spaces visible in your editor\n3. **Use quotes for special characters**: Wrap values containing `:`, `#`, `@`, etc. in quotes\n4. **Consistent indentation**: Pick 2 or 4 spaces and stick to it throughout the file\n5. **Check the sample config**: Reference `config-sample.yaml` for proper structure\n\n### Configuration Validation\n\nBeyond YAML syntax, GitFlow Analytics validates:\n- Required fields (`repositories` must have `name` and `path`)\n- Environment variable resolution\n- File path existence\n- Valid configuration structure\n\nIf you encounter persistent issues, run with `--debug` for detailed error information:\n```bash\n# Simplified syntax (default)\ngitflow-analytics -c config.yaml --debug\n\n# Explicit analyze command\ngitflow-analytics analyze -c config.yaml --debug\n```\n\n## Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request.\n\n## License\n\nThis project is licensed under the MIT License - see the LICENSE file for details.\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Analyze Git repositories for developer productivity insights",
"version": "3.12.6",
"project_urls": {
"Documentation": "https://github.com/bobmatnyc/gitflow-analytics/blob/main/README.md",
"Homepage": "https://github.com/bobmatnyc/gitflow-analytics",
"Issues": "https://github.com/bobmatnyc/gitflow-analytics/issues",
"Repository": "https://github.com/bobmatnyc/gitflow-analytics"
},
"split_keywords": [
"git",
" analytics",
" productivity",
" metrics",
" development"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "404eb5c2ec732a1b3477cc8b38c59c72b10464c13bf752a87be982a1eeea76fd",
"md5": "b4accbab1384e82256b06d8b5917bfcf",
"sha256": "7c35f27cd7e057affb4f070b768408990d82a1b2649a5729c7517cb8f2b52b2a"
},
"downloads": -1,
"filename": "gitflow_analytics-3.12.6-py3-none-any.whl",
"has_sig": false,
"md5_digest": "b4accbab1384e82256b06d8b5917bfcf",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 625857,
"upload_time": "2025-11-06T23:37:17",
"upload_time_iso_8601": "2025-11-06T23:37:17.593813Z",
"url": "https://files.pythonhosted.org/packages/40/4e/b5c2ec732a1b3477cc8b38c59c72b10464c13bf752a87be982a1eeea76fd/gitflow_analytics-3.12.6-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "9f9024d6d8c31ae61439b9e258445f7c548f7b7c9591b04e919e086d2e3765f6",
"md5": "12af64c4e47595ab0bfa425216e943e2",
"sha256": "d2d1958907912ed1564bbe2072d03857600faa5e41d7d8ff53e3670e19ef4342"
},
"downloads": -1,
"filename": "gitflow_analytics-3.12.6.tar.gz",
"has_sig": false,
"md5_digest": "12af64c4e47595ab0bfa425216e943e2",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 825631,
"upload_time": "2025-11-06T23:37:19",
"upload_time_iso_8601": "2025-11-06T23:37:19.571437Z",
"url": "https://files.pythonhosted.org/packages/9f/90/24d6d8c31ae61439b9e258445f7c548f7b7c9591b04e919e086d2e3765f6/gitflow_analytics-3.12.6.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-11-06 23:37:19",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "bobmatnyc",
"github_project": "gitflow-analytics",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "gitflow-analytics"
}