| Field | Value |
|---|---|
| Name | git-pandas |
| Version | 2.1.0 |
| home_page | None |
| Summary | A utility for interacting with data from git repositories as Pandas dataframes |
| upload_time | 2025-03-31 23:34:34 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.8 |
| license | BSD |
| keywords | analysis, data, git, pandas |

Git-Pandas
==========
Git-Pandas is a Python library that turns Git repository data into pandas DataFrames, making it easy to analyze and visualize a codebase's history, contributors, and development patterns. Built on top of GitPython, it provides a simple interface for extracting insights from your Git repositories.

## Why Git-Pandas?
- **Easy to Use**: Simple API that converts Git data into familiar pandas DataFrames
- **Comprehensive Analysis**: From basic commit history to complex metrics like bus factor
- **Flexible**: Works with single repositories or entire project directories
- **Visualization Ready**: Built-in plotting utilities for common Git analytics
- **Performance Optimized**: Optional caching support for memory-intensive operations
## Core Components
### Repository
The `Repository` class provides a wrapper around a single Git repository, offering methods to:
- Extract commit history with filtering by extension and directory
- Analyze file changes and blame information
- Track branch and tag information
- Generate cumulative blame statistics
- Calculate file ownership and contribution patterns
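
To make "commit history as a table" concrete, here is a minimal, stdlib-only sketch that parses the text produced by `git log --numstat --format="@%H|%an|%at"` into one row per (commit, file) and applies extension and directory filters. It is an illustration under stated assumptions, not Git-Pandas's implementation: the `@sha|author|timestamp` header format, the `parse_numstat` helper, and its `extensions`/`directory` parameters are all hypothetical, and author names are assumed not to contain `|`.

```python
from typing import Dict, Iterable, List, Optional

# Assumed input: output of  git log --numstat --format="@%H|%an|%at"
# i.e. one header line per commit, then "added<TAB>deleted<TAB>path" lines.
SAMPLE_LOG = """@a1b2c3|Alice|1700000000
10\t2\tsrc/app.py
3\t0\tdocs/readme.md

@d4e5f6|Bob|1700003600
0\t5\tsrc/util.py
-\t-\tassets/logo.png
"""


def parse_numstat(log_text: str,
                  extensions: Optional[Iterable[str]] = None,
                  directory: Optional[str] = None) -> List[Dict]:
    """Turn git numstat output into a list of row dicts (hypothetical helper)."""
    exts = tuple(extensions) if extensions else None
    rows = []
    commit = None
    for line in log_text.splitlines():
        if not line.strip():
            continue
        if line.startswith("@"):
            sha, author, ts = line[1:].split("|")
            commit = {"sha": sha, "author": author, "timestamp": int(ts)}
            continue
        if commit is None:
            continue  # file line before any commit header: ignore it
        added, deleted, path = line.split("\t")
        # Binary files report "-" instead of line counts; skip them here.
        if added == "-" or deleted == "-":
            continue
        if exts and not path.endswith(exts):
            continue
        if directory and not path.startswith(directory.rstrip("/") + "/"):
            continue
        rows.append({**commit, "file": path,
                     "insertions": int(added), "deletions": int(deleted)})
    return rows
```

For example, `parse_numstat(SAMPLE_LOG, extensions=[".py"], directory="src")` keeps only the two Python files under `src/`. With pandas installed, `pandas.DataFrame(rows)` turns the result into a DataFrame similar in spirit to what `commit_history()` returns, although the column names Git-Pandas actually uses may differ.
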
### ProjectDirectory
The `ProjectDirectory` class enables analysis across multiple repositories:
- Automatically discovers and analyzes nested Git repositories
- Aggregates metrics across multiple repositories
- Provides project-level insights and statistics
- Calculates cross-repository metrics like total development time
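
The discovery step can be pictured as a directory walk that treats any folder containing a `.git` entry as a repository and keeps walking below it to catch nested repositories. The stdlib sketch below illustrates that idea; `find_git_repos` is a hypothetical name, and this is not necessarily how `ProjectDirectory` performs discovery.

```python
import os
from typing import List


def find_git_repos(root: str) -> List[str]:
    """Return directories under `root` that contain a `.git` entry (hypothetical helper)."""
    repos = []
    for dirpath, dirnames, filenames in os.walk(root):
        # A .git directory (normal clone) or a .git file (worktree/submodule) marks a repo.
        if ".git" in dirnames or ".git" in filenames:
            repos.append(dirpath)
        # Keep walking into subfolders for nested repos, but never into .git itself.
        if ".git" in dirnames:
            dirnames.remove(".git")
    return sorted(repos)
```

Pruning `.git` from `dirnames` in place is the standard `os.walk` idiom for skipping a subtree, which keeps the walk from scanning Git's internal object store.
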
## Key Features
### Repository Analysis
- **Commit History**: Track changes with extension and directory filtering
- **File Analysis**: Monitor edited files and blame information
- **Branch & Tag Management**: Access repository structure information
- **Cumulative Blame**: Generate time-series data of code ownership
- **File Ownership**: Approximate file ownership and contribution patterns
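
One simple way to approximate file ownership from blame data is to total the blamed lines per author for each file and report the top author with their share. The sketch below assumes blame rows arrive as `(file, author, lines)` tuples; the `file_owners` helper and its input shape are hypothetical, and Git-Pandas's own ownership logic may weigh things differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed input: one (file, author, lines) tuple per blame hunk.
BLAME_ROWS = [
    ("src/app.py", "Alice", 120),
    ("src/app.py", "Bob", 30),
    ("src/app.py", "Alice", 50),
    ("src/util.py", "Bob", 40),
    ("src/util.py", "Carol", 60),
]


def file_owners(rows: Iterable[Tuple[str, str, int]]) -> Dict[str, Tuple[str, float]]:
    """Map each file to (author with most blamed lines, that author's share)."""
    per_file: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for path, author, lines in rows:
        per_file[path][author] += lines
    owners = {}
    for path, counts in per_file.items():
        total = sum(counts.values())
        # Ties resolve to whichever author was seen first for that file.
        author, lines = max(counts.items(), key=lambda kv: kv[1])
        owners[path] = (author, lines / total)
    return owners
```

Here `src/app.py` is owned by Alice (170 of 200 lines) and `src/util.py` by Carol (60 of 100 lines).
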
### Project Insights
- **Bus Factor**: Calculate project sustainability metrics
- **Development Time**: Estimate hours spent per project or author
- **Contributor Analysis**: Track individual and team contributions
- **Project Health**: Generate comprehensive project information tables
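
Both headline project metrics rest on simple heuristics. The sketch below shows one common form of each, under assumptions: bus factor as the smallest number of authors who together account for at least half of the blamed lines, and development time estimated per author by summing the gaps between commits that are at most `max_gap_hours` apart, crediting `session_start_hours` for the first commit and for each new session. The function names and all default thresholds are hypothetical, and Git-Pandas's own calculations may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def bus_factor(lines_by_author: Dict[str, int], threshold: float = 0.5) -> int:
    """Smallest number of authors whose combined lines reach `threshold` of the total."""
    total = sum(lines_by_author.values())
    covered = 0
    # Take the largest contributors first so the count is as small as possible.
    for n, count in enumerate(sorted(lines_by_author.values(), reverse=True), start=1):
        covered += count
        if covered >= threshold * total:
            return n
    return len(lines_by_author)


def estimate_hours(commits: Iterable[Tuple[str, float]],
                   max_gap_hours: float = 2.0,
                   session_start_hours: float = 0.5) -> Dict[str, float]:
    """Per-author hours from (author, unix_timestamp) pairs, git-hours style."""
    times: Dict[str, List[float]] = defaultdict(list)
    for author, ts in commits:
        times[author].append(ts)
    hours = {}
    for author, stamps in times.items():
        stamps.sort()
        total = session_start_hours  # credit for work before the first commit
        for prev, cur in zip(stamps, stamps[1:]):
            gap = (cur - prev) / 3600
            # Close commits count as one session; a long gap starts a new one.
            total += gap if gap <= max_gap_hours else session_start_hours
        hours[author] = total
    return hours
```

The gap-based estimate follows the idea popularized by the `git-hours` tool; it is only as good as the assumption that commits close together in time reflect continuous work.
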
### GitHub Integration
- **Profile Analysis**: Analyze GitHub.com profiles via the `GitHubProfile` object
- **Repository Metrics**: Extract repository-specific insights
- **Contributor Insights**: Track external contributions and collaborations
### Visualization Tools
- **Plotting Helpers**: Built-in utilities for common Git analytics
- **Punchcard Analysis**: Generate and visualize commit patterns
- **Blame Visualization**: Create cumulative blame charts
- **Time Series Analysis**: Track changes and patterns over time
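
A commit punchcard is a count of commits by day of week and hour of day. The stdlib sketch below builds one from Unix timestamps; it assumes all times are interpreted in UTC (real commits carry their own timezone offsets, which this ignores), and `punchcard` here is a hypothetical helper, not Git-Pandas's own implementation.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable


def punchcard(timestamps: Iterable[float]) -> Counter:
    """Count commits per (weekday, hour) in UTC, with Monday as weekday 0."""
    counts: Counter = Counter()
    for ts in timestamps:
        dt = datetime.fromtimestamp(ts, tz=timezone.utc)
        counts[(dt.weekday(), dt.hour)] += 1
    return counts
```

The resulting counter maps naturally onto a 7 × 24 grid, which is the shape a punchcard plot or heatmap expects.
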
## Installation
Git-Pandas requires Python 3.8 or later. Install it with pip:
```bash
pip install git-pandas
```
## Quick Start
```python
from gitpandas import ProjectDirectory, Repository

# Analyze a single repository
repo = Repository('/path/to/repo')
commits_df = repo.commit_history()
blame_df = repo.blame()

# Analyze multiple repositories
project = ProjectDirectory('/path/to/project')
project_info = project.general_information()
```
## Documentation
Comprehensive documentation is available at [http://wdm0006.github.io/git-pandas/](http://wdm0006.github.io/git-pandas/)
## Performance Optimization
For memory-intensive operations, Git-Pandas supports:
- Memory-based caching
- Redis-based caching
- Configurable cache durations
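
Conceptually, these caching options store computed results keyed by the call that produced them and discard them after a configurable duration. The sketch below shows that pattern with an in-memory time-to-live decorator; `ttl_cache` is hypothetical and is not Git-Pandas's cache API. A Redis-backed variant would replace the dictionary with Redis keys that carry an expiry.

```python
import time
from functools import wraps
from typing import Any, Callable, Dict, Tuple


def ttl_cache(seconds: float) -> Callable:
    """Memoize a function's results in memory for `seconds` (hypothetical helper)."""
    def decorator(func: Callable) -> Callable:
        # Maps call key -> (time stored, cached result).
        store: Dict[Tuple, Tuple[float, Any]] = {}

        @wraps(func)
        def wrapper(*args, **kwargs):
            key = (args, tuple(sorted(kwargs.items())))
            now = time.monotonic()
            hit = store.get(key)
            if hit is not None and now - hit[0] < seconds:
                return hit[1]  # still fresh: skip the expensive call
            result = func(*args, **kwargs)
            store[key] = (now, result)
            return result
        return wrapper
    return decorator
```

Using `time.monotonic()` rather than wall-clock time keeps expiry correct even if the system clock is adjusted; arguments must be hashable for the key to work.
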
## Projects Using Git-Pandas
- [GitNOC](https://github.com/wdm0006/gitnoc): Network of Code analysis tool
- [Commit Opener](https://github.com/lbillingham/commit_opener): Commit analysis and visualization tool
## Contributing
We welcome contributions! Please review our [Contributing Guidelines](CONTRIBUTING.md) for details on:
- Code of Conduct
- Development Setup
- Pull Request Process
- Starter Issues
## License
This project is BSD-licensed (see [LICENSE.md](LICENSE.md)).
Raw data
{
"_id": null,
"home_page": null,
"name": "git-pandas",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "analysis, data, git, pandas",
"author": null,
"author_email": "Will McGinnis <will@pedalwrencher.com>",
"download_url": "https://files.pythonhosted.org/packages/45/c6/88c5f7165494ca2ab726b408bdaa27626bccaa312b43cdfef2c34745b1a9/git_pandas-2.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "BSD",
"summary": "A utility for interacting with data from git repositories as Pandas dataframes",
"version": "2.1.0",
"project_urls": null,
"split_keywords": [
"analysis",
" data",
" git",
" pandas"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "45c688c5f7165494ca2ab726b408bdaa27626bccaa312b43cdfef2c34745b1a9",
"md5": "b88534748c31b53bb3279dee85bba746",
"sha256": "05be2e55b4f0a3bdb7bdfd0595f9aeaab1cdb278b68314581f5e9d1c6ba20117"
},
"downloads": -1,
"filename": "git_pandas-2.1.0.tar.gz",
"has_sig": false,
"md5_digest": "b88534748c31b53bb3279dee85bba746",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 459591,
"upload_time": "2025-03-31T23:34:34",
"upload_time_iso_8601": "2025-03-31T23:34:34.692195Z",
"url": "https://files.pythonhosted.org/packages/45/c6/88c5f7165494ca2ab726b408bdaa27626bccaa312b43cdfef2c34745b1a9/git_pandas-2.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-03-31 23:34:34",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "git-pandas"
}