# gitin - GitHub Repository Content Extractor
[![PyPI version](https://badge.fury.io/py/gitin.svg)](https://badge.fury.io/py/gitin)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python Versions](https://img.shields.io/pypi/pyversions/gitin.svg)](https://pypi.org/project/gitin/)
gitin is a command-line tool that extracts and formats GitHub repository content for use with Large Language Models (LLMs). It is well suited to providing a codebase as context in an LLM conversation.
## Features
- 🔍 **Smart Extraction**: Extract files based on patterns and content
- 📝 **LLM-Optimized**: Clean markdown output with token estimation
- 🚀 **Progress Tracking**: Visual progress bars for operations
- 🎯 **Flexible Filtering**: Include/exclude patterns and content search
- 📊 **Size Control**: File size limits and statistics
- 🔄 **Version Control**: Respects .gitignore rules
## Installation
### From PyPI (Recommended)
```bash
pip install gitin
```
### From Source
```bash
git clone https://github.com/unclecode/gitin
cd gitin
pip install -e .
```
## Quick Start
```bash
# Basic usage
gitin https://github.com/username/repo -o output.md
# Extract Python files only
gitin https://github.com/username/repo --include="*.py" -o python_files.md
# Search for specific content
gitin https://github.com/username/repo --search="TODO,FIXME" -o review.md
```
## Advanced Usage
### Code Review Workflow
```bash
# Extract implementation files, excluding tests
gitin https://github.com/username/repo \
    --include="src/*.py" \
    --exclude="test_*,*_test.py" \
    --search="TODO,FIXME,HACK" \
    -o code_review.md
```
### Feature Analysis
```bash
# Find async/await usage
gitin https://github.com/username/repo \
    --include="*.py,*.js" \
    --search="async def,await" \
    -o async_patterns.md
```
### Documentation Extraction
```bash
# Get all documentation files
gitin https://github.com/username/repo \
    --include="docs/*,*.md,*.rst" \
    --exclude="**/tests/*" \
    -o documentation.md
```
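The workflows above can also be driven from a script. The following is a minimal sketch, not part of gitin itself: `build_gitin_command` is a hypothetical helper that assembles an argument list using only the flags documented in this README, so the result can be handed to `subprocess.run`.

```python
import shlex


def build_gitin_command(repo_url, output, include=None, exclude=None,
                        search=None, max_size=None):
    """Assemble a gitin argument list (hypothetical helper, not gitin's API).

    Only flags documented in this README are used. List arguments are
    joined with commas, matching the comma-separated format the CLI expects.
    """
    cmd = ["gitin", repo_url]
    if include:
        cmd.append("--include=" + ",".join(include))
    if exclude:
        cmd.append("--exclude=" + ",".join(exclude))
    if search:
        cmd.append("--search=" + ",".join(search))
    if max_size is not None:
        cmd.append("--max-size={}".format(int(max_size)))
    cmd += ["-o", output]
    return cmd


# Rebuild the code review workflow as an argument list.
review_cmd = build_gitin_command(
    "https://github.com/username/repo",
    "code_review.md",
    include=["src/*.py"],
    exclude=["test_*", "*_test.py"],
    search=["TODO", "FIXME", "HACK"],
)

# shlex.join (Python 3.8+) quotes the arguments for display in a shell.
print(shlex.join(review_cmd))
```

To actually run the command, pass the list to `subprocess.run(review_cmd, check=True)`. Using a list rather than a single string avoids shell quoting problems with glob characters such as `*`.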
## Command-Line Options
```
Options:
  --version           Show the version and exit.
  --exclude TEXT      Comma-separated glob patterns to exclude
                      Example: --exclude="test_*,*.tmp,docs/*"
  --include TEXT      Comma-separated glob patterns to include
                      Example: --include="*.py,src/*.js,lib/*.rb"
  --search TEXT       Comma-separated strings to search in content
                      Example: --search="TODO,FIXME,HACK"
  --max-size INTEGER  Maximum file size in bytes (default: 1MB)
  -o, --output TEXT   Output markdown file path  [required]
  --help              Show this message and exit.
```
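To build intuition for how comma-separated include, exclude, and search values might combine, here is a rough Python sketch. It is an assumption about the general technique, not gitin's actual implementation: the function names are hypothetical, and matching each pattern against both the full path and the file name is a choice made for this example.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def split_csv(value):
    """Split a comma-separated option value into non-empty, trimmed items."""
    return [part.strip() for part in value.split(",") if part.strip()]


def matches_any(path, patterns):
    """Return True if the full path or the bare file name matches a pattern."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(path, p) or fnmatchcase(name, p) for p in patterns)


def select_files(paths, include="", exclude=""):
    """Keep paths that match an include pattern and no exclude pattern.

    An empty include string means "include everything".
    """
    include_patterns = split_csv(include)
    exclude_patterns = split_csv(exclude)
    selected = []
    for path in paths:
        if include_patterns and not matches_any(path, include_patterns):
            continue
        if exclude_patterns and matches_any(path, exclude_patterns):
            continue
        selected.append(path)
    return selected


def contains_any(text, search=""):
    """Return True if no terms are given or any search term occurs in text."""
    terms = split_csv(search)
    return not terms or any(term in text for term in terms)


paths = ["src/app.py", "src/test_app.py", "src/util_test.py", "docs/guide.md"]
kept = select_files(paths, include="src/*.py", exclude="test_*,*_test.py")
print(kept)
```

Note that `fnmatchcase` lets `*` match across `/`, so the real tool may treat patterns such as `**/tests/*` differently from this sketch.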
## LLM Integration Tips
1. **Context Window Management**
   - Check the token count in the summary
   - Use `--max-size` to limit file sizes
   - Use `--search` to focus on relevant sections
2. **Effective Filtering**
   - Use `--include` for specific file types
   - Use `--exclude` to remove noise
   - Combine with `--search` for precise results
3. **Best Practices**
   ```bash
   # Extract core functionality
   gitin https://github.com/org/repo \
       --include="src/**/*.py" \
       --exclude="**/*test*" \
       --max-size=50000 \
       -o core_logic.md
   ```
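For context window management, a quick check on the generated markdown can help before pasting it into a conversation. The sketch below is illustrative only: it uses the common rule of thumb of roughly four characters per token, which is an assumption and not necessarily how gitin computes its own token estimate, and the function names and the reply reserve are hypothetical.

```python
import math


def estimate_tokens(text, chars_per_token=4.0):
    """Rough token estimate: character count divided by an assumed ratio.

    The 4-characters-per-token default is a heuristic; real tokenizers
    vary by model and by content (code often tokenizes less efficiently).
    """
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    return math.ceil(len(text) / chars_per_token)


def fits_context(text, context_window, reserved_for_reply=1000):
    """Return True if the estimate leaves room for the model's reply."""
    return estimate_tokens(text) <= context_window - reserved_for_reply


# Stand-in for the contents of an extracted file such as core_logic.md.
sample = "def add(a, b):\n    return a + b\n" * 100
print(estimate_tokens(sample), fits_context(sample, context_window=8000))
```

If the estimate is too high, tightening `--include`, adding `--exclude` patterns, or lowering `--max-size` are the documented ways to shrink the output.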
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Changelog
See [CHANGELOG.md](CHANGELOG.md) for release history.
Raw data
{
"_id": null,
"home_page": "https://github.com/unclecode/gitin",
"name": "gitin",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": null,
"keywords": "github, llm, content-extraction, markdown, repository-analysis",
"author": "unclecode",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/f3/ab/8dc1a2d9ac9be8633ec09966a476203632646104ab58016122506ba33ff2/gitin-0.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Extract and format GitHub repository content for LLMs",
"version": "0.1.0",
"project_urls": {
"Bug Tracker": "https://github.com/unclecode/gitin/issues",
"Changelog": "https://github.com/unclecode/gitin/blob/main/CHANGELOG.md",
"Documentation": "https://github.com/unclecode/gitin#readme",
"Homepage": "https://github.com/unclecode/gitin",
"Source Code": "https://github.com/unclecode/gitin"
},
"split_keywords": [
"github",
" llm",
" content-extraction",
" markdown",
" repository-analysis"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "d4f0a964d978a11d3b7be3d78762c2750198dbc1579354d901cd8c450a18c20f",
"md5": "b676942736f9814c53a44de886a043ca",
"sha256": "055de2b7e911bb22b0053ebf5a76c889788ecb75bb1da7d2317e5480e2360886"
},
"downloads": -1,
"filename": "gitin-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "b676942736f9814c53a44de886a043ca",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 6546,
"upload_time": "2024-12-31T10:27:17",
"upload_time_iso_8601": "2024-12-31T10:27:17.046285Z",
"url": "https://files.pythonhosted.org/packages/d4/f0/a964d978a11d3b7be3d78762c2750198dbc1579354d901cd8c450a18c20f/gitin-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f3ab8dc1a2d9ac9be8633ec09966a476203632646104ab58016122506ba33ff2",
"md5": "1f4a7e69eaa05b4d91ae02656e987859",
"sha256": "ea4aa27d6d6afcec5ceb1e60fac237ed75331b88d6488366be93995ee1563783"
},
"downloads": -1,
"filename": "gitin-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "1f4a7e69eaa05b4d91ae02656e987859",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 6184,
"upload_time": "2024-12-31T10:27:18",
"upload_time_iso_8601": "2024-12-31T10:27:18.414529Z",
"url": "https://files.pythonhosted.org/packages/f3/ab/8dc1a2d9ac9be8633ec09966a476203632646104ab58016122506ba33ff2/gitin-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-31 10:27:18",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "unclecode",
"github_project": "gitin",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "click",
"specs": [
[
"==",
"8.1.7"
]
]
},
{
"name": "requests",
"specs": [
[
"==",
"2.31.0"
]
]
},
{
"name": "tqdm",
"specs": [
[
"==",
"4.66.1"
]
]
}
],
"lcname": "gitin"
}