gitin

| Field | Value |
| --- | --- |
| Name | gitin |
| Version | 0.1.0 |
| home_page | https://github.com/unclecode/gitin |
| Summary | Extract and format GitHub repository content for LLMs |
| upload_time | 2024-12-31 10:27:18 |
| maintainer | None |
| docs_url | None |
| author | unclecode |
| requires_python | >=3.6 |
| license | None |
| keywords | github, llm, content-extraction, markdown, repository-analysis |
| requirements | click, requests, tqdm |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# gitin - GitHub Repository Content Extractor

[![PyPI version](https://badge.fury.io/py/gitin.svg)](https://badge.fury.io/py/gitin)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python Versions](https://img.shields.io/pypi/pyversions/gitin.svg)](https://pypi.org/project/gitin/)

A command-line tool that extracts GitHub repository content and formats it as markdown for use with Large Language Models (LLMs), for example to supply a codebase as context in an LLM conversation.

## Features

- 🔍 **Smart Extraction**: Extract files based on patterns and content
- 📝 **LLM-Optimized**: Clean markdown output with token estimation
- 🚀 **Progress Tracking**: Visual progress bars for operations
- 🎯 **Flexible Filtering**: Include/exclude patterns and content search
- 📊 **Size Control**: File size limits and statistics
- 🔄 **Version Control**: Respects .gitignore rules

## Installation

### From PyPI (Recommended)
```bash
pip install gitin
```

### From Source
```bash
git clone https://github.com/unclecode/gitin
cd gitin
pip install -e .
```

## Quick Start

```bash
# Basic usage
gitin https://github.com/username/repo -o output.md

# Extract Python files only
gitin https://github.com/username/repo --include="*.py" -o python_files.md

# Search for specific content
gitin https://github.com/username/repo --search="TODO,FIXME" -o review.md
```

## Advanced Usage

### Code Review Workflow
```bash
# Extract implementation files, excluding tests
gitin https://github.com/username/repo \
  --include="src/*.py" \
  --exclude="test_*,*_test.py" \
  --search="TODO,FIXME,HACK" \
  -o code_review.md
```

### Feature Analysis
```bash
# Find async/await usage
gitin https://github.com/username/repo \
  --include="*.py,*.js" \
  --search="async def,await" \
  -o async_patterns.md
```

### Documentation Extraction
```bash
# Get all documentation files
gitin https://github.com/username/repo \
  --include="docs/*,*.md,*.rst" \
  --exclude="**/tests/*" \
  -o documentation.md
```
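
When these workflows are run from a script rather than typed by hand, it can help to assemble the command in one place. The following is a minimal sketch that uses only the flags documented in this README; the helper name `build_gitin_command` is hypothetical and is not part of gitin.

```python
import shlex


def build_gitin_command(repo_url, output, include=None, exclude=None,
                        search=None, max_size=None):
    """Assemble an argv list for the gitin CLI (hypothetical helper).

    Only options listed in this README are used. Pattern and search
    values are passed through unchanged as comma-separated strings.
    """
    cmd = ["gitin", repo_url]
    if include:
        cmd.append(f"--include={include}")
    if exclude:
        cmd.append(f"--exclude={exclude}")
    if search:
        cmd.append(f"--search={search}")
    if max_size is not None:
        cmd.append(f"--max-size={max_size}")
    cmd += ["-o", output]
    return cmd


if __name__ == "__main__":
    cmd = build_gitin_command(
        "https://github.com/username/repo",
        "documentation.md",
        include="docs/*,*.md,*.rst",
        exclude="**/tests/*",
    )
    # Print a shell-quoted form for inspection. To execute it, pass `cmd`
    # to subprocess.run(cmd, check=True); this requires gitin to be installed.
    print(shlex.join(cmd))
```

Building an argument list instead of a single shell string avoids quoting problems with patterns such as `**/tests/*`.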

## Command-Line Options

```
Options:
  --version           Show the version and exit.
  --exclude TEXT      Comma-separated glob patterns to exclude
                      Example: --exclude="test_*,*.tmp,docs/*"
  --include TEXT      Comma-separated glob patterns to include
                      Example: --include="*.py,src/*.js,lib/*.rb"
  --search TEXT       Comma-separated strings to search in content
                      Example: --search="TODO,FIXME,HACK"
  --max-size INTEGER  Maximum file size in bytes (default: 1MB)
  -o, --output TEXT   Output markdown file path [required]
  --help              Show this message and exit.
```
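
To reason about how `--include`, `--exclude`, and `--search` might combine, the sketch below models them with Python's standard `fnmatch` module. It is an illustration under stated assumptions, not gitin's actual implementation: the function names are hypothetical, and the choices to test each pattern against both the full path and the file name, and to treat search terms as "any term matches", are guesses that the README does not confirm.

```python
from fnmatch import fnmatchcase


def split_csv(value):
    """Split a comma-separated option value into stripped, non-empty items."""
    if not value:
        return []
    return [item.strip() for item in value.split(",") if item.strip()]


def is_selected(path, content, include="", exclude="", search=""):
    """Decide whether a file would be kept (assumed semantics, see lead-in)."""
    includes = split_csv(include)
    excludes = split_csv(exclude)
    needles = split_csv(search)
    name = path.rsplit("/", 1)[-1]

    def matches(patterns):
        # Assumption: a pattern may match the full path or just the file name.
        return any(fnmatchcase(path, p) or fnmatchcase(name, p) for p in patterns)

    if includes and not matches(includes):
        return False  # include patterns given, none matched
    if matches(excludes):
        return False  # an exclude pattern matched
    if needles and not any(n in content for n in needles):
        return False  # search terms given, none found in the content
    return True


if __name__ == "__main__":
    print(is_selected("src/app.py", "# TODO: refactor",
                      include="src/*.py",
                      exclude="test_*,*_test.py",
                      search="TODO,FIXME,HACK"))
```

Note that in `fnmatch`, `*` also matches `/`, so `src/*.py` matches nested paths in this sketch; gitin's own matching rules may differ.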

## LLM Integration Tips

1. **Context Window Management**
   - Check the token count in the summary
   - Use `--max-size` to limit file sizes
   - Use `--search` to focus on relevant sections

2. **Effective Filtering**
   - Use `--include` for specific file types
   - Use `--exclude` to remove noise
   - Combine with `--search` for precise results

3. **Best Practices**
   ```bash
   # Extract core functionality
   gitin https://github.com/org/repo \
     --include="src/**/*.py" \
     --exclude="**/*test*" \
     --max-size=50000 \
     -o core_logic.md
   ```
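
For a quick check of whether an extracted file is likely to fit a model's context window, a rough estimate can be computed independently of gitin's summary. The sketch below assumes about four characters per token, a common rule of thumb for English text and code; the ratio, the function names, and the `reserve_tokens` parameter are assumptions, and the README does not say how gitin computes its own estimate.

```python
import math


def estimate_tokens(text, chars_per_token=4.0):
    """Rough token count from character length (heuristic, assumed ratio)."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


def fits_context(text, context_tokens, reserve_tokens=0, chars_per_token=4.0):
    """Return True if the estimate plus a reserve for the reply fits the window."""
    needed = estimate_tokens(text, chars_per_token) + reserve_tokens
    return needed <= context_tokens


if __name__ == "__main__":
    # Stand-in for the contents of a file such as core_logic.md.
    extracted = "def add(a, b):\n    return a + b\n" * 100
    print(estimate_tokens(extracted))
    print(fits_context(extracted, context_tokens=8000, reserve_tokens=1000))
```

If the estimate is too large, tightening `--include`, adding `--search` terms, or lowering `--max-size` are the levers this README describes.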

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Changelog

See [CHANGELOG.md](CHANGELOG.md) for release history.

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/unclecode/gitin",
    "name": "gitin",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": "github, llm, content-extraction, markdown, repository-analysis",
    "author": "unclecode",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/f3/ab/8dc1a2d9ac9be8633ec09966a476203632646104ab58016122506ba33ff2/gitin-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Extract and format GitHub repository content for LLMs",
    "version": "0.1.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/unclecode/gitin/issues",
        "Changelog": "https://github.com/unclecode/gitin/blob/main/CHANGELOG.md",
        "Documentation": "https://github.com/unclecode/gitin#readme",
        "Homepage": "https://github.com/unclecode/gitin",
        "Source Code": "https://github.com/unclecode/gitin"
    },
    "split_keywords": [
        "github",
        " llm",
        " content-extraction",
        " markdown",
        " repository-analysis"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d4f0a964d978a11d3b7be3d78762c2750198dbc1579354d901cd8c450a18c20f",
                "md5": "b676942736f9814c53a44de886a043ca",
                "sha256": "055de2b7e911bb22b0053ebf5a76c889788ecb75bb1da7d2317e5480e2360886"
            },
            "downloads": -1,
            "filename": "gitin-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b676942736f9814c53a44de886a043ca",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 6546,
            "upload_time": "2024-12-31T10:27:17",
            "upload_time_iso_8601": "2024-12-31T10:27:17.046285Z",
            "url": "https://files.pythonhosted.org/packages/d4/f0/a964d978a11d3b7be3d78762c2750198dbc1579354d901cd8c450a18c20f/gitin-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f3ab8dc1a2d9ac9be8633ec09966a476203632646104ab58016122506ba33ff2",
                "md5": "1f4a7e69eaa05b4d91ae02656e987859",
                "sha256": "ea4aa27d6d6afcec5ceb1e60fac237ed75331b88d6488366be93995ee1563783"
            },
            "downloads": -1,
            "filename": "gitin-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "1f4a7e69eaa05b4d91ae02656e987859",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 6184,
            "upload_time": "2024-12-31T10:27:18",
            "upload_time_iso_8601": "2024-12-31T10:27:18.414529Z",
            "url": "https://files.pythonhosted.org/packages/f3/ab/8dc1a2d9ac9be8633ec09966a476203632646104ab58016122506ba33ff2/gitin-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-31 10:27:18",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "unclecode",
    "github_project": "gitin",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "click",
            "specs": [
                [
                    "==",
                    "8.1.7"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": [
                [
                    "==",
                    "2.31.0"
                ]
            ]
        },
        {
            "name": "tqdm",
            "specs": [
                [
                    "==",
                    "4.66.1"
                ]
            ]
        }
    ],
    "lcname": "gitin"
}
        