# LlamaIndex Readers Integration: Github
`pip install llama-index-readers-github`
The GitHub readers package consists of three separate readers:
1. Repository Reader
2. Issues Reader
3. Collaborators Reader
All three readers require a GitHub personal access token, which you can generate under your account's developer settings.
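You can keep the token out of your code by reading it from the environment. A minimal sketch, assuming the token was exported in an environment variable named `GITHUB_TOKEN`:

```python
import os

# Assumes the token was exported beforehand, e.g.
#   export GITHUB_TOKEN="ghp_..."
github_token = os.environ["GITHUB_TOKEN"]
```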
## Repository Reader
This reader walks through a repository, with options to filter by directory, file extension, or file path, and to apply custom processing logic.
### Basic Usage
```python
from llama_index.readers.github import GithubRepositoryReader, GithubClient
github_client = GithubClient(github_token=github_token, verbose=False)

reader = GithubRepositoryReader(
    github_client=github_client,
    owner="run-llama",
    repo="llama_index",
    use_parser=False,
    verbose=True,
    filter_directories=(
        ["docs"],
        GithubRepositoryReader.FilterType.INCLUDE,
    ),
    filter_file_extensions=(
        [
            ".png",
            ".jpg",
            ".jpeg",
            ".gif",
            ".svg",
            ".ico",
            ".json",
            ".ipynb",
        ],
        GithubRepositoryReader.FilterType.EXCLUDE,
    ),
)

documents = reader.load_data(branch="main")
```
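`load_data` can also target a specific commit rather than a branch tip. A minimal sketch, assuming a `commit_sha` parameter on `load_data` (pass either `branch` or `commit_sha`, not both); the SHA below is a placeholder:

```python
# Load the repository contents as they were at a specific commit.
# Assumption: load_data accepts commit_sha as an alternative to branch.
documents = reader.load_data(
    commit_sha="0123456789abcdef0123456789abcdef01234567"
)
```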
### Advanced Filtering Options
#### Filter Specific File Paths
```python
# Include only specific files
reader = GithubRepositoryReader(
    github_client=github_client,
    owner="run-llama",
    repo="llama_index",
    filter_file_paths=(
        ["README.md", "src/main.py", "docs/guide.md"],
        GithubRepositoryReader.FilterType.INCLUDE,
    ),
)

# Exclude specific files
reader = GithubRepositoryReader(
    github_client=github_client,
    owner="run-llama",
    repo="llama_index",
    filter_file_paths=(
        ["tests/test_file.py", "temp/cache.txt"],
        GithubRepositoryReader.FilterType.EXCLUDE,
    ),
)
```
#### Custom File Processing Callback
```python
def process_file_callback(file_path: str, file_size: int) -> tuple[bool, str]:
    """Custom logic to determine if a file should be processed.

    Args:
        file_path: The full path to the file
        file_size: The size of the file in bytes

    Returns:
        Tuple of (should_process: bool, reason: str)
    """
    # Skip large files
    if file_size > 1024 * 1024:  # 1MB
        return False, f"File too large: {file_size} bytes"

    # Skip test files
    if "test" in file_path.lower():
        return False, "Skipping test files"

    # Skip binary files by extension
    binary_extensions = [".exe", ".bin", ".so", ".dylib"]
    if any(file_path.endswith(ext) for ext in binary_extensions):
        return False, "Skipping binary files"

    return True, ""


reader = GithubRepositoryReader(
    github_client=github_client,
    owner="run-llama",
    repo="llama_index",
    process_file_callback=process_file_callback,
    fail_on_error=False,  # Continue processing if the callback fails
)
```
#### Custom Parsers and Custom Folder for Temporary Files
```python
from pathlib import Path

from llama_index.core import Document
from llama_index.core.readers.base import BaseReader


# Custom parser for specific file types
class CustomMarkdownParser(BaseReader):
    def load_data(self, file_path, extra_info=None):
        # Custom parsing logic here (illustrative: read the file and
        # wrap it in a Document; a parser should return a list of Documents)
        text = Path(file_path).read_text(encoding="utf-8")
        return [Document(text=text, metadata=extra_info or {})]


reader = GithubRepositoryReader(
    github_client=github_client,
    owner="run-llama",
    repo="llama_index",
    use_parser=True,
    custom_parsers={".md": CustomMarkdownParser()},
    custom_folder="/tmp/github_processing",  # Custom temp directory
)
```
### Event System Integration
The reader integrates with LlamaIndex's instrumentation system to provide detailed events during processing:
```python
from llama_index.core.instrumentation import get_dispatcher
from llama_index.core.instrumentation.event_handlers import BaseEventHandler
from llama_index.readers.github.repository.event import (
    GitHubFileProcessedEvent,
    GitHubFileSkippedEvent,
    GitHubFileFailedEvent,
    GitHubRepositoryProcessingStartedEvent,
    GitHubRepositoryProcessingCompletedEvent,
)


class GitHubEventHandler(BaseEventHandler):
    def handle(self, event):
        if isinstance(event, GitHubRepositoryProcessingStartedEvent):
            print(f"Started processing repository: {event.repository_name}")
        elif isinstance(event, GitHubFileProcessedEvent):
            print(
                f"Processed file: {event.file_path} ({event.file_size} bytes)"
            )
        elif isinstance(event, GitHubFileSkippedEvent):
            print(f"Skipped file: {event.file_path} - {event.reason}")
        elif isinstance(event, GitHubFileFailedEvent):
            print(f"Failed to process file: {event.file_path} - {event.error}")
        elif isinstance(event, GitHubRepositoryProcessingCompletedEvent):
            print(
                f"Completed processing. Total documents: {event.total_documents}"
            )


# Register the event handler
dispatcher = get_dispatcher()
handler = GitHubEventHandler()
dispatcher.add_event_handler(handler)

# Use the reader - events will be automatically dispatched
reader = GithubRepositoryReader(
    github_client=github_client,
    owner="run-llama",
    repo="llama_index",
)
documents = reader.load_data(branch="main")
```
#### Available Events
The following events are dispatched during repository processing (a progress-tracking sketch follows the list):
- **`GitHubRepositoryProcessingStartedEvent`**: Fired when repository processing begins

  - `repository_name`: Name of the repository (owner/repo)
  - `branch_or_commit`: Branch name or commit SHA being processed

- **`GitHubRepositoryProcessingCompletedEvent`**: Fired when repository processing completes

  - `repository_name`: Name of the repository
  - `branch_or_commit`: Branch name or commit SHA
  - `total_documents`: Number of documents created

- **`GitHubTotalFilesToProcessEvent`**: Fired with the total count of files to be processed

  - `repository_name`: Name of the repository
  - `branch_or_commit`: Branch name or commit SHA
  - `total_files`: Total number of files found

- **`GitHubFileProcessingStartedEvent`**: Fired when individual file processing starts

  - `file_path`: Path to the file being processed
  - `file_type`: File extension

- **`GitHubFileProcessedEvent`**: Fired when a file is successfully processed

  - `file_path`: Path to the processed file
  - `file_type`: File extension
  - `file_size`: Size of the file in bytes
  - `document`: The created Document object

- **`GitHubFileSkippedEvent`**: Fired when a file is skipped

  - `file_path`: Path to the skipped file
  - `file_type`: File extension
  - `reason`: Reason why the file was skipped

- **`GitHubFileFailedEvent`**: Fired when file processing fails
  - `file_path`: Path to the failed file
  - `file_type`: File extension
  - `error`: Error message describing the failure
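As a quick illustration of combining these events, the sketch below tracks progress against the announced file total. It assumes `GitHubTotalFilesToProcessEvent` is importable from the same `llama_index.readers.github.repository.event` module used above.

```python
from llama_index.core.instrumentation import get_dispatcher
from llama_index.core.instrumentation.event_handlers import BaseEventHandler

# Assumption: these event classes live alongside the ones imported earlier
from llama_index.readers.github.repository.event import (
    GitHubTotalFilesToProcessEvent,
    GitHubFileProcessedEvent,
    GitHubFileSkippedEvent,
)


class ProgressHandler(BaseEventHandler):
    """Counts processed/skipped files against the announced total."""

    total_files: int = 0
    seen_files: int = 0

    def handle(self, event):
        if isinstance(event, GitHubTotalFilesToProcessEvent):
            self.total_files = event.total_files
        elif isinstance(
            event, (GitHubFileProcessedEvent, GitHubFileSkippedEvent)
        ):
            self.seen_files += 1
            print(f"{self.seen_files}/{self.total_files}: {event.file_path}")


get_dispatcher().add_event_handler(ProgressHandler())
```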
## Issues Reader
```python
from llama_index.readers.github import (
    GitHubRepositoryIssuesReader,
    GitHubIssuesClient,
)

github_client = GitHubIssuesClient(github_token=github_token, verbose=True)

reader = GitHubRepositoryIssuesReader(
    github_client=github_client,
    owner="moncho",
    repo="dry",
    verbose=True,
)

documents = reader.load_data(
    state=GitHubRepositoryIssuesReader.IssueState.ALL,
    labelFilters=[("bug", GitHubRepositoryIssuesReader.FilterType.INCLUDE)],
)
```
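The same reader can be narrowed further. A minimal sketch that loads only open issues while excluding a label, assuming the `IssueState` enum also exposes `OPEN` and using an illustrative `wontfix` label name:

```python
# Load only open issues, excluding any labeled "wontfix" (label name is illustrative)
documents = reader.load_data(
    state=GitHubRepositoryIssuesReader.IssueState.OPEN,
    labelFilters=[
        ("wontfix", GitHubRepositoryIssuesReader.FilterType.EXCLUDE),
    ],
)
```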
## Collaborators Reader
```python
from llama_index.readers.github import (
    GitHubRepositoryCollaboratorsReader,
    GitHubCollaboratorsClient,
)

github_client = GitHubCollaboratorsClient(
    github_token=github_token, verbose=True
)

reader = GitHubRepositoryCollaboratorsReader(
    github_client=github_client,
    owner="moncho",
    repo="dry",
    verbose=True,
)

documents = reader.load_data()
```
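Documents returned by any of the three readers can be used like documents from any other LlamaIndex reader. A minimal sketch, assuming `llama-index` core is installed and a default embedding/LLM backend (for example, an OpenAI API key) is configured:

```python
from llama_index.core import VectorStoreIndex

# Build a queryable index over whichever documents were loaded above
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

response = query_engine.query("Who are the collaborators on this repository?")
print(response)
```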