# GHS: Semantic Search for GitHub Stars
A command-line tool to semantically search your starred GitHub repositories.
**Why?** If, like me, you star repositories as a way to bookmark them but later struggle to recall a specific tool or library because **GitHub**'s search matches keywords rather than meaning, this tool is for you.
## Features
- Unified command-line interface with intuitive subcommands
- Fetches all starred repositories from your GitHub profile
- **Parallel processing** with 5 concurrent workers for fast README fetching
- **Intelligent rate limit handling** - automatically detects and waits for GitHub API limits to reset
- Extracts and parses README files (supports .md, .txt, and plain README)
- Generates embeddings using a lightweight sentence-transformer model (all-MiniLM-L6-v2)
- Stores data efficiently using sqlite-vec for fast vector similarity search
- Smart refresh command to sync added/removed stars
- Semantic search to find repositories by meaning, not just keywords
- Real-time progress feedback showing currently processing repositories
## Installation
### Option 1: Install from PyPI (Recommended)
```bash
# Install the package
pip install github-stars-search
# For CPU-only PyTorch (smaller download, no CUDA dependencies):
pip install github-stars-search --extra-index-url https://download.pytorch.org/whl/cpu
```
After installation, the tool will be available as the `ghs` command.
### Option 2: Install from Source
```bash
# Clone the repository
git clone https://github.com/webpolis/ghs.git
cd ghs
# Install in development mode
pip install -e .
# For CPU-only PyTorch:
pip install -e . --extra-index-url https://download.pytorch.org/whl/cpu
```
## Setup
1. Create a GitHub Personal Access Token:
   - Go to https://github.com/settings/tokens
   - Create a new token with the `public_repo` scope
   - Copy the token
2. Configure your environment:
```bash
cp .env.example .env
# Edit .env and add your GitHub token
```
## Usage
The tool provides a unified CLI with four main commands:
### Fetch - Initial Indexing
Fetch and index all your starred repositories:
```bash
ghs fetch
```
This will:
1. Check your GitHub API rate limit status
2. Fetch all your starred repositories from GitHub
3. Download and parse their READMEs in parallel (5 concurrent workers)
4. Generate embeddings using the all-MiniLM-L6-v2 model (384-dimensional)
5. Store everything in a local SQLite database with vector search capabilities
6. Skip repositories that are already stored
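The parallel download in step 3 could look roughly like the following sketch. It uses the standard-library thread pool with five workers; `fetch_readme` is a hypothetical stand-in for the real GitHub call, and none of these names are taken from the tool's source.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_WORKERS = 5  # matches the worker count stated in this README


def fetch_readme(full_name):
    """Hypothetical stand-in for a real README download."""
    return f"README for {full_name}"


def fetch_all_readmes(repo_names, fetch=fetch_readme, max_workers=MAX_WORKERS):
    """Fetch READMEs concurrently and return {repo_name: text_or_None}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the repository it belongs to.
        futures = {pool.submit(fetch, name): name for name in repo_names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception:
                # One failing repository should not abort the whole run.
                results[name] = None
            print(f"processed {name}")  # simple progress feedback
    return results
```

Threads suit this workload because each worker spends most of its time waiting on the network rather than computing.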
**Rate Limiting:** The tool automatically monitors GitHub API rate limits and will pause with a clear message if limits are reached, then resume when they reset.
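As an illustration of the pause-and-resume behaviour, here is a minimal sketch that assumes the remaining quota and reset time are already known. GitHub's REST API reports them in the `x-ratelimit-remaining` and `x-ratelimit-reset` response headers, the latter as UTC epoch seconds. The function names and the one-second cushion are assumptions, not the tool's actual code.

```python
import time


def seconds_until_reset(remaining, reset_epoch, now=None):
    """Return how long to sleep before the next request (0 if quota remains)."""
    if now is None:
        now = time.time()
    if remaining > 0:
        return 0.0
    # Add a one-second cushion so we do not wake up just before the reset.
    return max(0.0, reset_epoch - now) + 1.0


def wait_for_rate_limit(remaining, reset_epoch, sleep=time.sleep):
    """Sleep until the limit resets, printing the wait time first."""
    delay = seconds_until_reset(remaining, reset_epoch)
    if delay > 0:
        print(f"Rate limit reached; waiting {delay:.0f}s for reset...")
        sleep(delay)
    return delay
```

Passing `sleep` as a parameter keeps the function easy to test without actually waiting.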
### Search - Semantic Search
Search your stars using natural language queries:
```bash
ghs search "your search query"
```
Examples:
```bash
ghs search "machine learning frameworks"
ghs search "web scraping tools"
ghs search "rust web server"
ghs search "react component libraries" --limit 5
```
Options:
- `-l, --limit N`: Number of results to return (default: 10)
### Refresh - Sync Changes
Synchronize your database with your current GitHub stars (adds new stars, removes unstarred repositories):
```bash
ghs refresh
```
This command:
1. Fetches your current starred repositories
2. Compares with the local database
3. Adds newly starred repositories
4. Removes repositories you've unstarred
5. Shows a summary of changes
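The comparison in steps 2 to 4 amounts to two set differences. The sketch below assumes repositories are identified by their numeric GitHub ID; `plan_refresh` is a hypothetical name, not a function from the tool.

```python
def plan_refresh(starred_ids, stored_ids):
    """Return (ids_to_add, ids_to_remove) for a sync."""
    starred, stored = set(starred_ids), set(stored_ids)
    to_add = sorted(starred - stored)      # starred on GitHub, missing locally
    to_remove = sorted(stored - starred)   # stored locally, no longer starred
    return to_add, to_remove


to_add, to_remove = plan_refresh([1, 2, 3], [2, 3, 4])
print(f"added: {len(to_add)}, removed: {len(to_remove)}")
```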
### Stats - Database Statistics
Show database statistics:
```bash
ghs stats
```
Displays:
- Total repositories indexed
- Number of repositories with embeddings
- Number of repositories with README files
- README coverage percentage
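The coverage figure is presumably the share of indexed repositories that have a README. A minimal sketch under that assumption, with a hypothetical function name and a guard for an empty database:

```python
def readme_coverage(total_repos, repos_with_readme):
    """Return README coverage as a percentage of all indexed repositories."""
    if total_repos == 0:
        return 0.0  # avoid dividing by zero on an empty database
    return 100.0 * repos_with_readme / total_repos


print(f"README coverage: {readme_coverage(200, 150):.1f}%")
```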
## Command Quick Reference
```bash
ghs fetch # Initial fetch and index
ghs search "query" # Search repositories
ghs search "query" --limit 5 # Limit results
ghs refresh # Sync added/removed stars
ghs stats # Show statistics
```
## How It Works
1. **GitHub API**: Uses PyGithub to fetch your starred repositories with intelligent rate limit handling
2. **Parallel README Fetching**: Downloads READMEs using 5 concurrent workers with shared rate limit detection
3. **README Extraction**: Uses GitHub's dedicated README API endpoint for efficient fetching
4. **Embeddings**: Uses sentence-transformers (all-MiniLM-L6-v2) to generate 384-dim vectors
5. **Vector Search**: Stores embeddings in sqlite-vec for fast similarity search using cosine distance
6. **Smart Sync**: Refresh command intelligently adds/removes repositories based on current stars
7. **Rate Limit Protection**: Automatically detects rate limits, displays clear wait times, and resumes when ready
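To make the ranking in step 5 concrete, here is a pure-Python sketch of cosine distance and a top-N lookup. In the tool itself sqlite-vec performs this work inside the database; the functions below are illustrative only, and the short vectors stand in for the 384-dimensional embeddings.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat zero vectors as unrelated
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query_vec, repo_vecs, limit=10):
    """Return the `limit` closest repositories as (name, distance) pairs."""
    scored = [(cosine_distance(query_vec, vec), name)
              for name, vec in repo_vecs.items()]
    scored.sort()  # smallest distance first
    return [(name, dist) for dist, name in scored[:limit]]
```

Because cosine distance ignores vector length, a short README and a long one on the same topic can still rank close together.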
## Database Schema
The tool creates a `stars.db` SQLite database with:
**repositories table:**
- Repository metadata (id, name, description, URL, stars, language)
- README content and type
- Timestamps
**vec_repositories table (virtual):**
- Vector embeddings for semantic search
- Linked to repositories via repo_id
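A rough sketch of what the `repositories` table might look like, using only Python's built-in `sqlite3` module. The column names and types are guesses based on the list above, not the tool's actual schema, and the `vec_repositories` virtual table is left out because it requires the sqlite-vec extension to be loaded.

```python
import sqlite3

# Hypothetical schema: column names are inferred from the README, not the source.
SCHEMA = """
CREATE TABLE IF NOT EXISTS repositories (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT,
    url TEXT,
    stars INTEGER,
    language TEXT,
    readme_content TEXT,
    readme_type TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_repository(conn, repo_id, name, url):
    """Insert a repository, silently skipping IDs that are already stored."""
    conn.execute(
        "INSERT OR IGNORE INTO repositories (id, name, url) VALUES (?, ?, ?)",
        (repo_id, name, url),
    )
    conn.commit()
```

`INSERT OR IGNORE` is one simple way to get the "skip repositories that are already stored" behaviour described for `ghs fetch`.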
Raw data
{
"_id": null,
"home_page": "https://github.com/yourusername/github-stars-organizer",
"name": "github-stars-search",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "github, stars, search, semantic, embeddings, vector-search",
"author": "Nicol\u00e1s Iglesias",
"author_email": "Nicol\u00e1s Iglesias <nfiglesias@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/8a/f3/493169b33a8853738d826a15b185f60b813faf5f67930295e3a43bce1044/github_stars_search-0.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A command-line tool to semantically search your starred GitHub repositories",
"version": "0.1.0",
"project_urls": {
"Homepage": "https://github.com/webpolis/ghs",
"Issues": "https://github.com/webpolis/ghs/issues",
"Repository": "https://github.com/webpolis/ghs"
},
"split_keywords": [
"github",
" stars",
" search",
" semantic",
" embeddings",
" vector-search"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "73b008c7d004f7bea5a4b7bdf8fe639e82e1876f9a62a0f98f7a7d635464c773",
"md5": "f9fc246cbc850d991c1e819c241df519",
"sha256": "c61bc3e31941991eeb9e9910b68a8726c9eef6b4e12e51b098d3aa5bf889dc8b"
},
"downloads": -1,
"filename": "github_stars_search-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "f9fc246cbc850d991c1e819c241df519",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 13548,
"upload_time": "2025-10-25T16:19:33",
"upload_time_iso_8601": "2025-10-25T16:19:33.401114Z",
"url": "https://files.pythonhosted.org/packages/73/b0/08c7d004f7bea5a4b7bdf8fe639e82e1876f9a62a0f98f7a7d635464c773/github_stars_search-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "8af3493169b33a8853738d826a15b185f60b813faf5f67930295e3a43bce1044",
"md5": "e6c50eeb51271939ac2c8259394f175f",
"sha256": "17b70de0bc1846174c3da1900d3a28fb3fac70f79986aa4d3ed815e25eda0a7a"
},
"downloads": -1,
"filename": "github_stars_search-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "e6c50eeb51271939ac2c8259394f175f",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 12406,
"upload_time": "2025-10-25T16:19:34",
"upload_time_iso_8601": "2025-10-25T16:19:34.798815Z",
"url": "https://files.pythonhosted.org/packages/8a/f3/493169b33a8853738d826a15b185f60b813faf5f67930295e3a43bce1044/github_stars_search-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-25 16:19:34",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "yourusername",
"github_project": "github-stars-organizer",
"github_not_found": true,
"lcname": "github-stars-search"
}