# github-dependents-to-sqlite
[PyPI](https://pypi.org/project/github-dependents-to-sqlite/)
[Changelog](https://github.com/caomingpei/github-dependents-to-sqlite/releases)
[License](https://github.com/caomingpei/github-dependents-to-sqlite/blob/main/LICENSE)
Save GitHub dependents data to a SQLite database by scraping the GitHub dependency graph.
## Features
This tool scrapes the GitHub dependency graph to find repositories that depend on a specific repository and saves this data to a SQLite database.
## Installation
Requires Python 3.8 or higher.
```bash
$ pip install github-dependents-to-sqlite
```
## Authentication
Create a GitHub personal access token: https://github.com/settings/tokens
Run this command to set up authentication:
```bash
$ github-dependents-to-sqlite auth
```
Or for local development:
```bash
$ python -m src.cli auth
```
This creates a file called `auth.json` in your current directory containing your access token. To save the file under a different path or filename, use the `-a/--auth=myauth.json` option.
As an alternative to using an `auth.json` file you can add your access token to an environment variable called `GITHUB_TOKEN`.
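The lookup described above can be sketched in Python as follows. This is only an illustration, not the tool's actual code: the key name inside `auth.json` (`github_personal_token`) and the order of precedence (file first, then `GITHUB_TOKEN`) are assumptions, and `load_token` is a hypothetical helper.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional

# Assumption: the README does not document the key used inside auth.json.
TOKEN_KEY = "github_personal_token"  # hypothetical key name


def load_token(
    auth_path: str = "auth.json",
    env: Optional[Mapping[str, str]] = None,
    key: str = TOKEN_KEY,
) -> Optional[str]:
    """Return the token from auth.json if present, else from GITHUB_TOKEN."""
    env = os.environ if env is None else env
    path = Path(auth_path)
    if path.is_file():
        data = json.loads(path.read_text(encoding="utf-8"))
        token = data.get(key)
        if token:
            return token
    # Fall back to the environment variable named in this README.
    return env.get("GITHUB_TOKEN") or None
```

Passing `env` explicitly keeps the helper easy to test without touching the real environment.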
## Basic Usage
The GitHub dependency graph can show other GitHub projects that depend on a specific repo, for example [anchor-lang](https://github.com/coral-xyz/anchor).
This data is not yet available through the GitHub API. This tool scrapes those pages, then uses the GitHub API to fetch full details for each dependent repository.
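To make the scraping step concrete, here is a minimal sketch that pulls `owner/name` pairs out of a dependents page using only the standard library. The markup it relies on (repository links carrying `data-hovercard-type="repository"`) is an assumption about GitHub's HTML, which can change at any time; the `SAMPLE` string and the names `DependentLinkParser` and `extract_dependents` are made up for illustration and are not part of this tool.

```python
from html.parser import HTMLParser
from typing import List


class DependentLinkParser(HTMLParser):
    """Collect owner/name paths from repository links in a dependents page."""

    def __init__(self) -> None:
        super().__init__()
        self.repos: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        # Assumed markup: repository links are tagged with this attribute.
        if attr.get("data-hovercard-type") == "repository" and attr.get("href"):
            self.repos.append(attr["href"].lstrip("/"))


def extract_dependents(html: str) -> List[str]:
    """Return the owner/name strings found in one page of HTML."""
    parser = DependentLinkParser()
    parser.feed(html)
    return parser.repos


# Hand-written sample, not real GitHub output.
SAMPLE = """
<div class="Box-row">
  <a data-hovercard-type="user" href="/alice">alice</a> /
  <a data-hovercard-type="repository" href="/alice/project-one">project-one</a>
</div>
<div class="Box-row">
  <a data-hovercard-type="organization" href="/example-org">example-org</a> /
  <a data-hovercard-type="repository" href="/example-org/project-two">project-two</a>
</div>
"""

print(extract_dependents(SAMPLE))
# ['alice/project-one', 'example-org/project-two']
```

A real scraper would also need to follow pagination and then look up each name through the GitHub API, which this sketch leaves out.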
### Commands
```bash
# Set up authentication (first time)
$ github-dependents-to-sqlite auth
# Scrape dependents
$ github-dependents-to-sqlite scrape github.db owner/repo
# Multiple repositories
$ github-dependents-to-sqlite scrape github.db owner/repo1 owner/repo2
```
### Local Development (without install)
```bash
# Set up auth
$ python -m src.cli auth
# Scrape dependents
$ python -m src.cli scrape github.db owner/repo -v
```
### Package Selection
Many repositories have multiple packages. The tool will automatically detect them and offer choices:
**Interactive Mode** (default):
```bash
$ github-dependents-to-sqlite scrape github.db solana-foundation/anchor
```
You'll see a menu like:
```
📦 Processing repository: solana-foundation/anchor
Found 67 package(s)

Available packages:
  1. @andresmgsl2/spl-associated-token-account
  2. @betdex/anchor
  3. @coral-xyz/anchor
  ...
  68. All packages (scrape each one)
  69. Skip package selection (may find fewer dependents)

Select a package [68]: 3
Selected: @coral-xyz/anchor
Total dependents: 23,089
Scraping dependents: 100%|████████████| 23089/23089 [15:23<00:00, 24.98repo/s]
✅ Found 23,089 new dependent(s)
🎉 Done!
```
**Command-line Mode** (use `-p` to specify package):
```bash
# By package name
$ github-dependents-to-sqlite scrape github.db solana-foundation/anchor -p "anchor-lang"
# By package ID
$ github-dependents-to-sqlite scrape github.db solana-foundation/anchor -p "UGFja2FnZS0zNDg2OTY2MDg4"
```
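The package ID in the example above is a base64 string, and decoding it shows the underlying identifier. The sketch below decodes it and builds the corresponding dependents URL in the form GitHub's web interface uses (`/network/dependents?package_id=...`). Whether the tool builds its URLs exactly this way is an assumption; `decode_package_id` and `dependents_url` are hypothetical helper names.

```python
import base64
from urllib.parse import urlencode


def decode_package_id(package_id: str) -> str:
    """Decode a base64 package ID into its readable form."""
    return base64.b64decode(package_id).decode("utf-8")


def dependents_url(repo: str, package_id: str) -> str:
    """Build the dependents page URL for one package of a repository."""
    query = urlencode({"package_id": package_id})
    return f"https://github.com/{repo}/network/dependents?{query}"


package_id = "UGFja2FnZS0zNDg2OTY2MDg4"
print(decode_package_id(package_id))
# Package-3486966088
print(dependents_url("solana-foundation/anchor", package_id))
# https://github.com/solana-foundation/anchor/network/dependents?package_id=UGFja2FnZS0zNDg2OTY2MDg4
```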
### Options
- `-p, --package TEXT`: Specify package name or ID (skips interactive selection)
- `-v, --verbose`: Verbose output with detailed progress information
- `-a, --auth PATH`: Path to auth.json file (default: auth.json)
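If you drive the CLI from a script, the options above can be assembled into an argument list. This is a rough sketch that uses only the flags documented here and assumes they are all accepted by the `scrape` subcommand; `build_scrape_command` is a hypothetical helper, not something the package exports.

```python
import shlex
from typing import List, Optional, Sequence


def build_scrape_command(
    db_path: str,
    repos: Sequence[str],
    package: Optional[str] = None,
    verbose: bool = False,
    auth: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for `github-dependents-to-sqlite scrape`."""
    if not repos:
        raise ValueError("at least one owner/repo is required")
    cmd = ["github-dependents-to-sqlite", "scrape", db_path, *repos]
    if package:
        cmd += ["-p", package]  # skips interactive package selection
    if verbose:
        cmd.append("-v")
    if auth:
        cmd += ["-a", auth]
    return cmd


cmd = build_scrape_command(
    "github.db", ["solana-foundation/anchor"], package="anchor-lang", verbose=True
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Supplying `package` matters for unattended runs, since the default mode waits for interactive input.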
### Database Schema
The tool creates the following tables:
- `repos`: Repository information for both the target repo and its dependents
- `users`: User/organization information for repository owners
- `licenses`: License information for repositories
- `dependents`: Junction table linking repositories to their dependents
The tool also creates:
- Full-text search indices on relevant columns
- Foreign key relationships between tables
- A `dependent_repos` view for easy querying
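To show how a junction table and a view over it fit together, here is a small in-memory sketch. It is not the tool's real schema: apart from the table and view names and the `dependent_stars` column, which appear in this README, every column name and every row below is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical, heavily simplified columns.
    CREATE TABLE repos (
        id INTEGER PRIMARY KEY,
        full_name TEXT,
        stargazers_count INTEGER
    );
    -- Junction table: one row per (target repo, dependent repo) pair.
    CREATE TABLE dependents (
        repo INTEGER REFERENCES repos(id),
        dependent INTEGER REFERENCES repos(id),
        PRIMARY KEY (repo, dependent)
    );
    -- The view joins repos twice so both sides are readable by name.
    CREATE VIEW dependent_repos AS
    SELECT target.full_name AS repo,
           dep.full_name AS dependent,
           dep.stargazers_count AS dependent_stars
    FROM dependents
    JOIN repos AS target ON target.id = dependents.repo
    JOIN repos AS dep ON dep.id = dependents.dependent;
    """
)
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO repos VALUES (?, ?, ?)",
    [(1, "owner/repo", 100), (2, "alice/app", 5), (3, "bob/tool", 42)],
)
conn.executemany("INSERT INTO dependents VALUES (?, ?)", [(1, 2), (1, 3)])

rows = conn.execute(
    "SELECT repo, dependent, dependent_stars "
    "FROM dependent_repos ORDER BY dependent_stars DESC"
).fetchall()
print(rows)
# [('owner/repo', 'bob/tool', 42), ('owner/repo', 'alice/app', 5)]
```

Storing both the target and its dependents in `repos` is what lets a single junction table describe the relationship without duplicating repository data.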
### Example Query
After scraping, you can query the database to list all dependents, most-starred first:
```sql
SELECT * FROM dependent_repos ORDER BY dependent_stars DESC;
```
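The same query can be run from Python with the standard `sqlite3` module. The sketch below assumes only what this README states, namely that the database has a `dependent_repos` view with a `dependent_stars` column; `top_dependents` is a hypothetical helper name.

```python
import sqlite3
from typing import Any, Dict, List


def top_dependents(db_path: str, limit: int = 10) -> List[Dict[str, Any]]:
    """Return the most-starred dependents as a list of dicts."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row  # lets each row be converted to a dict
    try:
        rows = conn.execute(
            "SELECT * FROM dependent_repos "
            "ORDER BY dependent_stars DESC LIMIT ?",
            (limit,),
        ).fetchall()
        return [dict(row) for row in rows]
    finally:
        conn.close()


# Example call, once github.db has been populated by a scrape:
# for row in top_dependents("github.db", limit=5):
#     print(row)
```

Using `SELECT *` keeps the helper independent of the view's exact column list, at the cost of returning every column.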
## Development
To contribute to this project:
1. Clone the repository
2. Install development dependencies: `pip install -e ".[test]"`
3. Run tests: `pytest`
## Acknowledgments
This project is based on [github-to-sqlite](https://github.com/dogsheep/github-to-sqlite) by Simon Willison. The original project focused on saving GitHub API data to SQLite. This fork extends that concept to specifically handle package dependency graph scraping, allowing you to discover which repositories depend on specific packages.
## License
Apache License 2.0