# tap-github
`tap-github` is a Singer tap for GitHub.
Built with the [Singer SDK](https://gitlab.com/meltano/singer-sdk).
## Installation
```bash
# use uv (https://docs.astral.sh/uv/)
uv tool install meltanolabs-tap-github
# or pipx (https://pipx.pypa.io/stable/)
pipx install meltanolabs-tap-github
# or Meltano
meltano add extractor tap-github
```
A list of release versions is available at https://github.com/MeltanoLabs/tap-github/releases
## Configuration
### Accepted Config Options
This tap accepts the following configuration options:
- Required: One and only one of the following modes:
  1. `repositories`: An array of strings specifying the GitHub repositories to be included. Each element of the array should be of the form `<org>/<repository>`, e.g. `MeltanoLabs/tap-github`.
  2. `organizations`: An array of strings containing the GitHub organizations to be included.
  3. `searches`: An array of search descriptor objects with the following properties:
     - `name`: A human-readable name for the search query.
     - `query`: A GitHub search string (generally the same as would come after `?q=` in the URL).
  4. `user_usernames`: A list of GitHub usernames.
  5. `user_ids`: A list of GitHub user IDs (integers).
- Highly recommended:
  - Personal access tokens (PATs) for authentication can be provided in 3 ways:
    - `auth_token` - Takes a single token.
    - `additional_auth_tokens` - Takes a list of tokens. Can be used together with `auth_token` or as the sole source of PATs.
    - Any environment variables beginning with `GITHUB_TOKEN` will be assumed to be PATs. These tokens will be used in addition to `auth_token` (if provided), but will not be used if `additional_auth_tokens` is provided.
  - GitHub App keys are another option for authentication, and can be used in combination with PATs if desired. App IDs and keys should be assembled into the format `:app_id:;;-----BEGIN RSA PRIVATE KEY-----\n_YOUR_P_KEY_\n-----END RSA PRIVATE KEY-----` where the key can be generated from the `Private keys` section on https://github.com/organizations/:organization_name/settings/apps/:app_name. Read more about GitHub App quotas [here](https://docs.github.com/en/enterprise-server@3.3/developers/apps/building-github-apps/rate-limits-for-github-apps#server-to-server-requests). Formatted app keys can be provided in 2 ways:
    - `auth_app_keys` - List of GitHub App keys in the prescribed format.
    - If `auth_app_keys` is not provided but there is an environment variable with the name `GITHUB_APP_PRIVATE_KEY`, it will be assumed to be an App key in the prescribed format.
- Optional:
  - `user_agent`
  - `start_date`
  - `metrics_log_level`
  - `stream_maps`
  - `stream_maps_config`
  - `stream_options`: Options which can change the behaviour of a specific stream are nested within.
    - `milestones`: Valid options for the `milestones` stream are nested within.
      - `state`: Determines which milestones will be extracted. One of `open` (default), `closed`, `all`.
  - `rate_limit_buffer`: A buffer to avoid consuming all query points for the auth_token at hand. Defaults to 1000.
  - `expiry_time_buffer`: A buffer used when determining when to refresh GitHub App tokens. Only relevant when authenticating as a GitHub App. Defaults to 10 minutes. Tokens generated by GitHub Apps expire 1 hour after creation, and will be refreshed once fewer than `expiry_time_buffer` minutes remain until the anticipated expiry time.
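The refresh rule described for `expiry_time_buffer` can be illustrated with a small sketch. The helper `needs_refresh` and its arguments are hypothetical and are not part of the tap's API; only the 1-hour token lifetime and the 10-minute default come from the option description.

```python
from datetime import datetime, timedelta, timezone

# Values taken from the `expiry_time_buffer` description.
TOKEN_LIFETIME = timedelta(hours=1)
DEFAULT_EXPIRY_TIME_BUFFER_MINUTES = 10


def needs_refresh(created_at, now, expiry_time_buffer=DEFAULT_EXPIRY_TIME_BUFFER_MINUTES):
    """Return True once fewer than `expiry_time_buffer` minutes remain.

    Hypothetical helper for illustration only; not the tap's implementation.
    """
    anticipated_expiry = created_at + TOKEN_LIFETIME
    remaining = anticipated_expiry - now
    return remaining < timedelta(minutes=expiry_time_buffer)


created = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
# 15 minutes remain: still outside the buffer, so no refresh yet.
print(needs_refresh(created, created + timedelta(minutes=45)))
# 9 minutes remain: inside the buffer, so the token would be refreshed.
print(needs_refresh(created, created + timedelta(minutes=51)))
```

Under this reading, a larger buffer refreshes tokens earlier and a smaller one keeps each token in use closer to its expiry.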
Note that modes 1-3 are `repository` modes and modes 4-5 are `user` modes; the two groups do not run the same set of streams.
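As a rough illustration of the "one and only one mode" rule, the following sketch checks a config dictionary before it is handed to the tap. The function `selected_mode` is a hypothetical helper written for this example, not something the tap exposes; the key names are the ones listed under the required modes.

```python
# The five mode keys, in the order they are listed in the required options.
MODE_KEYS = ("repositories", "organizations", "searches", "user_usernames", "user_ids")
REPOSITORY_MODES = MODE_KEYS[:3]  # modes 1-3; the remaining two are `user` modes


def selected_mode(config):
    """Return the single mode key set in `config`, or raise ValueError.

    Hypothetical helper for illustration only.
    """
    present = [key for key in MODE_KEYS if config.get(key)]
    if len(present) != 1:
        raise ValueError(f"expected exactly one of {MODE_KEYS}, found {present}")
    return present[0]


config = {
    "repositories": ["MeltanoLabs/tap-github"],
    "stream_options": {"milestones": {"state": "all"}},
}
mode = selected_mode(config)
kind = "repository" if mode in REPOSITORY_MODES else "user"
print(mode, kind)
```

A config that sets both `repositories` and `user_ids`, or neither, would raise `ValueError` in this sketch.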
A full list of supported settings and capabilities for this tap is available by running:
```bash
tap-github --about
```
### Source Authentication and Authorization
A small number of records may be pulled without an auth token. However, a GitHub auth token should generally be considered "required" since it gives more realistic rate limits. (See the GitHub API docs for more info.)
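The way the three PAT sources combine, as described under Accepted Config Options, can be sketched as follows. This is only one reading of those rules: `collect_pats` is a hypothetical function, not the tap's code, and the ordering of the resulting list is an assumption made for the example.

```python
import os


def collect_pats(config, environ=None):
    """Gather personal access tokens from config and environment.

    Hypothetical sketch of the documented rules; not the tap's implementation.
    """
    environ = os.environ if environ is None else environ
    tokens = []
    if config.get("auth_token"):
        tokens.append(config["auth_token"])
    if config.get("additional_auth_tokens"):
        # An explicit list takes the place of the environment variables.
        tokens.extend(config["additional_auth_tokens"])
    else:
        # Any variable whose name begins with GITHUB_TOKEN is treated as a PAT.
        tokens.extend(
            value
            for name, value in sorted(environ.items())
            if name.startswith("GITHUB_TOKEN") and value
        )
    return tokens


# A fake environment keeps the example deterministic.
fake_env = {"GITHUB_TOKEN": "env-token-1", "GITHUB_TOKEN_CI": "env-token-2", "HOME": "/root"}
print(collect_pats({"auth_token": "cfg-token"}, fake_env))
print(collect_pats({"auth_token": "cfg-token", "additional_auth_tokens": ["extra"]}, fake_env))
```

The second call ignores the `GITHUB_TOKEN*` variables because `additional_auth_tokens` is set.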
## Usage
### API Limitation - Pagination
The GitHub API limits pagination for some resources, such as `/events`. For these resources, users might encounter the following error:
```
In order to keep the API fast for everyone, pagination is limited for this resource. Check the rel=last link relation in the Link response header to see how far back you can traverse.
```
To avoid this error, the GitHub streams exit early, i.e. as soon as no `next` page is available. If you are fetching `/events` at the repository level, avoid leaving the tap disabled for longer than a few days, or you will have gaps in your data.
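The early-exit behaviour can be pictured as following `rel="next"` links from the `Link` response header until none is offered. The sketch below illustrates that idea only; it is not the tap's paginator, and `next_page_url`, `walk_pages`, and the `fetch` callable are hypothetical names. The pages are simulated, so no network access is needed.

```python
import re

# Matches entries such as: <https://example.invalid/events?page=2>; rel="next"
_LINK_PATTERN = re.compile(r'<([^>]+)>;\s*rel="([^"]+)"')


def next_page_url(link_header):
    """Return the rel="next" URL from a Link header, or None if absent."""
    for url, rel in _LINK_PATTERN.findall(link_header or ""):
        if rel == "next":
            return url
    return None


def walk_pages(first_url, fetch):
    """Yield records page by page, stopping when no next page is advertised.

    `fetch` is a hypothetical callable returning (records, link_header).
    """
    url = first_url
    while url is not None:
        records, link_header = fetch(url)
        yield from records
        url = next_page_url(link_header)


# Simulated responses: the second page offers no rel="next" link.
pages = {
    "page1": (["a", "b"], '<page2>; rel="next", <page2>; rel="last"'),
    "page2": (["c"], '<page1>; rel="prev", <page1>; rel="first"'),
}
events = list(walk_pages("page1", lambda url: pages[url]))
print(events)
```

Because the loop stops at the last advertised page, records older than the API's pagination window are never requested, which is why long gaps between runs can lose data.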
You can easily run `tap-github` by itself or in a pipeline using [Meltano](https://www.meltano.com).
### Notes regarding permissions
* For the `traffic_*` streams, [you will need write access to the repository](https://docs.github.com/en/rest/metrics/traffic?apiVersion=2022-11-28). You can enable extraction for these streams by [selecting them in the catalog](https://hub.meltano.com/singer/spec/#metadata).
### Executing the Tap Directly
```bash
tap-github --version
tap-github --help
tap-github --config CONFIG --discover > ./catalog.json
```
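The `CONFIG` argument above is a path to a JSON file. As a minimal sketch, such a file could be generated as shown below. The values are placeholders, the ISO 8601 form of `start_date` is an assumption, and the temporary directory is used only to keep the example self-contained.

```python
import json
import tempfile
from pathlib import Path

# Placeholder settings using option names from "Accepted Config Options".
config = {
    "repositories": ["MeltanoLabs/tap-github"],
    "start_date": "2024-01-01T00:00:00Z",  # assumed ISO 8601 format
    "rate_limit_buffer": 1000,
    "stream_options": {"milestones": {"state": "open"}},
}

# Write the file to a temporary directory (an arbitrary choice for this example).
config_path = Path(tempfile.mkdtemp()) / "config.json"
config_path.write_text(json.dumps(config, indent=2))

# The resulting path can then be passed to the tap on the command line.
print(f"tap-github --config {config_path} --discover > ./catalog.json")
```

Tokens are left out of the file here on purpose; they could instead be supplied through `GITHUB_TOKEN*` environment variables as described in the configuration section.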
## Contributing
This project uses parent-child streams. Learn more about them [here](https://gitlab.com/meltano/sdk/-/blob/main/docs/parent_streams.md).
### Initialize your Development Environment
```bash
pipx install poetry
poetry install
```
### Create and Run Tests
Create tests within the `tap_github/tests` subfolder and
then run:
```bash
poetry run pytest
```
You can also test the `tap-github` CLI interface directly using `poetry run`:
```bash
poetry run tap-github --help
```
### Testing with [Meltano](https://meltano.com)
_**Note:** This tap will work in any Singer environment and does not require Meltano.
Examples here are for convenience and to streamline end-to-end orchestration scenarios._
Your project comes with a custom `meltano.yml` project file already created. Open the `meltano.yml` and follow any _"TODO"_ items listed in
the file.
Next, install Meltano (if you haven't already) and any needed plugins:
```bash
# Install meltano
pipx install meltano
# Initialize meltano within this directory
cd tap-github
meltano install
```
Now you can test and orchestrate using Meltano:
```bash
# Test invocation:
meltano invoke tap-github --version
# OR run a test `elt` pipeline:
meltano elt tap-github target-jsonl
```
One-liner to recreate output directory, run elt, and write out state file:
```bash
# Update this when you want a fresh state file:
TESTJOB=testjob1
# Run everything in one line
mkdir -p .output && meltano elt tap-github target-jsonl --job_id $TESTJOB && meltano elt tap-github target-jsonl --job_id $TESTJOB --dump=state > .output/state.json
```
### Singer SDK Dev Guide
See the [dev guide](../../docs/dev_guide.md) for more instructions on how to use the Singer SDK to
develop your own taps and targets.