# IGA<img alt="IGA logo" width="12%" align="right" src="https://github.com/caltechlibrary/iga/raw/main/docs/_static/media/cloud-upload.png">
IGA is the _InvenioRDM GitHub Archiver_, a standalone program as well as a [GitHub Actions](https://github.com/marketplace/actions/inveniordm-github-archiver) workflow that lets you automatically archive GitHub software releases in an [InvenioRDM](https://inveniosoftware.org/products/rdm/) repository.
[![Latest release](https://img.shields.io/github/v/release/caltechlibrary/iga.svg?style=flat-square&color=b44e88&label=Latest%20release)](https://github.com/caltechlibrary/iga/releases) [![License](https://img.shields.io/badge/License-BSD--like-lightgrey.svg?style=flat-square)](https://github.com/caltechlibrary/iga/blob/develop/LICENSE) [![Python](https://img.shields.io/badge/Python-3.9+-brightgreen.svg?style=flat-square)](https://www.python.org/downloads/release/python-390/) [![PyPI](https://img.shields.io/pypi/v/iga.svg?style=flat-square&color=orange&label=PyPI)](https://pypi.org/project/iga/) [![DOI](https://img.shields.io/badge/dynamic/json.svg?label=DOI&style=flat-square&colorA=gray&colorB=navy&query=$.pids.doi.identifier&uri=https://data.caltech.edu/api/records/zsmem-2pg20/versions/latest)](https://data.caltech.edu/records/zsmem-2pg20/latest)
## Table of contents
* [Introduction](#introduction)
* [Installation](#installation)
   * [IGA as a standalone program](#iga-as-a-standalone-program)
   * [IGA as a GitHub Actions workflow](#iga-as-a-github-actions-workflow)
* [Quick start](#quick-start)
* [Usage](#usage)
   * [Identifying the InvenioRDM server](#identifying-the-inveniordm-server)
   * [Providing an InvenioRDM access token](#providing-an-inveniordm-access-token)
   * [Providing a GitHub access token](#providing-a-github-access-token)
   * [Specifying a GitHub release](#specifying-a-github-release)
   * [Gathering metadata for an InvenioRDM record](#gathering-metadata-for-an-inveniordm-record)
   * [Specifying GitHub file assets](#specifying-github-file-assets)
   * [Handling communities](#handling-communities)
   * [Indicating draft versus published records](#indicating-draft-versus-published-records)
   * [Versioning records](#versioning-records)
   * [Other options recognized by IGA](#other-options-recognized-by-iga)
   * [Summary of command-line options](#summary-of-command-line-options)
   * [Return values](#return-values)
   * [Adding a DOI badge to your GitHub repository](#adding-a-doi-badge-to-your-github-repository)
* [Known issues and limitations](#known-issues-and-limitations)
* [Getting help](#getting-help)
* [Contributing](#contributing)
* [License](#license)
* [Acknowledgments](#acknowledgments)
## Introduction
[InvenioRDM](https://inveniosoftware.org/products/rdm/) is the basis for many institutional repositories such as [CaltechDATA](https://data.caltech.edu) that enable users to preserve software and data sets in long-term archives. Though such repositories are critical resources, creating detailed records and uploading assets can be a tedious and error-prone process if done manually. This is where the [_InvenioRDM GitHub Archiver_](https://github.com/caltechlibrary/iga) (IGA) comes in.
IGA creates metadata records and sends releases from GitHub to an InvenioRDM-based repository server. IGA can be invoked from the command line; it also can be set up as a [GitHub Actions](https://docs.github.com/en/actions) workflow to archive GitHub releases automatically for a repository each time they are made.
<p align=center>
<img align="middle" src="https://github.com/caltechlibrary/iga/raw/main/docs/_static/media/example-github-release.jpg" alt="Screenshot of a software release in GitHub" width="40%">
<span style="font-size: 150%">➜</span>
<img align="middle" src="https://github.com/caltechlibrary/iga/raw/main/docs/_static/media/example-record-landing-page.jpg" alt="Screenshot of the corresponding entry in InvenioRDM" width="40%">
</p>
IGA offers many notable features:
* Automatic metadata extraction from GitHub plus [`codemeta.json`](https://codemeta.github.io) and [`CITATION.cff`](https://citation-file-format.github.io) files
* Thorough coverage of [InvenioRDM record metadata](https://inveniordm.docs.cern.ch/reference/metadata) using painstaking procedures
* Recognition of identifiers in CodeMeta & CFF files: [ORCID](https://orcid.org), [DOI](https://www.doi.org), [PMCID](https://www.ncbi.nlm.nih.gov/pmc/about/public-access-info/), and more
* Automatic lookup of publication data in [DOI.org](https://www.doi.org), [PubMed](https://www.ncbi.nlm.nih.gov/pmc/about/public-access-info/), Google, and other sources
* Automatic lookup of organization names in [ROR](https://ror.org) (assuming ROR IDs are provided)
* Automatic lookup of human names in [ORCID.org](https://orcid.org) (assuming ORCID iDs are provided)
* Automatic splitting of human names into family & given names using [ML](https://en.wikipedia.org/wiki/Machine_learning) methods
* Support for InvenioRDM [communities](https://invenio-communities.readthedocs.io/en/latest/)
* Support for overriding the record that IGA creates, for complete control if you need it
* Support for using the GitHub API without a [GitHub access token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token) in simple cases
* Extensive use of logging so you can see what's going on under the hood
The IGA GitHub Actions workflow will automatically add the archived DOI to a CodeMeta file and create or update a CFF file using the [CodeMeta2CFF workflow](https://github.com/caltechlibrary/codemeta2cff).
## Installation
IGA can be installed as either (or both) a command-line program on your computer or a [GitHub Action](https://docs.github.com/en/actions) in a GitHub repository.
### IGA as a standalone program
Please choose an approach that suits your situation and preferences.
<details><summary><h4><i>Alternative 1: using <code>pipx</code></i></h4></summary>
[Pipx](https://github.com/pypa/pipx) lets you install Python programs in a way that isolates Python dependencies from other Python programs on your system, and yet the resulting `iga` command can be run from any shell and directory – like any normal program on your computer. If you use `pipx` on your system, you can install IGA with the following command:
```sh
pipx install iga
```
After installation, a program named `iga` should end up in a location where other command-line programs are installed on your computer. Test it by running the following command in a shell:
```shell
iga --help
```
</details>
<details><summary><h4><i>Alternative 2: using <code>pip</code></i></h4></summary>
IGA is available from the [Python package repository PyPI](https://pypi.org) and can be installed using [`pip`](https://pip.pypa.io/en/stable/installing/):
```sh
python3 -m pip install iga
```
As an alternative to getting it from [PyPI](https://pypi.org), you can install `iga` directly from GitHub:
```sh
python3 -m pip install git+https://github.com/caltechlibrary/iga.git
```
_If you already installed IGA once before_, and want to update to the latest version, add `--upgrade` to the end of either command line above.
After installation, a program named `iga` should end up in a location where other command-line programs are installed on your computer. Test it by running the following command in a shell:
```shell
iga --help
```
</details>
<details><summary><h4><i>Alternative 3: from sources</i></h4></summary>
If you prefer to install IGA directly from the source code, first obtain a copy by either downloading the source archive from the [IGA releases page on GitHub](https://github.com/caltechlibrary/iga/releases), or by using `git` to clone the repository to a location on your computer. For example,
```sh
git clone https://github.com/caltechlibrary/iga
```
Next, after getting a copy of the files, run `setup.py` inside the code directory:
```sh
cd iga
python3 setup.py install
```
</details>
Once you have installed `iga`, the next steps are (1) [get an InvenioRDM token](#getting-an-inveniordm-token) and (2) [configure `iga` for running locally](#configuring-and-running-iga-locally).
### IGA as a GitHub Actions workflow
A [GitHub Actions](https://docs.github.com/en/actions) workflow is an automated process that runs on GitHub's servers under control of a file in your repository. Follow these steps to create the IGA workflow file:
1. In the main branch of your GitHub repository, create a `.github/workflows` directory
2. In the `.github/workflows` directory, create a file named (e.g.) `iga.yml` and copy the [following contents](https://raw.githubusercontent.com/caltechlibrary/iga/v1/sample-workflow.yml) into it:
```yaml
# GitHub Actions workflow for InvenioRDM GitHub Archiver version 1.3.4
# This is available as the file "sample-workflow.yml" from the open-
# source repository for IGA at https://github.com/caltechlibrary/iga/.
# ╭────────────────────────────────────────────╮
# │ Configure this section                     │
# ╰────────────────────────────────────────────╯
env:
  # 👋🏻 Set the next variable to your InvenioRDM server address 👋🏻
  INVENIO_SERVER: https://your-invenio-server.org
  # Set to an InvenioRDM record ID to mark release as a new version.
  parent_record: none
  # The variables below are other IGA options. Please see the docs.
  community: none
  draft: false
  all_assets: false
  all_metadata: false
  debug: false
  # This variable is a setting for post-archiving CodeMeta file updates.
  # If you don't have a CodeMeta file, you can remove the add_doi_codemeta
  # and Codemeta2CFF jobs at the bottom of this file.
  ref: main
# ╭────────────────────────────────────────────╮
# │ The rest of this file should be left as-is │
# ╰────────────────────────────────────────────╯
name: InvenioRDM GitHub Archiver
on:
  release:
    types: [published]
  workflow_dispatch:
    inputs:
      release_tag:
        description: The release tag (empty = latest)
      parent_record:
        description: ID of parent record (for versioning)
      community:
        description: Name of InvenioRDM community (if any)
      draft:
        description: Mark the record as a draft
        type: boolean
      all_assets:
        description: Attach all GitHub assets
        type: boolean
      all_metadata:
        description: Include additional GitHub metadata
        type: boolean
      debug:
        description: Print debug info in the GitHub log
        type: boolean
run-name: Archive ${{inputs.release_tag || 'latest release'}} in InvenioRDM
jobs:
  run_iga:
    name: Send to ${{needs.get_repository.outputs.server}}
    runs-on: ubuntu-latest
    needs: get_repository
    outputs:
      record_doi: ${{steps.iga.outputs.record_doi}}
    steps:
      - uses: caltechlibrary/iga@v1
        id: iga
        with:
          INVENIO_SERVER: ${{env.INVENIO_SERVER}}
          INVENIO_TOKEN: ${{secrets.INVENIO_TOKEN}}
          all_assets: ${{github.event.inputs.all_assets || env.all_assets}}
          all_metadata: ${{github.event.inputs.all_metadata || env.all_metadata}}
          debug: ${{github.event.inputs.debug || env.debug}}
          draft: ${{github.event.inputs.draft || env.draft}}
          community: ${{github.event.inputs.community || env.community}}
          parent_record: ${{github.event.inputs.parent_record || env.parent_record}}
          release_tag: ${{github.event.inputs.release_tag || 'latest'}}
  get_repository:
    name: Get repository name
    runs-on: ubuntu-latest
    outputs:
      server: ${{steps.parse.outputs.host}}
    steps:
      - name: Extract name from INVENIO_SERVER
        id: parse
        run: echo "host=$(cut -d'/' -f3 <<< ${{env.INVENIO_SERVER}} | cut -d':' -f1)" >> $GITHUB_OUTPUT
  add_doi_codemeta:
    name: "Add ${{needs.run_iga.outputs.record_doi}} to codemeta.json"
    needs: run_iga
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          ref: ${{ env.ref }}
      - name: Install sde
        run: pip install sde
      - name: Add DOI to CodeMeta File
        run: sde identifier ${{needs.run_iga.outputs.record_doi}} codemeta.json
      - name: Commit CFF
        uses: EndBug/add-and-commit@v9
        with:
          message: 'Add DOI to codemeta.json file'
          add: 'codemeta.json'
  CodeMeta2CFF:
    runs-on: ubuntu-latest
    needs: add_doi_codemeta
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          ref: ${{ env.ref }}
      - name: Convert CFF
        uses: caltechlibrary/codemeta2cff@main
      - name: Commit CFF
        uses: EndBug/add-and-commit@v9
        with:
          message: 'Add updated CITATION.cff from codemeta.json file'
          add: 'CITATION.cff'
```
3. **Edit the value of the `INVENIO_SERVER` variable (in the `env` section near the top of the file above)** ↑
4. Optionally, change the [values of other options (`parent_record`, `community`, etc.)](https://caltechlibrary.github.io/iga/gha-usage.html#input-parameters)
5. If you have a [CodeMeta](https://caltechlibrary.github.io/iga/introduction.html#codemeta-citation-cff) file, the GitHub action workflow can automatically add the DOI after IGA has run. The "ref" value is the branch where the CodeMeta file will be updated. If you don't use a CodeMeta file, you can delete the `add_doi_codemeta` part of the workflow.
6. Save the file, commit the changes to git, and push your changes to GitHub
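Steps 1 and 2 above can also be done from a shell. The following is a minimal sketch, not part of IGA itself: the function name `install_iga_workflow` is hypothetical, it assumes `curl` is available, and it uses the sample workflow URL linked in step 2. You still need to edit `INVENIO_SERVER` in the resulting file afterwards.

```shell
# Hypothetical helper: create .github/workflows and fetch the sample workflow.
# Argument: path to the root of your local repository clone (default: ".").
install_iga_workflow() {
    repo_dir=${1:-.}
    workflow_dir="$repo_dir/.github/workflows"
    mkdir -p "$workflow_dir" || return
    # -f: fail on HTTP errors, -sS: quiet but show errors, -L: follow redirects
    curl -fsSL -o "$workflow_dir/iga.yml" \
        https://raw.githubusercontent.com/caltechlibrary/iga/v1/sample-workflow.yml
}
```

For example, running `install_iga_workflow .` from the root of a clone would leave the file at `.github/workflows/iga.yml`, ready for editing and committing.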
Once you have installed the GitHub Action workflow for IGA, the next steps are (1) [get an InvenioRDM token](#getting-an-inveniordm-token) and (2) [configure the GitHub Action workflow](#configuring-and-running-iga-as-a-github-actions-workflow).
## Quick start
Whether IGA is run locally on your computer or as a GitHub Actions workflow, it must be provided with a personal access token (PAT) for your InvenioRDM server. Getting one is the first step.
### Getting an InvenioRDM token
<img src="https://github.com/caltechlibrary/iga/raw/main/docs/_static/media/get-invenio-pat.png" alt="Screenshot of InvenioRDM PAT menu" width="60%" align="right">
1. Log in to your InvenioRDM account
2. Go to the _Applications_ page in your account profile
3. Click the <kbd>New token</kbd> button next to "Personal access tokens"
4. On the page that is shown after you click that button, name your token (the name does not matter) and click the <kbd>Create</kbd> button
5. After InvenioRDM creates and shows you the token, **copy it to a safe location** because InvenioRDM will not show it again
### Configuring and running IGA locally
To send a GitHub release to your InvenioRDM server, IGA needs this information:
1. (Required) The identity of the GitHub release to be archived
2. (Required) The address of the destination InvenioRDM server
3. (Required) A personal access token for InvenioRDM (from [above](#getting-an-inveniordm-token))
4. (Optional) A [personal access token for GitHub](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token)
The identity of the GitHub release is always given as an argument to IGA on the command line; the remaining values can be provided either via command-line options or environment variables. One approach is to set environment variables in shell scripts or your interactive shell. Here is an example using Bash shell syntax, with fake token values:
```shell
export INVENIO_SERVER=https://data.caltech.edu
export INVENIO_TOKEN=qKLoOH0KYf4D98PGYQGnC09hiuqw3Y1SZllYnonRVzGJbWz2
export GITHUB_TOKEN=ghp_wQXp6sy3AsKyyEo4l9esHNxOdo6T34Zsthz
```
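Since the server address and InvenioRDM token are required on every run, a wrapper script may want to check for them before invoking IGA. The sketch below is only an illustration: the function name `check_iga_env` is hypothetical, and the check covers only the three environment variables named above (it does not account for values passed as command-line options).

```shell
# Hypothetical preflight check for the environment variables IGA reads.
# Returns 0 if the required variables are set, 1 otherwise.
check_iga_env() {
    missing=0
    for name in INVENIO_SERVER INVENIO_TOKEN; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "Missing required environment variable: $name" >&2
            missing=1
        fi
    done
    # GITHUB_TOKEN is optional, so its absence only produces a note.
    if [ -z "${GITHUB_TOKEN:-}" ]; then
        echo "Note: GITHUB_TOKEN is not set" >&2
    fi
    return "$missing"
}
```

A script could then guard its IGA call with something like `check_iga_env || exit 2`.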
Once these are set, use of IGA can be as simple as providing a URL for a release in GitHub. For example, the following command creates a draft record (the `-d` option is short for `--draft`) for another project in GitHub and tells IGA to open (the `-o` option is short for `--open`) the newly-created InvenioRDM entry in a web browser:
```shell
iga -d -o https://github.com/mhucka/taupe/releases/tag/v1.2.0
```
More options are described in the section on [detailed usage information](#usage) below.
### Configuring and running IGA as a GitHub Actions workflow
After doing the [GitHub Actions installation](#iga-as-a-github-actions-workflow) steps and [obtaining an InvenioRDM token](#getting-an-inveniordm-token), one more step is needed: the token must be stored as a "secret" in your GitHub repository.
1. Go to the _Settings_ page of your GitHub repository<p align="center"><img src="https://github.com/caltechlibrary/iga/raw/main/docs/_static/media/github-tabs.png" alt="Screenshot of repository tabs in GitHub" width="85%"></p>
2. In the left-hand sidebar, find _Secrets and variables_ in the Security section, click on it to reveal _Actions_ underneath, then click on _Actions_<p align="center"><img src="https://github.com/caltechlibrary/iga/raw/main/docs/_static/media/github-sidebar-secrets.png" alt="Screenshot of GitHub secrets sidebar item" width="40%"></p>
3. In the next page, click the green <kbd>New repository secret</kbd> button<p align="center"><img src="https://github.com/caltechlibrary/iga/raw/main/docs/_static/media/github-secrets.png" alt="Screenshot of GitHub secrets interface" width="60%"></p>
4. Name the variable `INVENIO_TOKEN` and paste in your InvenioRDM token
5. Finish by clicking the green <kbd>Add secret</kbd> button
#### Testing the workflow
After setting up the workflow and storing the InvenioRDM token in your repository on GitHub, it's a good idea to run the workflow manually to test that it works as expected.
1. Go to the _Actions_ tab in your repository and click on the name of the workflow in the sidebar on the left<p align="center"><img src="https://github.com/caltechlibrary/iga/raw/main/docs/_static/media/github-run-workflow.png" alt="Screenshot of GitHub actions workflow list" width="90%"></p>
2. Click the <kbd>Run workflow</kbd> button in the right-hand side of the blue strip
3. In the pull-down, change the value of "Mark the record as a draft" to `true`<p align="center"><img src="https://github.com/caltechlibrary/iga/raw/main/docs/_static/media/github-workflow-options-circled.png" alt="Screenshot of GitHub Actions workflow menu" width="40%"></p>
4. Click the green <kbd>Run workflow</kbd> button near the bottom
5. Refresh the web page and a new line will be shown named after your workflow file<p align="center"><img src="https://github.com/caltechlibrary/iga/raw/main/docs/_static/media/github-running-workflow.png" alt="Screenshot of a running workflow in GitHub Actions" width="90%"></p>
6. Click the title of the workflow to see the IGA workflow progress and results
#### Running the workflow when releasing software
Once the personal access token from InvenioRDM is stored as a GitHub secret, the workflow should run automatically every time a new release is made on GitHub – no further action should be needed. You can check the results (and look for errors if something went wrong) by going to the _Actions_ tab in your GitHub repository.
## Usage
This section provides detailed information about IGA's operation and options to control it.
### Identifying the InvenioRDM server
The server address must be provided either as the value of the option `--invenio-server` or in an environment variable named `INVENIO_SERVER`. If the server address does not begin with `https://`, IGA will prepend it automatically.
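To illustrate the rule about the `https://` prefix, here is a small shell sketch. It mirrors the sentence above literally and is not IGA's actual code; the function name `normalize_server` is hypothetical, and how IGA treats other schemes (such as `http://`) is not covered here.

```shell
# Hypothetical illustration: prepend https:// unless it is already there.
normalize_server() {
    case "$1" in
        https://*) printf '%s\n' "$1" ;;
        *)         printf 'https://%s\n' "$1" ;;
    esac
}

normalize_server data.caltech.edu            # prints https://data.caltech.edu
normalize_server https://data.caltech.edu    # prints https://data.caltech.edu
```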
### Providing an InvenioRDM access token
A personal access token (PAT) for making API calls to the InvenioRDM server must also be supplied when invoking IGA. The preferred method is to set the value of the environment variable `INVENIO_TOKEN`. Alternatively, you can use the option `--invenio-token` to pass the token on the command line, but **you are strongly advised to avoid this practice because it is insecure**.
To obtain a PAT from an InvenioRDM server, first log in to the server, then visit the page at `/account/settings/applications` and use the interface there to create a token. The token will be a long string of alphanumeric characters such as `OH0KYf4PGYQGnCM4b53ejSGicOC4s4YnonRVzGJbWxY`; set the value of the variable `INVENIO_TOKEN` to this string.
### Providing a GitHub access token
It _may_ be possible to run IGA without providing a GitHub access token. GitHub allows up to 60 API calls per hour when running without credentials, and though IGA makes several API calls to GitHub each time it runs, for some public repositories IGA will not hit the limit. However, if you are archiving a private repository, running IGA multiple times in a row, or archiving a repository that has many contributors, then you will need to supply a GitHub access token. The preferred way of doing that is to set the value of the environment variable `GITHUB_TOKEN`. Alternatively, you can use the option `--github-token` to pass the token on the command line, but **you are strongly advised to avoid this practice because it is insecure**. To obtain a PAT from GitHub, visit [https://docs.github.com/en/authentication](https://docs.github.com/en/authentication) and follow the instructions for creating a "classic" personal access token.
Note that when you run IGA as a GitHub Actions workflow, you do not need to create or set a GitHub token because it is obtained automatically by the GitHub Actions workflow.
### Specifying a GitHub release
A GitHub release can be specified to IGA in one of two mutually-exclusive ways:
1. The full URL of the web page on GitHub of a tagged release. In this case, the URL must be the final argument on the command line invocation of IGA and the options `--github-account` and `--github-repo` must be omitted.
2. A combination of _account name_, _repository name_, and _tag_. In this case, the final argument on the command line must be the _tag_, and in addition, values for the options `--github-account` and `--github-repo` must be provided.
Here's an example using approach #1 (assuming environment variables `INVENIO_SERVER`, `INVENIO_TOKEN`, and `GITHUB_TOKEN` have all been set):
```shell
iga https://github.com/mhucka/taupe/releases/tag/v1.2.0
```
and here's the equivalent using approach #2:
```shell
iga --github-account mhucka --github-repo taupe v1.2.0
```
Note that when using this form of the command, the release tag (`v1.2.0` above) must be the last item given on the command line.
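The correspondence between the two forms can be seen by splitting a release URL into its parts. The following sketch uses only POSIX shell parameter expansion; the function name `release_url_to_args` is hypothetical, and the code assumes a URL of the form `https://github.com/ACCOUNT/REPO/releases/tag/TAG` as in the examples above.

```shell
# Hypothetical helper: print the approach #2 command for a release URL.
release_url_to_args() {
    url=$1
    path=${url#https://github.com/}   # ACCOUNT/REPO/releases/tag/TAG
    account=${path%%/*}               # text before the first slash
    rest=${path#*/}                   # REPO/releases/tag/TAG
    repo=${rest%%/*}                  # text before the next slash
    tag=${url##*/releases/tag/}       # everything after /releases/tag/
    printf 'iga --github-account %s --github-repo %s %s\n' \
        "$account" "$repo" "$tag"
}

release_url_to_args https://github.com/mhucka/taupe/releases/tag/v1.2.0
# prints: iga --github-account mhucka --github-repo taupe v1.2.0
```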
### Gathering metadata for an InvenioRDM record
The record created in InvenioRDM is constructed using information obtained using GitHub's API as well as several other APIs as needed. The information includes the following:
* (if one exists) a `codemeta.json` file in the GitHub repository
* (if one exists) a `CITATION.cff` file in the GitHub repository
* data available from GitHub for the release
* data available from GitHub for the repository
* data available from GitHub for the account of the owner
* data available from GitHub for the accounts of repository contributors
* file assets associated with the GitHub release
* data available from ORCID.org for ORCID identifiers
* data available from ROR.org for Research Organization Registry identifiers
* data available from DOI.org, NCBI, Google Books, & others for publications
* data available from spdx.org for software licenses
IGA tries to use [`codemeta.json`](https://codemeta.github.io) first and [`CITATION.cff`](https://citation-file-format.github.io) second to fill out the fields of the InvenioRDM record. If neither of those files is present, IGA uses values from the GitHub repository instead. You can make it always use all sources of information with the option `--all-metadata`. Depending on how complete and up-to-date your `codemeta.json` and `CITATION.cff` are, this may or may not make the record more comprehensive and may or may not introduce redundancies or unwanted values.
To override the auto-created metadata, use the option `--read-metadata` followed by the path to a JSON file structured according to the InvenioRDM schema used by the destination server. When `--read-metadata` is provided, IGA does _not_ extract the data above, but still obtains the file assets from GitHub.
### Specifying GitHub file assets
By default, IGA attaches to the InvenioRDM record _only_ the ZIP file asset created by GitHub for the release. To make IGA attach all assets associated with the GitHub release, use the option `--all-assets`.
To upload specific file assets and override the default selections made by IGA, you can use the option `--file` followed by a path to a file to be uploaded. You can repeat the option `--file` to upload multiple file assets. Note that if `--file` is provided, then IGA _does not use any file assets from GitHub_; it is the user's responsibility to supply all the files that should be uploaded.
If both `--read-metadata` and `--file` are used, then IGA does not actually contact GitHub for any information.
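Because `--file` must be repeated once per file, a script that uploads everything in a directory has to build the option list itself. The sketch below shows one way to do that in POSIX shell; the function name `archive_with_local_files` and the directory layout are assumptions made for illustration.

```shell
# Hypothetical helper: pass every regular file in a directory via --file.
# Arguments: the release URL, then the directory holding the files.
archive_with_local_files() {
    release=$1
    dir=$2
    set --                                # start with an empty option list
    for f in "$dir"/*; do
        if [ -f "$f" ]; then
            set -- "$@" --file "$f"       # append one --file option per file
        fi
    done
    iga "$@" "$release"                   # release URL stays the last argument
}
```

Remember that with `--file`, IGA does not attach any assets from GitHub, so the directory should contain everything you want in the record.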
### Handling communities
To submit your record to a community, use the `--community` option together with a community name. The option `--list-communities` can be used to get a list of communities supported by the InvenioRDM server. Note that submitting a record to a community means that the record will not be finalized and will not be publicly visible when IGA finishes; instead, the record URL that you receive will be for a draft version, pending review by the community moderators.
### Indicating draft versus published records
If the `--community` option is not used, then by default, IGA will finalize and publish the record. To make it stop short and leave the record as a draft instead, use the option `--draft`. The draft option also takes precedence over the community option: if you use both `--draft` and `--community`, IGA will stop after creating the draft record and will _not_ submit it to the community. (You can nevertheless submit the record to a community manually once the draft is created, by visiting the record's web page and using the InvenioRDM interface there.)
### Versioning records
The option `--parent-record` can be used to indicate that the record being constructed is a new version of an existing record. This will make IGA use the InvenioRDM API for [record versioning](https://inveniordm.docs.cern.ch/releases/versions/version-v2.0.0/#versioning-support). The newly-created record will be linked to a parent record identified by the value passed to `--parent-record`. The value must be either an InvenioRDM record identifier (which is a sequence of alphanumeric characters of the form _XXXXX-XXXXX_, such as `bknz4-bch35`, generated by the InvenioRDM server), or a URL to the landing page of the record in the InvenioRDM server. (Note that such URLs end in the record identifier.) Here is an example of using this option:
```shell
iga --parent-record xbcd4-efgh5 https://github.com/mhucka/taupe/releases/tag/v1.2.0
```
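Since landing-page URLs end in the record identifier, the two accepted forms are easy to reconcile in a script. This sketch is only an illustration of that relationship, not IGA's own parsing: the function name `record_id` is hypothetical, the URL in the example is made up, and the check covers only the _XXXXX-XXXXX_ shape (five characters, a dash, five characters).

```shell
# Hypothetical helper: reduce a record URL or identifier to the identifier.
record_id() {
    value=${1%/}          # drop one trailing slash, if present
    id=${value##*/}       # keep the text after the last slash (or all of it)
    case "$id" in
        ?????-?????) printf '%s\n' "$id" ;;
        *) echo "Not of the form XXXXX-XXXXX: $id" >&2; return 1 ;;
    esac
}

record_id https://data.caltech.edu/records/bknz4-bch35   # prints bknz4-bch35
record_id bknz4-bch35                                    # prints bknz4-bch35
```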
### Other options recognized by IGA
Running IGA with the option `--save-metadata` will make it create a metadata record, but instead of uploading the record (and any assets) to the InvenioRDM server, IGA will write the result to the given destination. This can be useful not only for debugging but also for creating a starting point for a custom metadata record: first run IGA with `--save-metadata` to save a record to a file, edit the result, then finally run IGA with the `--read-metadata` option to use the modified record to create a release in the InvenioRDM server.
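The save, edit, and read sequence described above could be scripted roughly as follows. This is a sketch under stated assumptions: the function name `archive_with_edited_metadata` and the default file name `record.json` are hypothetical, and the editing step simply opens whatever editor `$EDITOR` names (falling back to `vi`).

```shell
# Hypothetical helper: save the generated record, edit it, then upload it.
# Arguments: the release URL, then an optional path for the metadata file.
archive_with_edited_metadata() {
    release=$1
    record=${2:-record.json}
    # Step 1: build the metadata record and write it to a file (no upload).
    iga --save-metadata "$record" "$release" || return
    # Step 2: edit the record by hand; $EDITOR is unquoted so it may carry flags.
    ${EDITOR:-vi} "$record" || return
    # Step 3: create the InvenioRDM record from the edited file.
    iga --read-metadata "$record" "$release"
}
```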
The `--mode` option can be used to change the run mode. Four run modes are available: `quiet`, `normal`, `verbose`, and `debug`. The default mode is `normal`, in which IGA prints a few messages while it's working. The mode `quiet` will make it avoid printing anything unless an error occurs, the mode `verbose` will make it print a detailed trace of what it is doing, and the mode `debug` will make IGA even more verbose. In addition, in `debug` mode, IGA will drop into the `pdb` debugger if it encounters an exception during execution. On Linux and macOS, debug mode also installs a signal handler on signal USR1 that causes IGA to drop into the `pdb` debugger if the signal USR1 is received. (Use `kill -USR1 NNN`, where NNN is the IGA process id.)
By default, informational output is sent to the standard output (normally the terminal console). The option `--log-dest` can be used to send the output to the given destination instead. The value can be `-` (i.e., a dash) to indicate console output, or it can be a file path to send the output to the file. A special exception is that even if a log destination is given, IGA will still print the final record URL to stdout. This makes it possible to invoke IGA from scripts that capture the record URL while still saving diagnostic output in case debugging is needed.
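As an illustration of the scripting use case just described, the sketch below sends diagnostic output to a log file while capturing the record URL from stdout. The function name `archive_and_capture_url` and the default log file name `iga.log` are hypothetical.

```shell
# Hypothetical helper: archive a release, log to a file, print the record URL.
# Arguments: the release URL, then an optional log file path.
archive_and_capture_url() {
    release=$1
    logfile=${2:-iga.log}
    # Diagnostics go to $logfile; only the final record URL arrives on stdout.
    record_url=$(iga --log-dest "$logfile" "$release") || return
    printf '%s\n' "$record_url"
}
```

If IGA exits with a nonzero status, the function returns that status and the log file is where to look for details.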
By default, IGA prints only the record URL when done. The option `--print-doi` will make it also print the DOI of the record. (Note that this only works when publishing records; if options `--draft` or `--community` are used, then there will be no DOI. In those cases, only the URL will be printed.)
Reading and writing large files may take a long time; on the other hand, IGA should not wait forever on network operations before reporting an error if a server or network becomes unresponsive. To balance these conflicting needs, IGA automatically scales its network timeout based on file sizes. To override its adaptive algorithm and set an explicit timeout value, use the option `--timeout` with a value in seconds.
If given the `--version` option, this program will print its version and other information, and exit without doing anything else.
Running IGA with the option `--help` will make it print help text and exit without doing anything else.
### Summary of command-line options
As explained above, IGA takes one required argument on the command line: either (1) the full URL of a web page on GitHub of a tagged release, or (2) a release tag name which is to be used in combination with options `--github-account` and `--github-repo`. The following table summarizes all the command line options available.
| Long form option | Short | Meaning | Default | |
|------------------------|----------|--------------------------------------|---------|---|
| `--all-assets` | `-A` | Attach all GitHub assets | Attach only the release source ZIP| |
| `--all-metadata` | `-M` | Include additional metadata from GitHub | Favor CodeMeta & CFF | |
| `--community` _C_ | `-c` _C_ | Submit record to RDM community _C_ | Don't submit record to any community | |
| `--draft` | `-d` | Mark the RDM record as a draft | Publish record when done | |
| `--file` _F_ | `-f` _F_ | Upload local file _F_ instead of GitHub assets | Upload only GitHub assets | ⚑ |
| `--github-account` _A_ | `-a` _A_ | Look in GitHub account _A_ | Get account name from release URL | ✯ |
| `--github-repo` _R_ | `-r` _R_ | Look in GitHub repository _R_ of account _A_ | Get repo name from release URL | ✯ |
| `--github-token` _T_ | `-t` _T_ | Use GitHub access token _T_| Use value in env. var. `GITHUB_TOKEN` | |
| `--help` | `-h` | Print help info and exit | | |
| `--invenio-server` _S_ | `-s` _S_ | Send record to InvenioRDM server at address _S_ | Use value in env. var. `INVENIO_SERVER` | |
| `--invenio-token` _K_ | `-k` _K_ | Use InvenioRDM access token _K_ | Use value in env. var. `INVENIO_TOKEN` | |
| `--list-communities` | `-L` | List communities available for use with `--community` | | |
| `--log-dest` _L_ | `-l` _L_ | Write log output to destination _L_ | Write to terminal | ⚐ |
| `--mode` _M_ | `-m` _M_ | Run in mode `quiet`, `normal`, `verbose`, or `debug` | `normal` | |
| `--open` | `-o` | Open record's web page in a browser when done | Do nothing when done | |
| `--parent-record` _N_ | `-p` _N_ | Make this a new version of existing record _N_ | New record is unrelated to other records | ❖ |
| `--print-doi` | `-i` | Print both the DOI & record URL when done | Print only the record URL | |
| `--read-metadata` _R_ | `-R` _R_ | Read metadata record from file _R_; don't build one | Build metadata record | |
| `--save-metadata` _D_ | `-S` _D_ | Save metadata record to file _D_; don't upload it | Upload to InvenioRDM server | |
| `--timeout` _X_ | `-T` _X_ | Wait on network operations a max of _X_ seconds | Auto-adjusted based on file size | |
| `--version` | `-V` | Print program version info and exit | | |
⚑ Can repeat the option to specify multiple files.<br>
⚐ To write to the console, use the character `-` as the value of _L_; otherwise, _L_ must be the name of a file where the output should be written.<br>
✯ When using `--github-account` and `--github-repo`, the last argument on the command line must be a release tag name.<br>
❖ The record identifier must be given either as a sequence of alphanumeric characters of the form _XXXXX-XXXXX_ (e.g., `bknz4-bch35`), or as a URL to the landing page of an existing record in the InvenioRDM server.
### Return values
This program exits with a return status code of 0 if no problem is encountered. Otherwise, it returns a nonzero status code. The following table lists the possible values:
| Code | Meaning |
|:----:|----------------------------------------------------------|
| 0 | success – program completed normally |
| 1 | interrupted |
| 2 | encountered a bad or missing value for an option |
| 3 | encountered a problem with a file or directory |
| 4 | encountered a problem interacting with GitHub |
| 5 | encountered a problem interacting with InvenioRDM |
| 6 | the personal access token was rejected |
| 7 | an exception or fatal error occurred |
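
If you call IGA from a script, the status code can be turned back into a readable message. The following is a minimal sketch in Python, assuming `iga` is installed and on your `PATH`; the names `EXIT_MEANINGS`, `describe_exit`, and `run_iga` are made up for this example and are not provided by IGA.

```python
import subprocess

# Meanings copied from the table of return values above.
EXIT_MEANINGS = {
    0: 'success',
    1: 'interrupted',
    2: 'bad or missing value for an option',
    3: 'problem with a file or directory',
    4: 'problem interacting with GitHub',
    5: 'problem interacting with InvenioRDM',
    6: 'personal access token was rejected',
    7: 'exception or fatal error',
}


def describe_exit(code):
    """Return a short description of an IGA exit status code."""
    return EXIT_MEANINGS.get(code, f'unrecognized exit code {code}')


def run_iga(args):
    """Run `iga` with a list of arguments; return (code, description)."""
    result = subprocess.run(['iga', *args])
    return result.returncode, describe_exit(result.returncode)
```

For example, `run_iga(['--draft', 'https://github.com/mhucka/taupe/releases/tag/v1.2.0'])` would return the status code together with its description, so a calling script can decide whether to retry or stop.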
### Adding a DOI badge to your GitHub repository
Once you have set up the IGA workflow in your GitHub repository, you may wish to add a DOI badge to your repository's README file. It would be a chore to keep updating the DOI value in this badge every time a new release is made, and thankfully, it's not necessary: it's possible to make the badge get the current DOI value dynamically. Here is how:
1. _After_ you have at least one release archived in your InvenioRDM server, find out the DOI of that release in InvenioRDM, and extract the tail end of that DOI. The DOI assigned by InvenioRDM will be a string such as `10.22002/zsmem-2pg20`; the tail end is the `zsmem-2pg20` part. (Your DOI and tail portion will be different.)
2. Let <i><b><code>SERVERURL</code></b></i> stand for the URL for your InvenioRDM server, and let <i><b><code>IDENTIFIER</code></b></i> stand for the tail end of the DOI found in step 1. In your `README.md` file, write the DOI badge as follows (without line breaks):
<pre><code>[![DOI](https://img.shields.io/badge/dynamic/json.svg?label=DOI&query=$.pids.doi.identifier&uri=<i><b><code>SERVERURL</code></b></i>/api/records/<i><b><code>IDENTIFIER</code></b></i>/versions/latest)](<i><b><code>SERVERURL</code></b></i>/records/<i><b><code>IDENTIFIER</code></b></i>/latest)</code></pre>
For example, for CaltechDATA and the archived IGA releases there,
* <i><b><code>SERVERURL</code></b></i> = `https://data.caltech.edu`
* <i><b><code>IDENTIFIER</code></b></i> = `zsmem-2pg20`
which leads to the following badge string for IGA's `README.md` file:
```txt
[![DOI](https://img.shields.io/badge/dynamic/json.svg?label=DOI&query=$.pids.doi.identifier&uri=https://data.caltech.edu/api/records/zsmem-2pg20/versions/latest)](https://data.caltech.edu/records/zsmem-2pg20/latest)
```
<br>The result looks like this: <div align="center">
[![DOI](https://img.shields.io/badge/dynamic/json.svg?label=DOI&query=$.pids.doi.identifier&uri=https://data.caltech.edu/api/records/zsmem-2pg20/versions/latest)](https://data.caltech.edu/records/zsmem-2pg20/latest)
</div>
You can change the look of the badge by using style parameters. Please refer to the [Shields.io](https://shields.io/badges/static-badge) documentation for static badges.
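
If you maintain several repositories, the substitution described in the steps above can be scripted. This is a small Python sketch of that substitution; the function names `doi_tail` and `doi_badge` are hypothetical and are not part of IGA.

```python
def doi_tail(doi):
    """Return the part of a DOI after the last slash."""
    return doi.rsplit('/', 1)[-1]


def doi_badge(server_url, identifier):
    """Build the Markdown for a DOI badge that always shows the latest DOI."""
    server = server_url.rstrip('/')
    # Shields.io reads the DOI from this JSON document each time it renders.
    data_url = f'{server}/api/records/{identifier}/versions/latest'
    # Clicking the badge leads to the landing page of the latest version.
    page_url = f'{server}/records/{identifier}/latest'
    image_url = ('https://img.shields.io/badge/dynamic/json.svg'
                 f'?label=DOI&query=$.pids.doi.identifier&uri={data_url}')
    return f'[![DOI]({image_url})]({page_url})'


if __name__ == '__main__':
    print(doi_badge('https://data.caltech.edu', doi_tail('10.22002/zsmem-2pg20')))
```

Run as a script, this prints the same badge string shown above for IGA's own `README.md` file.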
## Known issues and limitations
The following are known issues and limitations.
* As of mid-2023, InvenioRDM requires names of record creators and other contributors to be split into given (first) and family (surname) names. This is problematic for multiple reasons. The first is that mononyms are common in many countries: a person's name may legitimately be only a single word which is not conceptually a "given" or "family" name. To compound the difficulty for IGA, names are stored as single fields in GitHub account metadata, so unless a repository has a `codemeta.json` or `CITATION.cff` file (which allow authors more control over how they want their names represented), IGA is forced to try to split the single GitHub name string into two parts. _A foolproof algorithm for doing this does not exist_, so IGA will sometimes get it wrong. (That said, IGA goes to extraordinary lengths to try to do a good job.)
* InvenioRDM requires that identities (creators, contributors, etc.) be labeled as personal or organizational. The nature of identities is usually made clear in `codemeta.json` and `CITATION.cff` files. GitHub also provides a flag that is meant to be used to label organizational accounts, but sometimes people don't set the GitHub account information correctly. Consequently, if IGA has to use GitHub data to get (e.g.) the list of contributors on a project, it may mislabel identities in the InvenioRDM record it produces.
* Some accounts on GitHub are software automation or "bot" accounts, but are not labeled as such. These accounts are generally indistinguishable from human accounts on GitHub, so if they're not labeled as bot or organizational accounts in GitHub, IGA can't recognize that they're not humans. If such an account is the creator of a release in GitHub, and IGA tries to use its name-splitting algorithm on the name of the account, it may produce a nonsensical result. For example, it might turn "Travis CI" into an entry with a first name of "Travis" and last name of "CI".
* Funder and funding information can only be specified in `codemeta.json` files; neither GitHub nor `CITATION.cff` has provisions to store this kind of metadata. The CodeMeta specification defines two fields for this purpose: `funder` and `funding`. Unfortunately, these map imperfectly to the requirements of InvenioRDM's metadata format. In addition, people don't always follow the CodeMeta guidelines, and sometimes they write funding information as text strings (instead of structured objects), the interpretation of which would require software that can recognize grant and funding agency information from free-text descriptions. This combination of factors means IGA often can't fill in the funding metadata in InvenioRDM records even if there is some funding information in the `codemeta.json` file.
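
To make the name-splitting limitation concrete, here is a deliberately naive splitter written in Python. It is _not_ IGA's algorithm (IGA's own approach is far more elaborate); it only shows why a single name string cannot be split reliably without more information. The function name `naive_split` and the sample inputs are illustrative.

```python
def naive_split(name):
    """Split a display name at its last space into (given, family).

    A deliberately simplistic approach, shown only to illustrate the problem.
    """
    given, _, family = name.strip().rpartition(' ')
    return given, family


# A bot account name is split as if it belonged to a person.
print(naive_split('Travis CI'))

# A mononym has no space, so the "given" part comes back empty.
print(naive_split('Sukarno'))
```

Neither result is meaningful as a given and family name pair, which is why supplying names in a `codemeta.json` or `CITATION.cff` file is the more dependable route.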
## Getting help
If you find an issue, please submit it in [the GitHub issue tracker](https://github.com/caltechlibrary/iga/issues) for this repository.
## Contributing
Your help and participation in enhancing IGA is welcome! Please visit the [guidelines for contributing](CONTRIBUTING.md) for some tips on getting started.
## License
Software produced by the Caltech Library is Copyright © 2022–2024 California Institute of Technology. This software is freely distributed under a BSD-style license. Please see the [LICENSE](LICENSE) file for more information.
## Acknowledgments
This work was funded by the California Institute of Technology Library.
IGA uses multiple other open-source packages, without which it would have taken much longer to write the software. I want to acknowledge this debt. In alphabetical order, the packages are:
* [Aenum](https://github.com/ethanfurman/aenum) – package for advanced enumerations
* [Arrow](https://pypi.org/project/arrow/) – a library for creating & manipulating dates
* [Boltons](https://github.com/mahmoud/boltons/) – package of miscellaneous Python utilities
* [caltechdata_api](https://github.com/caltechlibrary/caltechdata_api) – package for using the CaltechDATA API
* [CommonPy](https://github.com/caltechlibrary/commonpy) – a collection of commonly-useful Python functions
* [dirtyjson](https://github.com/codecobblers/dirtyjson) – JSON decoder that copes with problematic JSON files and reports useful error messages
* [flake8](https://github.com/pycqa/flake8) – Python code linter and style analyzer
* [Ginza](https://github.com/megagonlabs/ginza) – Japanese NLP Library
* [httpx](https://www.python-httpx.org) – HTTP client library that supports HTTP/2
* [humanize](https://github.com/jmoiron/humanize) – make numbers more easily readable by humans
* [idutils](https://github.com/inveniosoftware/idutils) – package for validating and normalizing various kinds of persistent identifiers
* [ipdb](https://github.com/gotcha/ipdb) – the IPython debugger
* [iptools](https://github.com/bd808/python-iptools) – utilities for dealing with IP addresses
* [isbnlib](https://github.com/xlcnd/isbnlib) – utilities for dealing with ISBNs
* [Jamo](https://github.com/JDongian/python-jamo) – Hangul character analysis
* [json5](https://github.com/dpranke/pyjson5) – extended JSON format parser
* [latexcodec](https://github.com/mcmtroffaes/latexcodec) – lexer and codec to work with LaTeX code in Python
* [Lingua](https://github.com/pemistahl/lingua) – language detection library
* [linkify-it-py](https://github.com/tsutsu3/linkify-it-py) – a link recognition library with full unicode support
* [lxml](https://lxml.de) – an XML parsing library
* [Markdown](https://python-markdown.github.io) – Python package for working with Markdown
* [markdown-checklist](https://github.com/FND/markdown-checklist) – GitHub-style checklist extension for Python Markdown package
* [mdx-breakless-lists](https://github.com/adamb70/mdx-breakless-lists) – GitHub-style Markdown lists that don't require a line break above them
* [mdx_linkify](https://github.com/daGrevis/mdx_linkify) – extension for Python Markdown that converts text that looks like links into HTML anchors
* [MyST-parser](https://github.com/executablebooks/MyST-Parser) – A Sphinx and Docutils extension to parse an extended version of Markdown
* [nameparser](https://github.com/derek73/python-nameparser) – package for parsing human names into their individual components
* [probablepeople](https://github.com/datamade/probablepeople) – package for parsing names into components using ML-based techniques
* [pybtex](https://pybtex.org) – BibTeX parser and formatter
* [pybtex-apa7-style](https://pypi.org/project/pybtex-apa7-style/) – plugin for [pybtex](https://pybtex.org) that provides APA7 style formatting
* [pymdown-extensions](https://github.com/facelessuser/pymdown-extensions) – extensions for Python Markdown
* [pytest](https://docs.pytest.org/en/stable/) – testing framework
* [pytest-cov](https://github.com/pytest-dev/pytest-cov) – coverage reports for use with `pytest`
* [pytest-mock](https://pypi.org/project/pytest-mock/) – wrapper around the `mock` package for use with `pytest`
* [PyYAML](https://pyyaml.org) – YAML parser
* [Rich](https://github.com/Textualize/rich) – library for writing styled text to the terminal
* [rich-click](https://github.com/ewels/rich-click) – CLI interface built on top of [Rich](https://github.com/Textualize/rich)
* [setuptools](https://github.com/pypa/setuptools) – library for `setup.py`
* [Sidetrack](https://github.com/caltechlibrary/sidetrack) – simple debug logging/tracing package
* [spaCy](https://spacy.io) – Natural Language Processing package
* [spacy-alignments](https://github.com/explosion/spacy-alignments) – alternate alignments for [spaCy](https://spacy.io)
* [spacy-legacy](https://pypi.org/project/spacy-legacy/) – [spaCy](https://spacy.io) legacy functions and architectures for backwards compatibility
* [spacy-loggers](https://github.com/explosion/spacy-loggers) – loggers for [spaCy](https://spacy.io)
* [spacy-pkuseg](https://github.com/explosion/spacy-pkuseg) – Chinese word segmentation toolkit for [spaCy](https://spacy.io)
* [spacy-transformers](https://spacy.io) – pretrained Transformers for [spaCy](https://spacy.io)
* [Sphinx](https://www.sphinx-doc.org/en/master/) – documentation generator for Python
* [sphinx-autobuild](https://pypi.org/project/sphinx-autobuild/) – rebuild Sphinx docs automatically
* [sphinx-material](https://bashtage.github.io/sphinx-material/) – a responsive Material Design theme for Sphinx
* [sphinxcontrib-mermaid](https://github.com/mgaitan/sphinxcontrib-mermaid) – support Mermaid diagrams in Sphinx docs
* [StringDist](https://github.com/obulkin/string-dist) – library for calculating string distances
* [Twine](https://github.com/pypa/twine) – utilities for publishing Python packages on [PyPI](https://pypi.org)
* [url-normalize](https://github.com/niksite/url-normalize) – URI/URL normalization utilities
* [validators](https://github.com/kvesteri/validators) – data validation package for Python
* [wheel](https://pypi.org/project/wheel/) – setuptools extension for building wheels
<div align="center">
<br>
<a href="https://www.caltech.edu">
<img width="100" height="100" alt="Caltech logo" src="https://github.com/caltechlibrary/iga/raw/main/.graphics/caltech-round.png">
</a>
</div>
Raw data
{
"_id": null,
"home_page": "https://github.com/caltechlibrary/iga",
"name": "iga",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "Python, applications",
"author": "Tom Morrell",
"author_email": "tmorrell@caltech.edu",
"download_url": "https://files.pythonhosted.org/packages/a4/b3/37bd455e16c2d62efd995389981df552cc21119aa32583b9e9984478a075/iga-1.3.5.tar.gz",
"platform": null,
"description": "# IGA<img alt=\"IGA logo\" width=\"12%\" align=\"right\" src=\"https://github.com/caltechlibrary/iga/raw/main/docs/_static/media/cloud-upload.png\">\n\nIGA is the _InvenioRDM GitHub Archiver_, a standalone program as well as a [GitHub Actions](https://github.com/marketplace/actions/inveniordm-github-archiver) workflow that lets you automatically archive GitHub software releases in an [InvenioRDM](https://inveniosoftware.org/products/rdm/) repository.\n\n[![Latest release](https://img.shields.io/github/v/release/caltechlibrary/iga.svg?style=flat-square&color=b44e88&label=Latest%20release)](https://github.com/caltechlibrary/iga/releases) [![License](https://img.shields.io/badge/License-BSD--like-lightgrey.svg?style=flat-square)](https://github.com/caltechlibrary/iga/blob/develop/LICENSE) [![Python](https://img.shields.io/badge/Python-3.9+-brightgreen.svg?style=flat-square)](https://www.python.org/downloads/release/python-390/) [![PyPI](https://img.shields.io/pypi/v/iga.svg?style=flat-square&color=orange&label=PyPI)](https://pypi.org/project/iga/) [![DOI](https://img.shields.io/badge/dynamic/json.svg?label=DOI&style=flat-square&colorA=gray&colorB=navy&query=$.pids.doi.identifier&uri=https://data.caltech.edu/api/records/zsmem-2pg20/versions/latest)](https://data.caltech.edu/records/zsmem-2pg20/latest)\n\n\n## Table of contents\n\n* [Introduction](#introduction)\n* [Installation](#installation)\n * [IGA as a standalone program](#iga-as-a-standalone-program)\n * [IGA as a GitHub Actions workflow](#iga-as-a-github-actions-workflow)\n* [Quick start](#quick-start)\n* [Usage](#usage)\n * [Identifying the InvenioRDM server](#identifying-the-inveniordm-server)\n * [Providing an InvenioRDM access token](#providing-an-inveniordm-access-token)\n * [Providing a GitHub access token](#providing-a-github-access-token)\n * [Specifying a GitHub release](#specifying-a-github-release)\n * [Gathering metadata for an InvenioRDM 
record](#gathering-metadata-for-an-inveniordm-record)\n * [Specifying GitHub file assets](#specifying-github-file-assets)\n * [Handling communities](#handling-communities)\n * [Indicating draft versus published records](#indicating-draft-versus-published-records)\n * [Versioning records](#versioning-records)\n * [Other options recognized by IGA](#other-options-recognized-by-iga)\n * [Summary of command-line options](#summary-of-command-line-options)\n * [Return values](#return-values)\n * [Adding a DOI badge to your GitHub repository](#adding-a-doi-badge-to-your-github-repository)\n* [Known issues and limitations](#known-issues-and-limitations)\n* [Getting help](#getting-help)\n* [Contributing](#contributing)\n* [License](#license)\n* [Acknowledgments](#acknowledgments)\n\n\n## Introduction\n\n[InvenioRDM](https://inveniosoftware.org/products/rdm/) is the basis for many institutional repositories such as [CaltechDATA](https://data.caltech.edu) that enable users to preserve software and data sets in long-term archive. Though such repositories are critical resources, creating detailed records and uploading assets can be a tedious and error-prone process if done manually. This is where the [_InvenioRDM GitHub Archiver_](https://github.com/caltechlibrary/iga) (IGA) comes in.\n\nIGA creates metadata records and sends releases from GitHub to an InvenioRDM-based repository server. 
IGA can be invoked from the command line; it also can be set up as a [GitHub Actions](https://docs.github.com/en/actions) workflow to archive GitHub releases automatically for a repository each time they are made.\n\n<p align=center>\n<img align=\"middle\" src=\"https://github.com/caltechlibrary/iga/raw/main/docs/_static/media/example-github-release.jpg\" alt=\"Screenshot of a software release in GitHub\" width=\"40%\">\n<span style=\"font-size: 150%\">\u279c</span>\n<img align=\"middle\" src=\"https://github.com/caltechlibrary/iga/raw/main/docs/_static/media/example-record-landing-page.jpg\" alt=\"Screenshot of the corresponding entry in InvenioRDM\" width=\"40%\">\n</p>\n\nIGA offers many notable features:\n\n* Automatic metadata extraction from GitHub plus [`codemeta.json`](https://codemeta.github.io) and [`CITATION.cff`](https://citation-file-format.github.io) files\n* Thorough coverage of [InvenioRDM record metadata](https://inveniordm.docs.cern.ch/reference/metadata) using painstaking procedures\n* Recognition of identifiers in CodeMeta & CFF files: [ORCID](https://orcid.org), [DOI](https://www.doi.org), [PMCID](https://www.ncbi.nlm.nih.gov/pmc/about/public-access-info/), and more\n* Automatic lookup of publication data in [DOI.org](https://www.doi.org), [PubMed](https://www.ncbi.nlm.nih.gov/pmc/about/public-access-info/), Google, and other sources\n* Automatic lookup of organization names in [ROR](https://ror.org) (assuming ROR id's are provided)\n* Automatic lookup of human names in [ORCID.org](https://orcid.org) (assuming ORCID id's are provided)\n* Automatic splitting of human names into family & given names using [ML](https://en.wikipedia.org/wiki/Machine_learning) methods\n* Support for InvenioRDM [communities](https://invenio-communities.readthedocs.io/en/latest/)\n* Support for overriding the record that IGA creates, for complete control if you need it\n* Support for using the GitHub API without a [GitHub access 
token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token) in simple cases\n* Extensive use of logging so you can see what's going on under the hood\n\nThe IGA GitHub action workflow automatically will add the archived DOI to a CodeMeta file and create/update a CFF file using the [CodeMeta2CFF workflow](https://github.com/caltechlibrary/codemeta2cff).\n\n## Installation\n\nIGA can be installed as either (or both) a command-line program on your computer or a [GitHub Action](https://docs.github.com/en/actions) in a GitHub repository.\n\n### IGA as a standalone program\n\nPlease choose an approach that suits your situation and preferences.\n\n<details><summary><h4><i>Alternative 1: using <code>pipx</code></i></h4></summary>\n\n[Pipx](https://github.com/pypa/pipx) lets you install Python programs in a way that isolates Python dependencies from other Python programs on your system, and yet the resulting `iga` command can be run from any shell and directory – like any normal program on your computer. If you use `pipx` on your system, you can install IGA with the following command:\n\n```sh\npipx install iga\n```\n\nAfter installation, a program named `iga` should end up in a location where other command-line programs are installed on your computer. 
Test it by running the following command in a shell:\n\n```shell\niga --help\n```\n\n</details>\n\n<details><summary><h4><i>Alternative 2: using <code>pip</code></i></h4></summary>\n\nIGA is available from the [Python package repository PyPI](https://pypi.org) and can be installed using [`pip`](https://pip.pypa.io/en/stable/installing/):\n\n```sh\npython3 -m pip install iga\n```\n\nAs an alternative to getting it from [PyPI](https://pypi.org), you can install `iga` directly from GitHub:\n\n```sh\npython3 -m pip install git+https://github.com/caltechlibrary/iga.git\n```\n\n_If you already installed IGA once before_, and want to update to the latest version, add `--upgrade` to the end of either command line above.\n\nAfter installation, a program named `iga` should end up in a location where other command-line programs are installed on your computer. Test it by running the following command in a shell:\n\n```shell\niga --help\n```\n\n</details>\n\n<details><summary><h4><i>Alternative 3: from sources</i></h4></summary>\n\nIf you prefer to install IGA directly from the source code, first obtain a copy by either downloading the source archive from the [IGA releases page on GitHub](https://github.com/caltechlibrary/iga/releases), or by using `git` to clone the repository to a location on your computer. For example,\n\n```sh\ngit clone https://github.com/caltechlibrary/iga\n```\n\nNext, after getting a copy of the files, run `setup.py` inside the code directory:\n\n```sh\ncd iga\npython3 setup.py install\n```\n\n</details>\n\nOnce you have installed `iga`, the next steps are (1) [get an InvenioRDM token](#getting-an-inveniordm-token) and (2) [configure `iga` for running locally](#configuring-and-running-iga-locally).\n\n\n### IGA as a GitHub Actions workflow\n\nA [GitHub Actions](https://docs.github.com/en/actions) workflow is an automated process that runs on GitHub's servers under control of a file in your repository. 
Follow these steps to create the IGA workflow file:\n\n1. In the main branch of your GitHub repository, create a `.github/workflows` directory\n2. In the `.github/workflows` directory, create a file named (e.g.) `iga.yml` and copy the [following contents](https://raw.githubusercontent.com/caltechlibrary/iga/v1/sample-workflow.yml) into it:\n\n ```yaml\n # GitHub Actions workflow for InvenioRDM GitHub Archiver version 1.3.4\n # This is available as the file \"sample-workflow.yml\" from the open-\n # source repository for IGA at https://github.com/caltechlibrary/iga/.\n\n # \u256d\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256e\n # \u2502 Configure this section \u2502\n # \u2570\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256f\n\n env:\n # \ud83d\udc4b\ud83c\udffb Set the next variable to your InvenioRDM server address \ud83d\udc4b\ud83c\udffb\n INVENIO_SERVER: https://your-invenio-server.org\n\n # Set to an InvenioRDM record ID to mark release as a new version.\n parent_record: none\n\n # The variables below are other IGA options. 
Please see the docs.\n community: none\n draft: false\n all_assets: false\n all_metadata: false\n debug: false\n\n # This variable is a setting for post-archiving CodeMeta file updates.\n # If you don't have a CodeMeta file, you can remove the add_doi_codemeta\n # and Codemeta2CFF jobs at the bottom of this file.\n ref: main\n\n # \u256d\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256e\n # \u2502 The rest of this file should be left as-is \u2502\n # \u2570\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256f\n\n name: InvenioRDM GitHub Archiver\n on:\n release:\n types: [published]\n workflow_dispatch:\n inputs:\n release_tag:\n description: The release tag (empty = latest)\n parent_record:\n description: ID of parent record (for versioning)\n community:\n description: Name of InvenioRDM community (if any)\n draft:\n description: Mark the record as a draft\n type: boolean\n all_assets:\n description: Attach all GitHub assets\n type: boolean\n all_metadata:\n description: Include additional GitHub metadata\n type: boolean\n debug:\n description: Print debug info in the GitHub log\n type: boolean\n\n run-name: Archive ${{inputs.release_tag || 'latest release'}} in InvenioRDM\n jobs:\n run_iga:\n name: Send to ${{needs.get_repository.outputs.server}}\n runs-on: ubuntu-latest\n needs: get_repository\n outputs:\n record_doi: ${{steps.iga.outputs.record_doi}}\n steps:\n - uses: caltechlibrary/iga@v1\n id: iga\n with:\n INVENIO_SERVER: ${{env.INVENIO_SERVER}}\n INVENIO_TOKEN: ${{secrets.INVENIO_TOKEN}}\n all_assets: 
${{github.event.inputs.all_assets || env.all_assets}}\n all_metadata: ${{github.event.inputs.all_metadata || env.all_metadata}}\n debug: ${{github.event.inputs.debug || env.debug}}\n draft: ${{github.event.inputs.draft || env.draft}}\n community: ${{github.event.inputs.community || env.community}}\n parent_record: ${{github.event.inputs.parent_record || env.parent_record}}\n release_tag: ${{github.event.inputs.release_tag || 'latest'}}\n get_repository:\n name: Get repository name\n runs-on: ubuntu-latest\n outputs:\n server: ${{steps.parse.outputs.host}}\n steps:\n - name: Extract name from INVENIO_SERVER\n id: parse\n run: echo \"host=$(cut -d'/' -f3 <<< ${{env.INVENIO_SERVER}} | cut -d':' -f1)\" >> $GITHUB_OUTPUT\n add_doi_codemeta:\n name: \"Add ${{needs.run_iga.outputs.record_doi}} to codemeta.json\"\n needs: run_iga\n runs-on: ubuntu-latest\n steps:\n - name: Checkout\n uses: actions/checkout@v4\n with:\n ref: ${{ env.ref }}\n - name: Install sde\n run: pip install sde\n - name: Add DOI to CodeMeta File\n run: sde identifier ${{needs.run_iga.outputs.record_doi}} codemeta.json\n - name: Commit CFF\n uses: EndBug/add-and-commit@v9\n with:\n message: 'Add DOI to codemeta.json file'\n add: 'codemeta.json'\n CodeMeta2CFF:\n runs-on: ubuntu-latest\n needs: add_doi_codemeta\n steps:\n - name: Checkout\n uses: actions/checkout@v4\n with:\n ref: ${{ env.ref }}\n - name: Convert CFF\n uses: caltechlibrary/codemeta2cff@main\n - name: Commit CFF\n uses: EndBug/add-and-commit@v9\n with:\n message: 'Add updated CITATION.cff from codemeta.json file'\n add: 'CITATION.cff'\n ```\n\n3. **Edit the value of the `INVENIO_SERVER` variable (line 7 above)** \u2191\n4. Optionally, change the [values of other options (`parent_record`, `community`, etc.)](https://caltechlibrary.github.io/iga/gha-usage.html#input-parameters)\n5. 
If you have a [CodeMeta](https://caltechlibrary.github.io/iga/introduction.html#codemeta-citation-cff) file, the GitHub action workflow can automatically add the DOI after IGA has run. The \"ref\" value is the branch where the CodeMeta file will be updated. If you don't use a CodeMeta file, you can delete the `add_doi_codemeta` part of the workflow.\n6. Save the file, commit the changes to git, and push your changes to GitHub\nOnce you have installed the GitHub Action workflow for IGA, the next steps are (1) [get an InvenioRDM token](#getting-an-inveniordm-token) and (2) [configure the GitHub Action workflow](#configuring-and-running-iga-as-a-github-actions-workflow).\n\n\n## Quick start\n\nNo matter whether IGA is run locally on your computer or as a GitHub Actions workflow, in both cases it must be provided with a personal access token (PAT) for your InvenioRDM server. Getting one is the first step.\n\n### Getting an InvenioRDM token\n\n<img src=\"https://github.com/caltechlibrary/iga/raw/main/docs/_static/media/get-invenio-pat.png\" alt=\"Screenshot of InvenioRDM PAT menu\" width=\"60%\" align=\"right\">\n\n1. Log in to your InvenioRDM account\n2. Go to the _Applications_ page in your account profile\n3. Click the <kbd>New token</kbd> button next to \"Personal access tokens\"\n4. On the page that is shown after you click that button, name your token (the name does not matter) and click the <kbd>Create</kbd> button\n5. After InvenioRDM creates and shows you the token, **copy it to a safe location** because InvenioRDM will not show it again\n\n### Configuring and running IGA locally\n\nTo send a GitHub release to your InvenioRDM server, IGA needs this information:\n\n1. (Required) The identity of the GitHub release to be archived\n2. (Required) The address of the destination InvenioRDM server\n3. (Required) A personal access token for InvenioRDM (from [above](#getting-an-inveniordm-token))\n4. 
(Optional) A [personal access token for GitHub](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token)\n\nThe identity of the GitHub release is always given as an argument to IGA on the command line; the remaining values can be provided either via command-line options or environment variables. One approach is to set environment variables in shell scripts or your interactive shell. Here is an example using Bash shell syntax, with fake token values:\n\n```shell\nexport INVENIO_SERVER=https://data.caltech.edu\nexport INVENIO_TOKEN=qKLoOH0KYf4D98PGYQGnC09hiuqw3Y1SZllYnonRVzGJbWz2\nexport GITHUB_TOKEN=ghp_wQXp6sy3AsKyyEo4l9esHNxOdo6T34Zsthz\n```\n\nOnce these are set, use of IGA can be as simple as providing a URL for a release in GitHub. For example, the following command creates a draft record (the `-d` option is short for `--draft`) for another project in GitHub and tells IGA to open (the `-o` option is short for `--open`) the newly-created InvenioRDM entry in a web browser:\n\n```shell\niga -d -o https://github.com/mhucka/taupe/releases/tag/v1.2.0\n```\n\nMore options are described in the section on [detailed usage information](#usage) below.\n\n\n### Configuring and running IGA as a GitHub Actions workflow\n\nAfter doing the [GitHub Actions installation](#configuring-and-running-iga-as-a-github-actions-workflow) steps and [obtaining an InvenioRDM token](#getting-an-inveniordm-token), one more step is needed: the token must be stored as a \"secret\" in your GitHub repository.\n\n1. Go to the _Settings_ page of your GitHub repository<p align=\"center\"><img src=\"https://github.com/caltechlibrary/iga/raw/main/docs/_static/media/github-tabs.png\" alt=\"Screenshot of repository tabs in GitHub\" width=\"85%\"></p>\n2. 
In the left-hand sidebar, find _Secrets and variables_ in the Security section, click on it to reveal _Actions_ underneath, then click on _Actions_<p align=\"center\"><img src=\"https://github.com/caltechlibrary/iga/raw/main/docs/_static/media/github-sidebar-secrets.png\" alt=\"Screenshot of GitHub secrets sidebar item\" width=\"40%\"></p>\n3. In the next page, click the green <kbd>New repository secret</kbd> button<p align=\"center\"><img src=\"https://github.com/caltechlibrary/iga/raw/main/docs/_static/media/github-secrets.png\" alt=\"Screenshot of GitHub secrets interface\" width=\"60%\"></p>\n4. Name the variable `INVENIO_TOKEN` and paste in your InvenioRDM token\n5. Finish by clicking the green <kbd>Add secret</kbd> button\n\n#### Testing the workflow\n\nAfter setting up the workflow and storing the InvenioRDM token in your repository on GitHub, it's a good idea to run the workflow manually to test that it works as expected.\n\n1. Go to the _Actions_ tab in your repository and click on the name of the workflow in the sidebar on the left<p align=\"center\"><img src=\"https://github.com/caltechlibrary/iga/raw/main/docs/_static/media/github-run-workflow.png\" alt=\"Screenshot of GitHub actions workflow list\" width=\"90%\"></p>\n2. Click the <kbd>Run workflow</kbd> button in the right-hand side of the blue strip\n3. In the pull-down, change the value of \"Mark the record as a draft\" to `true`<p align=\"center\"><img src=\"https://github.com/caltechlibrary/iga/raw/main/docs/_static/media/github-workflow-options-circled.png\" alt=\"Screenshot of GitHub Actions workflow menu\" width=\"40%\"></p>\n4. Click the green <kbd>Run workflow</kbd> button near the bottom\n5. Refresh the web page and a new line will be shown named after your workflow file<p align=\"center\"><img src=\"https://github.com/caltechlibrary/iga/raw/main/docs/_static/media/github-running-workflow.png\" alt=\"Screenshot of a running workflow in GitHub Actions\" width=\"90%\"></p>\n6. 
Click the title of the workflow to see the IGA workflow progress and results\n\n\n#### Running the workflow when releasing software\n\nOnce the personal access token from InvenioRDM is stored as a GitHub secret, the workflow should run automatically every time a new release is made on GitHub – no further action should be needed. You can check the results (and look for errors if something went wrong) by going to the _Actions_ tab in your GitHub repository.\n\n\n## Usage\n\nThis section provides detailed information about IGA's operation and options to control it.\n\n### Identifying the InvenioRDM server\n\nThe server address must be provided either as the value of the option `--invenio-server` or in an environment variable named `INVENIO_SERVER`. If the server address does not begin with `https://`, IGA will prepend `https://` automatically.\n\n### Providing an InvenioRDM access token\n\nA personal access token (PAT) for making API calls to the InvenioRDM server must also be supplied when invoking IGA. The preferred method is to set the value of the environment variable `INVENIO_TOKEN`. Alternatively, you can use the option `--invenio-token` to pass the token on the command line, but **you are strongly advised to avoid this practice because it is insecure**.\n\nTo obtain a PAT from an InvenioRDM server, first log in to the server, then visit the page at `/account/settings/applications` and use the interface there to create a token. The token will be a long string of alphanumeric characters such as `OH0KYf4PGYQGnCM4b53ejSGicOC4s4YnonRVzGJbWxY`; set the value of the variable `INVENIO_TOKEN` to this string.\n\n### Providing a GitHub access token\n\nIt _may_ be possible to run IGA without providing a GitHub access token. GitHub allows up to 60 API calls per hour when running without credentials, and though IGA makes several API calls to GitHub each time it runs, for some public repositories IGA will not hit the limit. 
However, if you are archiving a private repository, running IGA multiple times in a row, or archiving a repository that has many contributors, then you will need to supply a GitHub access token. The preferred way of doing that is to set the value of the environment variable `GITHUB_TOKEN`. Alternatively, you can use the option `--github-token` to pass the token on the command line, but **you are strongly advised to avoid this practice because it is insecure**. To obtain a PAT from GitHub, visit [https://docs.github.com/en/authentication](https://docs.github.com/en/authentication) and follow the instructions for creating a \"classic\" personal access token.\n\nNote that when you run IGA as a GitHub Actions workflow, you do not need to create or set a GitHub token because it is obtained automatically by the GitHub Actions workflow.\n\n### Specifying a GitHub release\n\nA GitHub release can be specified to IGA in one of two mutually-exclusive ways:\n\n1. The full URL of the web page on GitHub of a tagged release. In this case,\n the URL must be the final argument on the command line invocation of IGA\n and the options `--github-account` and `--github-repo` must be omitted.\n2. A combination of _account name_, _repository name_, and _tag_. 
In this\n case, the final argument on the command line must be the _tag_, and in\n addition, values for the options `--github-account` and `--github-repo` must be provided.\n\nHere's an example using approach #1 (assuming environment variables `INVENIO_SERVER`, `INVENIO_TOKEN`, and `GITHUB_TOKEN` have all been set):\n\n```shell\niga https://github.com/mhucka/taupe/releases/tag/v1.2.0\n```\n\nand here's the equivalent using approach #2:\n\n```shell\niga --github-account mhucka --github-repo taupe v1.2.0\n```\n\nNote that when using this form of the command, the release tag (`v1.2.0` above) must be the last item given on the command line.\n\n### Gathering metadata for an InvenioRDM record\n\nThe record created in InvenioRDM is constructed using information obtained from GitHub's API as well as several other APIs as needed. The information includes the following:\n\n* (if one exists) a `codemeta.json` file in the GitHub repository\n* (if one exists) a `CITATION.cff` file in the GitHub repository\n* data available from GitHub for the release\n* data available from GitHub for the repository\n* data available from GitHub for the account of the owner\n* data available from GitHub for the accounts of repository contributors\n* file assets associated with the GitHub release\n* data available from ORCID.org for ORCID identifiers\n* data available from ROR.org for Research Organization Registry identifiers\n* data available from DOI.org, NCBI, Google Books, & others for publications\n* data available from spdx.org for software licenses\n\nIGA tries to use [`CodeMeta.json`](https://codemeta.github.io) first and [`CITATION.cff`](https://citation-file-format.github.io) second to fill out the fields of the InvenioRDM record. If neither of those files is present, IGA uses values from the GitHub repository instead. You can make it always use all sources of info with the option `--all-metadata`. 
Depending on how complete and up-to-date your `CodeMeta.json` and `CITATION.cff` are, this may or may not make the record more comprehensive and may or may not introduce redundancies or unwanted values.\n\nTo override the auto-created metadata, use the option `--read-metadata` followed by the path to a JSON file structured according to the InvenioRDM schema used by the destination server. When `--read-metadata` is provided, IGA does _not_ extract the data above, but still obtains the file assets from GitHub.\n\n### Specifying GitHub file assets\n\nBy default, IGA attaches to the InvenioRDM record _only_ the ZIP file asset created by GitHub for the release. To make IGA attach all assets associated with the GitHub release, use the option `--all-assets`.\n\nTo upload specific file assets and override the default selections made by IGA, you can use the option `--file` followed by a path to a file to be uploaded. You can repeat the option `--file` to upload multiple file assets. Note that if `--file` is provided, then IGA _does not use any file assets from GitHub_; it is the user's responsibility to supply all the files that should be uploaded.\n\nIf both `--read-metadata` and `--file` are used, then IGA does not actually contact GitHub for any information.\n\n### Handling communities\n\nTo submit your record to a community, use the `--community` option together with a community name. The option `--list-communities` can be used to get a list of communities supported by the InvenioRDM server. Note that submitting a record to a community means that the record will not be finalized and will not be publicly visible when IGA finishes; instead, the record URL that you receive will be for a draft version, pending review by the community moderators.\n\n### Indicating draft versus published records\n\nIf the `--community` option is not used, then by default, IGA will finalize and publish the record. 
To make it stop short and leave the record as a draft instead, use the option `--draft`. The draft option also takes precedence over the community option: if you use both `--draft` and `--community`, IGA will stop after creating the draft record and will _not_ submit it to the community. (You can nevertheless submit the record to a community manually once the draft is created, by visiting the record's web page and using the InvenioRDM interface there.)\n\n### Versioning records\n\nThe option `--parent-record` can be used to indicate that the record being constructed is a new version of an existing record. This will make IGA use the InvenioRDM API for [record versioning](https://inveniordm.docs.cern.ch/releases/versions/version-v2.0.0/#versioning-support). The newly-created record will be linked to a parent record identified by the value passed to `--parent-record`. The value must be either an InvenioRDM record identifier (which is a sequence of alphanumeric characters of the form _XXXXX-XXXXX_, such as `bknz4-bch35`, generated by the InvenioRDM server), or a URL to the landing page of the record in the InvenioRDM server. (Note that such URLs end in the record identifier.) Here is an example of using this option:\n\n```shell\niga --parent-record xbcd4-efgh5 https://github.com/mhucka/taupe/releases/tag/v1.2.0\n```\n\n### Other options recognized by IGA\n\nRunning IGA with the option `--save-metadata` will make it create a metadata record, but instead of uploading the record (and any assets) to the InvenioRDM server, IGA will write the result to the given destination. This can be useful not only for debugging but also for creating a starting point for a custom metadata record: first run IGA with `--save-metadata` to save a record to a file, edit the result, then finally run IGA with the `--read-metadata` option to use the modified record to create a release in the InvenioRDM server.\n\nThe `--mode` option can be used to change the run mode. 
Four run modes are available: `quiet`, `normal`, `verbose`, and `debug`. The default mode is `normal`, in which IGA prints a few messages while it's working. The mode `quiet` will make it avoid printing anything unless an error occurs, the mode `verbose` will make it print a detailed trace of what it is doing, and the mode `debug` will make IGA even more verbose. In addition, in `debug` mode, IGA will drop into the `pdb` debugger if it encounters an exception during execution. On Linux and macOS, debug mode also installs a signal handler on signal USR1 that causes IGA to drop into the `pdb` debugger if the signal USR1 is received. (Use `kill -USR1 NNN`, where NNN is the IGA process id.)\n\nBy default, informational output is sent to the standard output (normally the terminal console). The option `--log-dest` can be used to send the output to the given destination instead. The value can be `-` (i.e., a dash) to indicate console output, or it can be a file path to send the output to the file. A special exception is that even if a log destination is given, IGA will still print the final record URL to stdout. This makes it possible to invoke IGA from scripts that capture the record URL while still saving diagnostic output in case debugging is needed.\n\nBy default, IGA prints only the record URL when done. The option `--print-doi` will make it also print the DOI of the record. (Note that this only works when publishing records; if the option `--draft` or `--community` is used, then there will be no DOI. In those cases, only the URL will be printed.)\n\nReading and writing large files may take a long time; on the other hand, IGA should not wait forever on network operations before reporting an error if a server or network becomes unresponsive. To balance these conflicting needs, IGA automatically scales its network timeout based on file sizes. 
To override its adaptive algorithm and set an explicit timeout value, use the option `--timeout` with a value in seconds.\n\nIf given the `--version` option, this program will print its version and other information, and exit without doing anything else.\n\nRunning IGA with the option `--help` will make it print help text and exit without doing anything else.\n\n### Summary of command-line options\n\nAs explained above, IGA takes one required argument on the command line: either (1) the full URL of a web page on GitHub of a tagged release, or (2) a release tag name which is to be used in combination with options `--github-account` and `--github-repo`. The following table summarizes all the command line options available.\n\n| Long form option | Short | Meaning | Default | |\n|------------------------|----------|--------------------------------------|---------|---|\n| `--all-assets` | `-A` | Attach all GitHub assets | Attach only the release source ZIP| |\n| `--all-metadata` | `-M` | Include additional metadata from GitHub | Favor CodeMeta & CFF | |\n| `--community` _C_ | `-c` _C_ | Submit record to RDM community _C_ | Don't submit record to any community | |\n| `--draft` | `-d` | Mark the RDM record as a draft | Publish record when done | |\n| `--file` _F_ | `-f` _F_ | Upload local file _F_ instead of GitHub assets | Upload only GitHub assets | \u2691 |\n| `--github-account` _A_ | `-a` _A_ | Look in GitHub account _A_ | Get account name from release URL | \u272f |\n| `--github-repo` _R_ | `-r` _R_ | Look in GitHub repository _R_ of account _A_ | Get repo name from release URL | \u272f |\n| `--github-token` _T_ | `-t` _T_ | Use GitHub access token _T_| Use value in env. var. `GITHUB_TOKEN` | |\n| `--help` | `-h` | Print help info and exit | | |\n| `--invenio-server` _S_ | `-s` _S_ | Send record to InvenioRDM server at address _S_ | Use value in env. var. `INVENIO_SERVER` | |\n| `--invenio-token` _K_ | `-k` _K_ | Use InvenioRDM access token _K_ | Use value in env. var. 
`INVENIO_TOKEN` | |\n| `--list-communities` | `-L` | List communities available for use with `--community` | | |\n| `--log-dest` _L_ | `-l` _L_ | Write log output to destination _L_ | Write to terminal | \u2690 |\n| `--mode` _M_ | `-m` _M_ | Run in mode `quiet`, `normal`, `verbose`, or `debug` | `normal` | |\n| `--open` | `-o` | Open record's web page in a browser when done | Do nothing when done | |\n| `--parent-record` _N_ | `-p` _N_ | Make this a new version of existing record _N_ | New record is unrelated to other records | \u2756 |\n| `--print-doi` | `-i` | Print both the DOI & record URL when done | Print only the record URL | |\n| `--read-metadata` _R_ | `-R` _R_ | Read metadata record from file _R_; don\\'t build one | Build metadata record | |\n| `--save-metadata` _D_ | `-S` _D_ | Save metadata record to file _D_; don\\'t upload it | Upload to InvenioRDM server | |\n| `--timeout` _X_ | `-T` _X_ | Wait on network operations a max of _X_ seconds | Auto-adjusted based on file size | |\n| `--version` | `-V` | Print program version info and exit | | |\n\n\u2691 Can repeat the option to specify multiple files.<br>\n\u2690 To write to the console, use the character `-` as the value of _L_; otherwise, _L_ must be the name of a file where the output should be written.<br>\n\u272f When using `--github-account` and `--github-repo`, the last argument on the command line must be a release tag name.<br>\n\u2756 The record identifier must be given either as a sequence of alphanumeric characters of the form _XXXXX-XXXXX_ (e.g., `bknz4-bch35`), or as a URL to the landing page of an existing record in the InvenioRDM server.\n\n### Return values\n\nThis program exits with a return status code of 0 if no problem is encountered. Otherwise, it returns a nonzero status code. 
The following table lists the possible values:\n\n| Code | Meaning |\n|:----:|----------------------------------------------------------|\n| 0 | success – program completed normally |\n| 1 | interrupted |\n| 2 | encountered a bad or missing value for an option |\n| 3 | encountered a problem with a file or directory |\n| 4 | encountered a problem interacting with GitHub |\n| 5 | encountered a problem interacting with InvenioRDM |\n| 6 | the personal access token was rejected |\n| 7 | an exception or fatal error occurred |\n\n### Adding a DOI badge to your GitHub repository\n\nOnce you have set up the IGA workflow in your GitHub repository, you may wish to add a DOI badge to your repository's README file. It would be a chore to keep updating the DOI value in this badge every time a new release is made, and thankfully, it's not necessary: it's possible to make the badge get the current DOI value dynamically. Here is how:\n\n1. _After_ you have at least one release archived in your InvenioRDM server, find out the DOI of that release in InvenioRDM, and extract the tail end of that DOI. The DOI assigned by InvenioRDM will be a string such as `10.22002/zsmem-2pg20`; the tail end is the `zsmem-2pg20` part. (Your DOI and tail portion will be different.)\n2. Let <i><b><code>SERVERURL</code></b></i> stand for the URL for your InvenioRDM server, and let <i><b><code>IDENTIFIER</code></b></i> stand for the identifier portion of the DOI. 
In your `README.md` file, write the DOI badge as follows (without line breaks):\n\n <pre><code>[![DOI](https://img.shields.io/badge/dynamic/json.svg?label=DOI&query=$.pids.doi.identifier&uri=<i><b><code>SERVERURL</code></b></i>/api/records/<i><b><code>IDENTIFIER</code></b></i>/versions/latest)](<i><b><code>SERVERURL</code></b></i>/records/<i><b><code>IDENTIFIER</code></b></i>/latest)</code></pre>\n\nFor example, for CaltechDATA and the archived IGA releases there,\n\n* <i><b><code>SERVERURL</code></b></i> = `https://data.caltech.edu`\n* <i><b><code>IDENTIFIER</code></b></i> = `zsmem-2pg20`\n\nwhich leads to the following badge string for IGA's `README.md` file:\n\n```txt\n[![DOI](https://img.shields.io/badge/dynamic/json.svg?label=DOI&query=$.pids.doi.identifier&uri=https://data.caltech.edu/api/records/zsmem-2pg20/versions/latest)](https://data.caltech.edu/records/zsmem-2pg20/latest)\n```\n\n<br>The result looks like this: <div align=\"center\">\n\n[![DOI](https://img.shields.io/badge/dynamic/json.svg?label=DOI&query=$.pids.doi.identifier&uri=https://data.caltech.edu/api/records/zsmem-2pg20/versions/latest)](https://data.caltech.edu/records/zsmem-2pg20/latest)\n\n</div>\n\nYou can change the look of the badge by using style parameters. Please refer to the [Shields.io](https://shields.io/badges/static-badge) documentation for static badges.\n\n\n## Known issues and limitations\n\nThe following are known issues and limitations.\n\n* As of mid-2023, InvenioRDM requires names of record creators and other contributors to be split into given (first) and family (surname). This is problematic for multiple reasons. The first is that mononyms are common in many countries: a person's name may legitimately be only a single word which is not conceptually a \"given\" or \"family\" name. 
To compound the difficulty for IGA, names are stored as single fields in GitHub account metadata, so unless a repository has a `codemeta.json` or `CITATION.cff` file (which allow authors more control over how they want their names represented), IGA is forced to try to split the single GitHub name string into two parts. _A foolproof algorithm for doing this does not exist_, so IGA will sometimes get it wrong. (That said, IGA goes to extraordinary lengths to try to do a good job.)\n* InvenioRDM requires identities (creators, contributors, etc.) to be labeled as personal or organizational. The nature of identities is usually made clear in `codemeta.json` and `CITATION.cff` files. GitHub also provides a flag that is meant to be used to label organizational accounts, but sometimes people don't set the GitHub account information correctly. Consequently, if IGA has to use GitHub data to get (e.g.) the list of contributors on a project, it may mislabel identities in the InvenioRDM record it produces.\n* Some accounts on GitHub are software automation or \"bot\" accounts, but are not labeled as such. These accounts are generally indistinguishable from human accounts on GitHub, so if they're not labeled as bot or organizational accounts in GitHub, IGA can't recognize that they're not humans. If such an account is the creator of a release in GitHub, and IGA tries to use its name-splitting algorithm on the name of the account, it may produce a nonsensical result. For example, it might turn \"Travis CI\" into an entry with a first name of \"Travis\" and last name of \"CI\".\n* Funder and funding information can only be specified in `codemeta.json` files; neither GitHub nor `CITATION.cff` has provisions to store this kind of metadata. The CodeMeta specification defines two fields for this purpose: `funder` and `funding`. Unfortunately, these map imperfectly to the requirements of InvenioRDM's metadata format. 
In addition, people don't always follow the CodeMeta guidelines, and sometimes they write funding information as text strings (instead of structured objects), the interpretation of which would require software that can recognize grant and funding agency information from free-text descriptions. This combination of factors means IGA often can't fill in the funding metadata in InvenioRDM records even if there is some funding information in the `codemeta.json` file.\n\n\n## Getting help\n\nIf you find an issue, please submit it in [the GitHub issue tracker](https://github.com/caltechlibrary/iga/issues) for this repository.\n\n\n## Contributing\n\nYour help and participation in enhancing IGA is welcome! Please visit the [guidelines for contributing](CONTRIBUTING.md) for some tips on getting started.\n\n\n## License\n\nSoftware produced by the Caltech Library is Copyright \u00a9 2022\u20132024 California Institute of Technology. This software is freely distributed under a BSD-style license. Please see the [LICENSE](LICENSE) file for more information.\n\n## Acknowledgments\n\nThis work was funded by the California Institute of Technology Library.\n\nIGA uses multiple other open-source packages, without which it would have taken much longer to write the software. I want to acknowledge this debt. 
In alphabetical order, the packages are:\n\n* [Aenum](https://github.com/ethanfurman/aenum) – package for advanced enumerations\n* [Arrow](https://pypi.org/project/arrow/) – a library for creating & manipulating dates\n* [Boltons](https://github.com/mahmoud/boltons/) – package of miscellaneous Python utilities\n* [caltechdata_api](https://github.com/caltechlibrary/caltechdata_api) – package for using the CaltechDATA API\n* [CommonPy](https://github.com/caltechlibrary/commonpy) – a collection of commonly-useful Python functions\n* [dirtyjson](https://github.com/codecobblers/dirtyjson) – JSON decoder that copes with problematic JSON files and reports useful error messages\n* [flake8](https://github.com/pycqa/flake8) – Python code linter and style analyzer\n* [Ginza](https://github.com/megagonlabs/ginza) – Japanese NLP Library\n* [httpx](https://www.python-httpx.org) – HTTP client library that supports HTTP/2\n* [humanize](https://github.com/jmoiron/humanize) – make numbers more easily readable by humans\n* [idutils](https://github.com/inveniosoftware/idutils) – package for validating and normalizing various kinds of persistent identifiers\n* [ipdb](https://github.com/gotcha/ipdb) – the IPython debugger\n* [iptools](https://github.com/bd808/python-iptools) – utilities for dealing with IP addresses\n* [isbnlib](https://github.com/xlcnd/isbnlib) – utilities for dealing with ISBNs\n* [Jamo](https://github.com/JDongian/python-jamo) – Hangul character analysis\n* [json5](https://github.com/dpranke/pyjson5) – extended JSON format parser\n* [latexcodec](https://github.com/mcmtroffaes/latexcodec) – lexer and codec to work with LaTeX code in Python\n* [Lingua](https://github.com/pemistahl/lingua) – language detection library\n* [linkify-it-py](https://github.com/tsutsu3/linkify-it-py) – a link recognition library with full unicode support\n* [lxml](https://lxml.de) – an XML parsing library\n* [Markdown](https://python-markdown.github.io) – Python package for working with 
Markdown\n* [markdown-checklist](https://github.com/FND/markdown-checklist) – GitHub-style checklist extension for Python Markdown package\n* [mdx-breakless-lists](https://github.com/adamb70/mdx-breakless-lists) – GitHub-style Markdown lists that don't require a line break above them\n* [mdx_linkify](https://github.com/daGrevis/mdx_linkify) – extension for Python Markdown that converts text that looks like links to HTML anchors\n* [MyST-parser](https://github.com/executablebooks/MyST-Parser) – A Sphinx and Docutils extension to parse an extended version of Markdown\n* [nameparser](https://github.com/derek73/python-nameparser) – package for parsing human names into their individual components\n* [probablepeople](https://github.com/datamade/probablepeople) – package for parsing names into components using ML-based techniques\n* [pybtex](https://pybtex.org) – BibTeX parser and formatter\n* [pybtex-apa7-style](https://pypi.org/project/pybtex-apa7-style/) – plugin for [pybtex](https://pybtex.org) that provides APA7 style formatting\n* [pymdown-extensions](https://github.com/facelessuser/pymdown-extensions) – extensions for Python Markdown\n* [pytest](https://docs.pytest.org/en/stable/) – testing framework\n* [pytest-cov](https://github.com/pytest-dev/pytest-cov) – coverage reports for use with `pytest`\n* [pytest-mock](https://pypi.org/project/pytest-mock/) – wrapper around the `mock` package for use with `pytest`\n* [PyYAML](https://pyyaml.org) – YAML parser\n* [Rich](https://github.com/Textualize/rich) – library for writing styled text to the terminal\n* [rich-click](https://github.com/ewels/rich-click) – CLI interface built on top of [Rich](https://github.com/Textualize/rich)\n* [setuptools](https://github.com/pypa/setuptools) – library for `setup.py`\n* [Sidetrack](https://github.com/caltechlibrary/sidetrack) – simple debug logging/tracing package\n* [spaCy](https://spacy.io) – Natural Language Processing package\n* 
[spacy-alignments](https://github.com/explosion/spacy-alignments) – alternate alignments for [spaCy](https://spacy.io)\n* [spacy-legacy](https://pypi.org/project/spacy-legacy/) – [spaCy](https://spacy.io) legacy functions and architectures for backwards compatibility\n* [spacy-loggers](https://github.com/explosion/spacy-loggers) – loggers for [spaCy](https://spacy.io)\n* [spacy-pkuseg](https://github.com/explosion/spacy-pkuseg) – Chinese word segmentation toolkit for [spaCy](https://spacy.io)\n* [spacy-transformers](https://spacy.io) – pretrained Transformers for [spaCy](https://spacy.io)\n* [Sphinx](https://www.sphinx-doc.org/en/master/) – documentation generator for Python\n* [sphinx-autobuild](https://pypi.org/project/sphinx-autobuild/) – rebuild Sphinx docs automatically\n* [sphinx-material](https://bashtage.github.io/sphinx-material/) – a responsive Material Design theme for Sphinx\n* [sphinxcontrib-mermaid](https://github.com/mgaitan/sphinxcontrib-mermaid) – support Mermaid diagrams in Sphinx docs\n* [StringDist](https://github.com/obulkin/string-dist) – library for calculating string distances\n* [Twine](https://github.com/pypa/twine) – utilities for publishing Python packages on [PyPI](https://pypi.org)\n* [url-normalize](https://github.com/niksite/url-normalize) – URI/URL normalization utilities\n* [validators](https://github.com/kvesteri/validators) – data validation package for Python\n* [wheel](https://pypi.org/project/wheel/) – setuptools extension for building wheels\n\n<div align=\"center\">\n <br>\n <a href=\"https://www.caltech.edu\">\n <img width=\"100\" height=\"100\" alt=\"Caltech logo\" src=\"https://github.com/caltechlibrary/iga/raw/main/.graphics/caltech-round.png\">\n </a>\n</div>\n\n\n",
"bugtrack_url": null,
"license": "https://github.com/caltechlibrary/iga/blob/main/LICENSE",
"summary": "The InvenioRDM GitHub Archiver (IGA) automatically archives GitHub releases in an InvenioRDM repository.",
"version": "1.3.5",
"project_urls": {
"Bug Tracker": "https://github.com/caltechlibrary/iga/issues",
"Homepage": "https://github.com/caltechlibrary/iga",
"Source Code": "https://github.com/caltechlibrary/iga"
},
"split_keywords": [
"python",
" applications"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "7c0f35cbbdf59c24d21158f0aca5d2c636ad2451cef0605296f2720d10fb4df0",
"md5": "ef9ada77fd8d7d6a5615383c71ba0cbc",
"sha256": "62bd42f1ae05d7cab0b80fca4cfdb3d0f60c0bcc5b2bfa3e17016804aee01580"
},
"downloads": -1,
"filename": "iga-1.3.5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ef9ada77fd8d7d6a5615383c71ba0cbc",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 115506,
"upload_time": "2024-11-07T20:33:05",
"upload_time_iso_8601": "2024-11-07T20:33:05.105983Z",
"url": "https://files.pythonhosted.org/packages/7c/0f/35cbbdf59c24d21158f0aca5d2c636ad2451cef0605296f2720d10fb4df0/iga-1.3.5-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "a4b337bd455e16c2d62efd995389981df552cc21119aa32583b9e9984478a075",
"md5": "639387f271b870f5c900c59e9e0efae0",
"sha256": "c175895af07c614a9295680b6d965ef1023662e626594e6c98342572c159b877"
},
"downloads": -1,
"filename": "iga-1.3.5.tar.gz",
"has_sig": false,
"md5_digest": "639387f271b870f5c900c59e9e0efae0",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 135570,
"upload_time": "2024-11-07T20:33:07",
"upload_time_iso_8601": "2024-11-07T20:33:07.017244Z",
"url": "https://files.pythonhosted.org/packages/a4/b3/37bd455e16c2d62efd995389981df552cc21119aa32583b9e9984478a075/iga-1.3.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-07 20:33:07",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "caltechlibrary",
"github_project": "iga",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "iga"
}