gimie


| Field | Value |
|---|---|
| Name | gimie |
| Version | 0.6.1 |
| Home page | https://github.com/SDSC-ORD/gimie |
| Summary | Extract structured metadata from git repositories. |
| Upload time | 2023-11-01 12:56:48 |
| Author | Swiss Data Science Center |
| Requires Python | >=3.8,<4.0 |
| License | Apache-2.0 |
| Keywords | metadata, git, extraction, linked-data |
| Requirements | No requirements were recorded. |

[![gimie](docs/logo.svg)](https://github.com/SDSC-ORD/gimie)

[![PyPI version](https://badge.fury.io/py/gimie.svg)](https://badge.fury.io/py/gimie) [![Python Poetry Test](https://github.com/SDSC-ORD/gimie/actions/workflows/poetry-pytest.yml/badge.svg)](https://github.com/SDSC-ORD/gimie/actions/workflows/poetry-pytest.yml) [![docs](https://github.com/SDSC-ORD/gimie/actions/workflows/sphinx-docs.yml/badge.svg)](https://sdsc-ord.github.io/gimie) [![Coverage Status](https://coveralls.io/repos/github/SDSC-ORD/gimie/badge.svg?branch=main)](https://coveralls.io/github/SDSC-ORD/gimie?branch=main)

Gimie (GIt Meta Information Extractor) is a Python library and command line tool to extract structured metadata from git repositories.


## Context
Scientific code repositories contain valuable metadata which can be used to enrich existing catalogues, platforms or databases. This tool aims to easily extract structured metadata from generic git repositories. It can extract metadata from the Git provider (GitHub or GitLab) or from the git index itself.

----------------------------------------------------------------------

Using Gimie: easy peasy, it's a 3-step process.

## STEP 1: Installation

To install the stable version on PyPI:

```shell
pip install gimie
```

To install the dev version from GitHub:

```shell
pip install git+https://github.com/SDSC-ORD/gimie.git@main#egg=gimie
```

Gimie is also available as a Docker container hosted on the [GitHub Container Registry](https://github.com/SDSC-ORD/gimie/pkgs/container/gimie):

```shell
docker pull ghcr.io/sdsc-ord/gimie:latest

# The access token can be provided as an environment variable
docker run -e ACCESS_TOKEN=$ACCESS_TOKEN ghcr.io/sdsc-ord/gimie:latest gimie data <repo>
```

## STEP 2: Set your credentials

In order to access the GitHub API, you need to provide a GitHub token with the `read:org` scope.

### A. Create access tokens

New to access tokens? Or don't know how to get your GitHub / GitLab token?

Have no fear, see
[here for GitHub tokens](https://docs.github.com/en/enterprise-server@3.4/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token) and [here for GitLab tokens](https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html).
(Note: tokens are as precious as passwords! Treat them as such.)

### B. Set your access tokens via the Terminal

Gimie will use your access tokens to gather information for you. If you want info about a GitHub repo, Gimie needs your GitHub token; if you want info about a GitLab project, Gimie needs your GitLab token.

Add your tokens one by one in your terminal.
Your GitHub token:
```bash
export GITHUB_TOKEN=
```
and/or your GitLab token:
```bash
export GITLAB_TOKEN=
```
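
Before running Gimie, it can help to confirm that the tokens are actually visible to the process. The snippet below is a minimal sketch, not part of Gimie itself: `missing_tokens` is a hypothetical helper, and the only names taken from this README are the two environment variables exported above. It never prints the token values.

```python
import os

# Variable names taken from the export commands above.
TOKEN_VARS = ("GITHUB_TOKEN", "GITLAB_TOKEN")


def missing_tokens(environ=os.environ):
    """Return the names of token variables that are unset or empty.

    Hypothetical helper for illustration; Gimie does not ship it.
    """
    return [name for name in TOKEN_VARS if not environ.get(name, "").strip()]


if __name__ == "__main__":
    # Report only the variable names, never the token values.
    for name in missing_tokens():
        print(f"{name} is not set")
```

Only the token for the provider you plan to query needs to be set, so a missing entry is a hint rather than an error.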

## STEP 3: GIMIE info! Run Gimie

### As a command line tool

```shell
gimie data https://github.com/numpy/numpy
```
(Want a GitLab project instead? Just replace the URL in the command line.)

### As a Python library

```python
from gimie.project import Project
proj = Project("https://github.com/numpy/numpy")

# To retrieve the rdflib.Graph object
g = proj.to_graph()

# To retrieve the serialized graph
proj.serialize(format='ttl')
```

Or to extract only from a specific source:
```python
from gimie.sources.github import GithubExtractor
gh = GithubExtractor('https://github.com/SDSC-ORD/gimie')
gh.extract()

# To retrieve the rdflib.Graph object
g = gh.to_graph()

# To retrieve the serialized graph
gh.serialize(format='ttl')
```
[For a GitLab project, replace `gimie.sources.github` with `gimie.sources.gitlab`, `GithubExtractor` with `GitlabExtractor`, and the URL with that of the GitLab project.]
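
If the same script has to handle both providers, the swap described above could be made from the URL's host. This is a minimal sketch under assumptions: `extractor_for` and `load_extractor` are hypothetical helpers, only the two public hosts are mapped, and the module and class names are the ones quoted in this README.

```python
import importlib
from urllib.parse import urlparse

# Module and class names as given in the examples above.
EXTRACTORS = {
    "github.com": ("gimie.sources.github", "GithubExtractor"),
    "gitlab.com": ("gimie.sources.gitlab", "GitlabExtractor"),
}


def extractor_for(url):
    """Return (module path, class name) for a repository URL.

    Hypothetical helper: a self-hosted GitLab instance would need
    its own entry in EXTRACTORS.
    """
    host = (urlparse(url).hostname or "").lower()
    try:
        return EXTRACTORS[host]
    except KeyError:
        raise ValueError(f"No known extractor for host {host!r}") from None


def load_extractor(url):
    """Import the matching class lazily and instantiate it with the URL."""
    module_path, class_name = extractor_for(url)
    extractor_class = getattr(importlib.import_module(module_path), class_name)
    return extractor_class(url)
```

`load_extractor` needs Gimie installed; the returned object can then be used exactly as `gh` is in the example above.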

## Outputs

The default output is [Turtle](https://www.w3.org/TR/turtle/), a textual syntax for the [RDF](https://en.wikipedia.org/wiki/Resource_Description_Framework) data model. We follow the schema recommended by [codemeta](https://codemeta.github.io/).
Supported formats are turtle, json-ld and n-triples. Select one with the `--format` argument, e.g. `gimie data https://github.com/numpy/numpy --format 'ttl'`.

With no further options, Gimie prints results to the terminal. Want to save Gimie output to a file? Redirect it to your file path at the end of the command: `gimie data https://github.com/numpy/numpy > path_to_output/gimie_output.ttl`
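
The same command can be driven from a Python script when shell redirection is not convenient. The following is a sketch only: `gimie_command` and `save_metadata` are hypothetical helpers, and the sketch assumes the `gimie` executable is on your `PATH` and the relevant token is set. The subcommand and the `--format` flag are the ones shown above.

```python
import subprocess
from pathlib import Path


def gimie_command(url, fmt="ttl"):
    """Build the argument list for the CLI call shown above."""
    return ["gimie", "data", url, "--format", fmt]


def save_metadata(url, out_path, fmt="ttl"):
    """Run gimie and write its standard output to out_path.

    Raises subprocess.CalledProcessError if gimie exits with an error.
    """
    result = subprocess.run(
        gimie_command(url, fmt),
        capture_output=True,
        text=True,
        check=True,
    )
    out_file = Path(out_path)
    out_file.write_text(result.stdout, encoding="utf-8")
    return out_file
```

Passing the arguments as a list avoids shell quoting issues with the repository URL.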

----------------------------------------------------------------------

## Contributing

All contributions are welcome. New functions and classes should have associated tests and docstrings following the [numpy style guide](https://numpydoc.readthedocs.io/en/latest/format.html).

The code formatting standard we use is [black](https://github.com/psf/black), with `--line-length=79` to follow [PEP8](https://peps.python.org/pep-0008/) recommendations. We use [pytest](https://docs.pytest.org/en/7.2.x/) as our testing framework. This project uses [pyproject.toml](https://pip.pypa.io/en/stable/reference/build-system/pyproject-toml/) to define package information, requirements and tooling configuration.

### For development:

Activate a conda or virtual environment with Python 3.8 or higher:

```shell
git clone https://github.com/SDSC-ORD/gimie && cd gimie
make install
```

run tests:

```shell
make test
```

run checks:

```shell
make check
```
To make using the GitHub/GitLab APIs easier, place your access tokens in the `.env` file (and don't worry, the `.gitignore` will ignore them when you push to GitHub):

```shell
cp .env.dist .env
```
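
How the project's own tooling reads `.env` is not described here. If you want the same variables in an ad-hoc script, a standard-library loader could look like the sketch below. It assumes the file holds plain `KEY=VALUE` lines, which is an assumption about `.env.dist`; `parse_env` and `load_env` are hypothetical helpers.

```python
import os


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_env(path=".env", environ=os.environ):
    """Copy variables from the file into environ without overriding existing ones."""
    with open(path, encoding="utf-8") as handle:
        for key, value in parse_env(handle.read()).items():
            environ.setdefault(key, value)
```

Using `setdefault` means a token already exported in the shell takes precedence over the one in the file.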

build documentation:

```shell
make doc
```

## Releases and Publishing on PyPI

Releases are done via GitHub releases:

- A release will trigger a GitHub workflow to publish the package on PyPI.
- Make sure to update the version in `pyproject.toml` before making the release.
- It is possible to test publishing on TestPyPI by running a manual workflow: go to GitHub Actions and run the workflow 'Publish on Pypi Test'.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/SDSC-ORD/gimie",
    "name": "gimie",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8,<4.0",
    "maintainer_email": "",
    "keywords": "metadata,git,extraction,linked-data",
    "author": "Swiss Data Science Center",
    "author_email": "contact@datascience.ch",
    "download_url": "https://files.pythonhosted.org/packages/df/73/781d00e88aefa40abbf3c848d911871e92e1944b00fc755487746f228d5c/gimie-0.6.1.tar.gz",
    "platform": null,
    "description": "[![gimie](docs/logo.svg)](https://github.com/SDSC-ORD/gimie)\n\n[![PyPI version](https://badge.fury.io/py/gimie.svg)](https://badge.fury.io/py/gimie) [![Python Poetry Test](https://github.com/SDSC-ORD/gimie/actions/workflows/poetry-pytest.yml/badge.svg)](https://github.com/SDSC-ORD/gimie/actions/workflows/poetry-pytest.yml) [![docs](https://github.com/SDSC-ORD/gimie/actions/workflows/sphinx-docs.yml/badge.svg)](https://sdsc-ord.github.io/gimie) [![Coverage Status](https://coveralls.io/repos/github/SDSC-ORD/gimie/badge.svg?branch=main)](https://coveralls.io/github/SDSC-ORD/gimie?branch=main)\n\nGimie (GIt Meta Information Extractor) is a python library and command line tool to extract structured metadata from git repositories.\n\n\n## Context\nScientific code repositories contain valuable metadata which can be used to enrich existing catalogues, platforms or databases. This tool aims to easily extract structured metadata from a generic git repositories. It can extract extract metadata from the Git provider (GitHub or GitLab) or from the git index itself.\n\n----------------------------------------------------------------------\n\nUsing Gimie: easy peasy, it's a 3 step process.\n\n## STEP 1: Installation\n\nTo install the stable version on PyPI:\n\n```shell\npip install gimie\n```\n\nTo install the dev version from github:\n\n```shell\npip install git+https://github.com/SDSC-ORD/gimie.git@main#egg=gimie\n```\n\nGimie is also available as a docker container hosted on the [Github container registry](https://github.com/SDSC-ORD/gimie/pkgs/container/gimie):\n\n```shell\ndocker pull ghcr.io/sdsc-ord/gimie:latest\n\n# The access token can be provided as an environment variable\ndocker run -e ACCESS_TOKEN=$ACCESS_TOKEN ghcr.io/sdsc-ord/gimie:latest gimie data <repo>\n```\n\n## STEP 2 : Set your credentials\n\nIn order to access the github api, you need to provide a github token with the `read:org` scope.\n\n### A. 
Create access tokens\n\nNew to access tokens? Or don't know how to get your Github / Gitlab token ?\n\nHave no fear, see\n[here for Github tokens](https://docs.github.com/en/enterprise-server@3.4/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token) and [here for Gitlab tokens](https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html).\n(Note: tokens are as precious as passwords! Treat them as such.)\n\n### B. Set your access tokens via the Terminal\n\nGimie will use your access tokens to gather information for you. If you want info about a Github repo, Gimie needs your Github token; if you want info about a Gitlab Project then Gimie needs your Gitlab token.\n\nAdd your tokens one by one in your terminal:\nyour Github token:\n```bash\nexport GITHUB_TOKEN=\n```\nand/or your Gitlab token:\n```bash\nexport GITLAB_TOKEN=\n```\n\n## STEP 3: GIMIE info ! Run Gimie\n\n### As a command line tool\n\n```shell\ngimie data https://github.com/numpy/numpy\n```\n(want a Gitlab project instead? Just replace the URL in the command line)\n\n### As a python library\n\n```python\nfrom gimie.project import Project\nproj = Project(\"https://github.com/numpy/numpy)\n\n# To retrieve the rdflib.Graph object\ng = proj.to_graph()\n\n# To retrieve the serialized graph\nproj.serialize(format='ttl')\n```\n\nOr to extract only from a specific source:\n```python\nfrom gimie.sources.github import GithubExtractor\ngh = GithubExtractor('https://github.com/SDSC-ORD/gimie')\ngh.extract()\n\n# To retrieve the rdflib.Graph object\ng = gh.to_graph()\n\n# To retrieve the serialized graph\ngh.serialize(format='ttl')\n```\n[For a GitLab project, replace `gimie.sources.github` by `gimie.sources.gitlab`, `GithubExtractor` by `GitlabExtractor`, as well as the URL to the GitLab project.]\n\n## Outputs\n\nThe default output is [Turtle](https://www.w3.org/TR/turtle/), a textual syntax for [RDF](https://en.wikipedia.org/wiki/Resource_Description_Framework) data model. 
We follow the schema recommended by [codemeta](https://codemeta.github.io/).\nSupported formats are turtle, json-ld and n-triples (by specifying the `--format` argument in your call i.e. `gimie data https://github.com/numpy/numpy --format 'ttl'`).\n\nWith no specifications, Gimie will print results in the terminal. Want to save Gimie output to a file? Add your file path to the end : `gimie data https://github.com/numpy/numpy > path_to_output/gimie_output.ttl`\n\n----------------------------------------------------------------------\n\n## Contributing\n\nAll contributions are welcome. New functions and classes should have associated tests and docstrings following the [numpy style guide](https://numpydoc.readthedocs.io/en/latest/format.html).\n\nThe code formatting standard we use is [black](https://github.com/psf/black), with `--line-length=79` to follow [PEP8](https://peps.python.org/pep-0008/) recommendations. We use [pytest](https://docs.pytest.org/en/7.2.x/) as our testing framework. 
This project uses [pyproject.toml](https://pip.pypa.io/en/stable/reference/build-system/pyproject-toml/) to define package information, requirements and tooling configuration.\n\n### For development:\n\nactivate a conda or virtual environment with Python 3.8 or higher\n\n```shell\ngit clone https://github.com/SDSC-ORD/gimie && cd gimie\nmake install\n```\n\nrun tests:\n\n```shell\nmake test\n```\n\nrun checks:\n\n```shell\nmake check\n```\nfor an easier use Github/Gitlab APIs, place your access tokens in the `.env` file: (and don't worry, the `.gitignore` will ignore them when you push to GitHub)\n\n```\ncp .env.dist .env\n```\n\nbuild documentation:\n\n```shell\nmake doc\n```\n\n## Releases and Publishing on Pypi\n\nReleases are done via github release\n\n- a release will trigger a github workflow to publish the package on Pypi\n- Make sure to update to a new version in `pyproject.toml` before making the release\n- It is possible to test the publishing on Pypi.test by running a manual workflow: go to github actions and run the Workflow: 'Publish on Pypi Test'\n",
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "Extract structured metadata from git repositories.",
    "version": "0.6.1",
    "project_urls": {
        "Homepage": "https://github.com/SDSC-ORD/gimie"
    },
    "split_keywords": [
        "metadata",
        "git",
        "extraction",
        "linked-data"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "87cd26ff1d4210337d01e37401991b79ea5e7f59d8fc968c896ff88a19bd3eda",
                "md5": "1fbc04d0b010ff955ec8176c53de5f81",
                "sha256": "9b748ffc00f0dd9d00a45e51939225967571f4925556242dd6d5fc3ccd486cfe"
            },
            "downloads": -1,
            "filename": "gimie-0.6.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1fbc04d0b010ff955ec8176c53de5f81",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8,<4.0",
            "size": 30644,
            "upload_time": "2023-11-01T12:56:46",
            "upload_time_iso_8601": "2023-11-01T12:56:46.691831Z",
            "url": "https://files.pythonhosted.org/packages/87/cd/26ff1d4210337d01e37401991b79ea5e7f59d8fc968c896ff88a19bd3eda/gimie-0.6.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "df73781d00e88aefa40abbf3c848d911871e92e1944b00fc755487746f228d5c",
                "md5": "d095daf6b32440d8398851687e08625c",
                "sha256": "8f85b32d5919c666396d0bc2c027fe9bdb197e888d140c68feb3b5f67173d38c"
            },
            "downloads": -1,
            "filename": "gimie-0.6.1.tar.gz",
            "has_sig": false,
            "md5_digest": "d095daf6b32440d8398851687e08625c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8,<4.0",
            "size": 22830,
            "upload_time": "2023-11-01T12:56:48",
            "upload_time_iso_8601": "2023-11-01T12:56:48.248780Z",
            "url": "https://files.pythonhosted.org/packages/df/73/781d00e88aefa40abbf3c848d911871e92e1944b00fc755487746f228d5c/gimie-0.6.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-11-01 12:56:48",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "SDSC-ORD",
    "github_project": "gimie",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "gimie"
}
        