CodeMetaPy


* Name: CodeMetaPy
* Version: 2.5.2
* Home page: https://github.com/proycon/codemetapy
* Summary: Generate and manage CodeMeta software metadata
* Upload time: 2023-11-27 15:25:06
* Author: Maarten van Gompel
* License: GPL-3.0-only
* Keywords: software metadata, codemeta, schema.org, rdf, linked data

[![Project Status: Active -- The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
[![GitHub build](https://github.com/proycon/codemetapy/actions/workflows/codemetapy.yml/badge.svg?branch=master)](https://github.com/proycon/codemetapy/actions/)
[![GitHub release](https://img.shields.io/github/release/proycon/codemetapy.svg)](https://GitHub.com/proycon/codemetapy/releases/)
[![Latest release in the Python Package Index](https://img.shields.io/pypi/v/codemetapy)](https://pypi.org/project/codemetapy/)
 
# Codemetapy

Codemetapy is a command-line tool to work with the [codemeta](https://codemeta.github.io) software metadata standard.
Codemeta builds upon [schema.org](https://schema.org) and defines a vocabulary for describing software source code. It
maps various existing metadata standards to a unified vocabulary.

For more general information about the CodeMeta Project for defining
software metadata, see <https://codemeta.github.io>. In particular, new
users might want to start with the User Guide, while those looking to
learn more about JSON-LD and consuming existing codemeta files should
see the Developer Guide.

Using codemetapy you can generate a `codemeta.json` file for your
software, serialised as [JSON-LD](https://json-ld.org). At the moment it
supports conversions from the following existing metadata specifications:

* Python distutils/pip packages (`setup.py`/`pyproject.toml`)
* Java/Maven packages (`pom.xml`)
* NodeJS packages (`package.json`)
* Debian packages (`apt show` output)
* GitHub API (when passed a GitHub URL)
* GitLab API (when passed a GitLab URL)
* Web sites/services (see the section on software types and services below):
    * Simple metadata from HTML `<meta>` elements.
    * Script blocks using `application/json+ld`

It can also read and manipulate existing `codemeta.json` files as well
as parse simple AUTHORS/CONTRIBUTORS files. One of the most notable
features of codemetapy is that it allows chaining to successively update
a metadata description based on multiple sources. Codemetapy is used in
that way by the [codemeta-harvester](https://github.com/proycon/codemeta-harvester). 

**Note:** If you are looking for an all-in-one solution to automatically
generate a `codemeta.json` for your project, then
*[codemeta-harvester](https://github.com/proycon/codemeta-harvester) is the
best place to start*. It is a higher-level tool that automatically invokes
codemetapy on various sources it can automatically detect, and combines those into
a single codemeta representation.

## Installation

`pip install codemetapy`

## Usage

Query and convert any installed Python package:

`$ codemetapy somepackage`

Output goes to standard output by default. To write it to an output
file instead, either redirect it:

`$ codemetapy somepackage > codemeta.json`

or use the `-O` parameter:

`$ codemetapy -O codemeta.json somepackage`

If your current working directory is the root of a Python project that
contains a `setup.py` or `pyproject.toml`, then you can simply call `codemetapy` without
arguments to output codemeta for the project. Codemetapy will
automatically run `python setup.py egg_info` if needed and parse its output to
facilitate this:

`$ codemetapy`

The tool also supports adding properties through parameters:

`$ codemetapy --developmentStatus active somepackage > codemeta.json`

To read an existing codemeta.json and extend it:

`$ codemetapy -O codemeta.json codemeta.json somepackage`

or even:

`$ codemetapy -O codemeta.json codemeta.json codemeta2.json codemeta3.json`

This makes use of an important characteristic of codemetapy, which is *composition*. When you specify multiple input sources, they will be interpreted as referring to the same resource.
Properties (on `schema:SoftwareSourceCode`) in the later resources will *overwrite* earlier properties. So if `codemeta3.json` specifies authors, all authors that were specified in `codemeta2.json` are lost rather than merged, and the end result will have the authors from `codemeta3.json`. However, if `codemeta2.json` has a property that was not in `codemeta3.json`, say `developmentStatus`, then that will make it to the end result. In other words, the latest source always takes precedence, and any non-overlapping properties will be merged. This functionality is heavily relied on by the higher-level tool [codemeta-harvester](https://github.com/proycon/codemeta-harvester).
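
The overwrite-versus-merge rule described above can be illustrated with plain Python dictionaries. This is only a minimal sketch of the semantics, not codemetapy's actual implementation (which operates on an RDF graph); the `compose` helper and the sample property values are hypothetical.

```python
def compose(*sources: dict) -> dict:
    """Merge property dicts left to right; later sources win on overlap."""
    result: dict = {}
    for source in sources:
        # dict.update replaces the whole value for a shared key, so a list
        # such as "author" is overwritten rather than concatenated.
        result.update(source)
    return result


# Hypothetical contents of two input files, reduced to a few properties.
codemeta2 = {"author": ["Alice", "Bob"], "developmentStatus": "active"}
codemeta3 = {"author": ["Carol"], "license": "GPL-3.0-only"}

merged = compose(codemeta2, codemeta3)
print(merged)
```

Here the authors from `codemeta2` are dropped in favour of those from `codemeta3`, while `developmentStatus` survives because `codemeta3` does not define it.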

If you want to start from scratch and build using command line parameters, use `/dev/null` as input, and make sure to pass some identifier and code repository:

`$ codemetapy --identifier some-id --codeRepository https://github.com/my/code /dev/null > codemeta.json`

This tool can also deal with Debian packages by parsing the output of
`apt show` (although support is limited):

`$ apt show somepackage | codemetapy -i debian -`

Here `-` represents standard input, which enables you to use piping
solutions on a unix shell. The `-i` parameter denotes the input types; you
can chain as many as you want. The number of input types specified must
correspond exactly to the number of input sources (the positional arguments).


## Some notes on Vocabulary

For `codemeta:developmentStatus`, codemetapy attempts to
assign full [repostatus](https://www.repostatus.org/) URIs whenever
possible. For `schema:license`, full [SPDX](https://spdx.org) URIs are used where possible.
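
As a rough illustration of what "full URIs" means here, the sketch below expands a short status keyword and an SPDX licence identifier. The helper names are hypothetical, the SPDX base URL (including its scheme) is an assumption, and codemetapy's own normalisation may recognise more input forms than this.

```python
# Status keywords defined by repostatus.org.
REPOSTATUS_STATES = {
    "concept", "wip", "suspended", "abandoned",
    "active", "inactive", "unsupported", "moved",
}
REPOSTATUS_BASE = "https://www.repostatus.org/#"
SPDX_BASE = "https://spdx.org/licenses/"  # assumed base URI, for illustration


def development_status_uri(value: str) -> str:
    """Expand a known repostatus keyword to a URI; leave other values as-is."""
    key = value.strip().lower()
    return REPOSTATUS_BASE + key if key in REPOSTATUS_STATES else value


def license_uri(spdx_id: str) -> str:
    """Prefix an SPDX identifier with the base URI (no validation is done)."""
    return SPDX_BASE + spdx_id


print(development_status_uri("active"))
print(license_uri("GPL-3.0-only"))
```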

## Identifiers

We distinguish two types of identifiers. First, there is the URI or [IRI](https://www.w3.org/TR/rdf11-concepts/#section-IRIs)
that identifies RDF resources. It is a globally unique identifier and often looks like a URL.

Codemetapy will assign new URIs for resources if and only if you pass a base URI using ``--baseuri``. Moreover, if you set this, codemetapy will *forcibly* set URIs over any existing ones, effectively assigning new identifiers. The previous identifier will then be covered via the `owl:sameAs` property instead. This allows you to take ownership of all URIs. Internally, codemetapy will create URIs for everything (even for blank nodes) even if you don't specify a base URI, but these URIs are stripped again upon serialisation to JSON-LD.

The second identifier is the [schema:identifier](https://schema.org/identifier), of which there may even be multiple.
Codemetapy typically expects such an identifier to be a simple unspaced string holding a name for software. For example, a Python package name would make a good identifier. If this property is present, codemetapy will use it when generating URIs.
The `schema:identifier` property can be contrasted with `schema:name`, which is the human readable form of the name and may be more elaborate.
The `schema:identifier` property is typically also used for other kinds of identifiers (such as DOIs, ISBNs, etc.), which should come in the following form:

```json
"identifier": {
    "@type": "PropertyValue",
    "propertyID": "doi",
    "value": "10.5281/zenodo.6882966"
}
```

But short-hand forms such as ``doi:10.5281/zenodo.6882966`` or as a URL like `https://doi.org/10.5281/zenodo.6882966` are also recognised by this library.
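
To make the relation between the short-hand forms and the full `PropertyValue` form concrete, here is a minimal sketch of such an expansion. The `expand_identifier` helper is hypothetical and handles DOIs only; it is not the function codemetapy uses internally.

```python
from typing import Union

DOI_URL_PREFIXES = ("https://doi.org/", "http://doi.org/")


def expand_identifier(value: str) -> Union[dict, str]:
    """Expand a DOI short-hand into a PropertyValue dict; pass others through."""
    doi = None
    if value.lower().startswith("doi:"):
        doi = value[len("doi:"):]
    else:
        for prefix in DOI_URL_PREFIXES:
            if value.startswith(prefix):
                doi = value[len(prefix):]
                break
    if doi is None:
        # For example, a plain package name stays a simple string.
        return value
    return {"@type": "PropertyValue", "propertyID": "doi", "value": doi}


print(expand_identifier("doi:10.5281/zenodo.6882966"))
print(expand_identifier("https://doi.org/10.5281/zenodo.6882966"))
print(expand_identifier("codemetapy"))
```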


## Software Types and services

Codemetapy (since 2.0) implements an extension to codemeta that allows
linking the software source code to the actual instantiation of the
software, with explicit regard for the interface type. This is done via
the `schema:targetProduct` property, which takes as range a
`schema:SoftwareApplication`, `schema:WebAPI`,
`schema:WebSite` or any of the extra types defined in
<https://github.com/SoftwareUnderstanding/software_types/>. This was
proposed in [this issue](https://github.com/codemeta/codemeta/issues/271).

This extension is enabled by default and can be disabled by setting the
`--strict` flag.

When you pass codemetapy a URL, it will assume this is where the software
is run as a service, attempt to extract metadata from the site, and
encode it via `targetProduct`. For example, here we read an
existing `codemeta.json` and extend it with some place where
it is instantiated as a service:

`$ codemetapy codemeta.json https://example.org/`

If served HTML, codemetapy will use your `<script>` block
using `application/json+ld` if it provides a valid software type (as
mentioned above). For other HTML, codemetapy will simply extract some
metadata from HTML `<meta>` elements. Content negotiation will be used,
and we favour json+ld, json and even yaml and XML over HTML.
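
The two HTML extraction paths can be sketched with Python's standard `html.parser` module. This is an illustration under stated assumptions rather than codemetapy's own parser: the `MetadataExtractor` class and the sample page are hypothetical, and the sketch accepts both the `application/json+ld` spelling used above and `application/ld+json`, the media type registered for JSON-LD.

```python
import json
from html.parser import HTMLParser

JSONLD_TYPES = {"application/ld+json", "application/json+ld"}


class MetadataExtractor(HTMLParser):
    """Collect JSON-LD script blocks and simple <meta name=... content=...>."""

    def __init__(self):
        super().__init__()
        self.jsonld = []  # parsed JSON-LD script blocks
        self.meta = {}    # name -> content from <meta> elements
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("type") in JSONLD_TYPES:
            self._in_jsonld = True
            self._buffer = []
        elif tag == "meta" and "name" in attrs and "content" in attrs:
            self.meta[attrs["name"]] = attrs["content"]

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.jsonld.append(json.loads("".join(self._buffer)))


page = """<html><head>
<meta name="description" content="A demo service">
<script type="application/ld+json">{"@type": "WebApplication", "name": "Demo"}</script>
</head><body></body></html>"""

extractor = MetadataExtractor()
extractor.feed(page)

# Prefer the structured JSON-LD block, fall back to <meta> elements.
metadata = extractor.jsonld[0] if extractor.jsonld else extractor.meta
print(metadata)
```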

(Note: the older Entrypoint Extension from before codemetapy 2.0 is now deprecated.)

## Graph

You can use codemetapy to generate one big knowledge graph expressing
multiple codemeta resources using the `--graph` parameter:

`$ codemetapy --graph resource1.json resource2.json`

This will produce JSON-LD output with multiple resources in the graph.
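
A consumer of such output could list the resources it contains along the following lines. The sketch assumes that the resources sit under a top-level `@graph` key, which is the usual JSON-LD convention but is not confirmed here for codemetapy's output; the `list_resources` helper and the sample document are hypothetical.

```python
import json


def list_resources(jsonld_text: str) -> list:
    """Return (id, name) pairs for each resource in a JSON-LD document."""
    doc = json.loads(jsonld_text)
    # Fall back to treating the document as a single resource.
    nodes = doc.get("@graph", [doc])
    return [(node.get("@id"), node.get("name")) for node in nodes]


# Hypothetical graph output, reduced to the bare minimum.
sample = """{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@graph": [
    {"@id": "https://example.org/tool1", "@type": "SoftwareSourceCode", "name": "tool1"},
    {"@id": "https://example.org/tool2", "@type": "SoftwareSourceCode", "name": "tool2"}
  ]
}"""

resources = list_resources(sample)
print(resources)
```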

## GitHub API

Codemetapy can make use of the GitHub API to query metadata from GitHub,
but this allows only limited anonymous requests before you hit a limit.
To allow more requests, please set the environment variable
`$GITHUB_TOKEN` to a [personal access
token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token).

## GitLab API

Codemetapy can make use of the GitLab API to query metadata from GitLab,
but this allows only limited anonymous requests before you hit a limit.
To allow more requests, please set the environment variable
`$GITLAB_TOKEN` to a [personal access
token](https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html).

## Integration in setup.py

You can integrate `codemeta.json` generation in your project's
`setup.py`. This will add an extra `python setup.py codemeta` command
that will generate a new metadata file or update an already existing
metadata file. Note that this must be run *after*
`python setup.py install` (or `python setup.py develop`).

To integrate this, add the following to your project's `setup.py`:

```python
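try:
    # Register the optional `codemeta` command when codemetapy is installed
    from codemeta.codemeta import CodeMetaCommand
    cmdclass={
        'codemeta': CodeMetaCommand,
    }
except ImportError:
    # codemetapy is not available: setup.py still works, without the command
    cmdclass={}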
```

And in your `setup()` call add the parameter:

```python
cmdclass=cmdclass
```

This will ensure your `setup.py` works in all cases, even if codemetapy
is not installed, and that the command will be available if codemetapy
is available.

If you want to ship your package with the generated `codemeta.json`,
then simply add a line saying `codemeta.json` to the file `MANIFEST.in`
in the root of your project.

## Acknowledgements

This work is conducted at the [KNAW Humanities Cluster](https://huc.knaw.nl/)'s
[Digital Infrastructure department](https://di.huc.knaw.nl/) in the scope of the 
[CLARIAH](https://www.clariah.nl) project (CLARIAH-PLUS, NWO grant 184.034.023) as
part of the FAIR Tool Discovery track of the Shared Development Roadmap.

            

        