labelify

Name: labelify
Version: 0.4.0
Summary: Analyse an RDF graph to find URIs without human-readable labels.
Author: Nicholas Car
Requires Python: >=3.12, <4.0
License: BSD-3-Clause
Uploaded: 2025-01-09 06:52:02

# labelify

labelify is a Python module and command line utility that identifies
unlabelled resources in a graph. It is highly configurable and works on
a number of different RDF data sources.

## Installation

labelify is on PyPI at https://pypi.org/project/labelify/ so:

    pip install labelify

or 

    poetry add labelify

will install it.

To install the latest (unstable) release from its version control repo:

    pip install git+https://github.com/Kurrawong/labelify

## Command Line Usage

Find all missing labels in `myOntology.ttl`:

    labelify myOntology.ttl

Find missing labels for all the predicates (not subjects or objects) in
`myOntology.ttl`:

    labelify myOntology.ttl --nodetype predicates

Find all missing labels in `myOntology.ttl`, taking into account the
labels which have been defined in another file called
`supportingVocab.ttl`.

*but don’t check for missing labels in `supportingVocab.ttl`*

    labelify myOntology.ttl --context supportingVocab.ttl

Same as above, but use the additional labelling predicates given in
`myLabellingPredicates.txt`.

*By default, only `rdfs:label` is used as a labelling predicate.*

    labelify myOntology.ttl --context supportingVocab.ttl --labels myLabellingPredicates.txt

Where `myLabellingPredicates.txt` is a list of labelling predicates (one
per line and unprefixed):

    http://www.w3.org/2004/02/skos/core#prefLabel
    http://schema.org/name

Find all the missing labels in the subgraph `http://example-graph` at
the SPARQL endpoint `http://mytriplestore/sparql`, using basic HTTP auth
to connect.

labelify will prompt for the password, or it can be provided with the
`--password` flag if you don't mind it being saved to your shell history.

    labelify http://mytriplestore/sparql --graph http://example-graph --username admin

### Label Extraction

Get all the IRIs with missing labels from a local RDF file and put them
into a text file with an IRI per line:

    labelify -n all my_file.ttl -r > iris-missing-labels.txt

*Note the use of `-r` for simple IRI printing.*

Use the output file to generate an RDF file containing the labels,
extracted from either another RDF file, a directory of RDF files, or a
SPARQL endpoint:

    labelify -x iris-missing-labels.txt other-rdf-file.ttl > labels.ttl
    # or
    labelify -x iris-missing-labels.txt dir-of-rdf-files/ > labels.ttl
    # or
    labelify -x iris-missing-labels.txt http://some-sparql-endpoint.com/sparql > labels.ttl

## Command line output formats

By default, labelify will print helpful progress and configuration
messages and attempt to group the missing labels by namespace, making it
easier to quickly parse the output.

The `--raw/-r` option can be appended to any of the examples above to
tell labelify to print only the IRIs of resources with missing labels (one
per line) and no other messages. This is useful for command line
composition if you wish to pipe the output into another process.
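
For example, to count the missing labels (a minimal sketch combining the `--raw` output with standard shell tools):

    labelify myOntology.ttl -r | wc -l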

## More command line options

For more help and the complete list of command line options, just run
`labelify --help/-h`.

As per Unix conventions, all the flags shown above can also be used with
short codes, e.g. `-g` is the same as `--graph`.

## Usage as a module

Print missing labels for all the objects (not subjects or predicates) in
`myOntology.ttl`, taking into account any labels which have been defined
in RDF files in the `supportingVocabs` directory.

Use `skos:prefLabel` and `rdfs:label`, but not `dcterms:title` or
`schema:name`, as the labelling predicates.

    from labelify import find_missing_labels
    from rdflib import Graph
    from rdflib.namespace import RDFS, SKOS
    import glob

    graph = Graph().parse("myOntology.ttl")
    context_graph = Graph()
    for context_file in glob.glob("supportingVocabs/*.ttl"):
        context_graph.parse(context_file)
    labelling_predicates = [SKOS.prefLabel, RDFS.label]
    nodetype = "objects"

    missing_labels = find_missing_labels(
        graph,
        context_graph,
        labelling_predicates,
        nodetype
    )
    print(missing_labels)
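
To mirror the CLI's `-r` workflow, the returned IRIs can also be written to a
text file, one per line (a minimal sketch, assuming `find_missing_labels`
returns an iterable of IRIs, as the example above suggests):

    # Write the missing-label IRIs out one per line,
    # like `labelify ... -r > iris-missing-labels.txt` on the command line.
    with open("iris-missing-labels.txt", "w") as f:
        for iri in sorted(str(i) for i in missing_labels):
            f.write(iri + "\n")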

And, to extract labels, descriptions and `seeAlso` details for given IRIs
from a directory of RDF files:

    from pathlib import Path
    from labelify import extract_labels

    iris = Path("tests/get_iris/iris.txt").read_text().splitlines()
    lbls_graph = extract_labels(Path("tests/one/background/"), iris)
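
The extracted labels can then be saved to an RDF file, like the CLI's
`> labels.ttl` redirect (a minimal sketch, assuming `extract_labels` returns
an rdflib `Graph`):

    # Serialise the extracted labels as Turtle
    lbls_graph.serialize(destination="labels.ttl", format="turtle")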

## Development

### Installing from source

Clone the repository and install the dependencies

*labelify uses [Poetry](https://python-poetry.org/) to manage its
dependencies.*

    git clone git@github.com:Kurrawong/labelify.git
    cd labelify
    poetry install

You can then use labelify from the command line:

    poetry shell
    python labelify/ ...

### Running tests

    poetry run pytest

Several of the tests require a Fuseki triplestore instance to be available, so you need **Docker** running, as the
tests will attempt to use [testcontainers](https://testcontainers.com/guides/getting-started-with-testcontainers-for-python/)
to create throwaway containers for this purpose.

### Formatting the codebase

    poetry run black . && poetry run ruff check --fix labelify/

## License

[BSD-3-Clause](https://opensource.org/license/bsd-3-clause/), if anyone
is asking.

## Contact

**KurrawongAI**  
<info@kurrawong.ai>  
<https://kurrawong.ai>

            
