Name | invenio-subjects-fast |
Version | 2023.7.5 |
home_page | https://github.com/MESH-Research/invenio-subjects-fast |
Summary | Provides the FAST faceted subject vocabulary for InvenioRDM |
upload_time | 2023-07-05 13:47:28 |
maintainer | |
docs_url | None |
author | MESH Research |
requires_python | >=3.9 |
license | MIT License Copyright (C) 2023 MESH Research Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. |
keywords | invenio, inveniordm, subjects, fast |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# invenio-subjects-fast
FAST subject vocabulary for the InvenioRDM repository system.
Install this extension to use the FAST (Faceted Application of Subject Terminology) subject vocabulary in your InvenioRDM instance. FAST is developed by OCLC Research (https://www.oclc.org/research/areas/data-science/fast.html) and adapts the Library of Congress Subject Headings to use a faceted structure, allowing flexible and efficient tagging and searching. FAST is made available under the Open Data Commons Attribution License (https://www.oclc.org/research/areas/data-science/fast/odcby.html).
Some of the facets in FAST are extremely large (well over a million terms), so this package provides the nine facets of the FAST vocabulary as nine separate subject vocabularies in Invenio. Each term's id is the full URL for the term (e.g., http://id.worldcat.org/fast/1204165).
The invenio-subjects-fast package comes with a preconfigured set of jsonl files ready to be loaded as fixtures into InvenioRDM. It also includes scripts to download the raw .marcxml files from the FAST project and convert them into jsonl vocabulary files. This download and conversion process will only be necessary, though, when the FAST terms are updated.
## Installation
From your InvenioRDM instance directory:

    pipenv install invenio-subjects-fast

This will add the package to your Pipfile and install it in your InvenioRDM instance's virtual environment.
## Usage
The package will automatically provide the entry points for InvenioRDM to register the vocabulary as a subject scheme. If you are installing the vocabulary in an existing InvenioRDM instance, though, you will have to tell Invenio to create vocabulary fixtures from the package files:

    pipenv run invenio rdm-records fixtures

**Note that this fixture creation will take quite a few minutes**, and on lower-powered processors it may take more than half an hour. During this time the terminal will simply read "Creating required fixtures..." This should eventually be followed by "Created required fixtures!", but the initial loading process for such a large set of vocabulary files is very slow.
**The vocabulary terms will not be available for some time** even after the fixtures have been created and you receive the "Created required fixtures!" message. This is because the indexing of each vocabulary term is queued through RabbitMQ as a task to be performed in due course by a Celery worker. It **may take as long as several hours** for Celery to complete all of these queued tasks. Once the queue has been processed, though, the FAST terms will appear as suggestions in the subject field of the deposit form.
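One way to check whether indexing has caught up is to ask the instance for subject suggestions and see whether any hits come back. The sketch below only builds the request URL and reads a hit count from a decoded response; it assumes that your instance exposes a subjects search endpoint at `/api/subjects` accepting `suggest` and `size` parameters, and that responses carry a `hits.total` count. The helper names and the base URL are hypothetical, and the response shape should be confirmed against your InvenioRDM version.

```python
from urllib.parse import urlencode


def suggest_url(base_url: str, text: str, size: int = 5) -> str:
    """Build a subject-suggestion URL (endpoint and parameters are assumptions)."""
    query = urlencode({"suggest": text, "size": size})
    return f"{base_url.rstrip('/')}/api/subjects?{query}"


def count_hits(payload: dict) -> int:
    """Read the total hit count from a decoded JSON response (assumed shape)."""
    return payload.get("hits", {}).get("total", 0)


# Hypothetical local instance address; replace with your own.
url = suggest_url("https://127.0.0.1:5000/", "water")
print(url)
```

Fetching that URL with any HTTP client and passing the decoded JSON to `count_hits` gives a rough signal: a count of zero for a common term suggests the Celery queue has not finished yet.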
## Vocabulary file format
The official InvenioRDM documentation recommends yaml files for custom vocabularies. InvenioRDM is quite slow to ingest that format, though, so this module provides jsonl-formatted files instead.
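For readers unfamiliar with jsonl, each line of such a file is a complete JSON object, so records can be read one at a time. The sketch below shows that pattern on an in-memory sample. The field names (`id`, `scheme`, `subject`), the scheme name `FAST-example`, both labels, and the second term id are assumptions made for illustration; they are not copied from the package's files, and this is not how InvenioRDM itself loads fixtures.

```python
import io
import json
from typing import Iterator, TextIO


def iter_subjects(stream: TextIO) -> Iterator[dict]:
    """Yield one decoded record per non-empty line of a jsonl stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical sample data: only the first id appears in the README above.
SAMPLE = io.StringIO(
    '{"id": "http://id.worldcat.org/fast/1204165", '
    '"scheme": "FAST-example", "subject": "Example term A"}\n'
    '{"id": "http://id.worldcat.org/fast/0000000", '
    '"scheme": "FAST-example", "subject": "Example term B"}\n'
)

subjects = list(iter_subjects(SAMPLE))
ids = [record["id"] for record in subjects]
print(ids)
```

To read a real file, pass an open file handle (for example `open(path, encoding="utf-8")`) in place of `SAMPLE`; because the function is a generator, the whole file never needs to be held in memory at once.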
## Updating the vocabulary
The invenio-subjects-fast package includes a preconfigured set of Invenio vocabulary files in jsonl format. You can, however, download and compile updated FAST source files for yourself. First, from your Invenio instance directory, run

    pipenv run invenio-subjects-fast download

This will download the nine separate vocabulary facets as marcxml files from the FAST project's download page (https://www.oclc.org/research/areas/data-science/fast/download.html).
To convert these to Invenio's subject jsonl format, run

    pipenv run invenio-subjects-fast convert

## Issues with loading updated vocabularies into InvenioRDM
**Note that at the time of writing, InvenioRDM has no capacity to update installed fixtures without either recreating the database with `invenio-cli services setup --force` or manually updating the database. It is hoped that this situation will be fixed before the next update is made to this package. Be aware, though, that once installed this vocabulary cannot easily be updated at the moment.**
## Updating this package
The FAST vocabulary is updated every 6 months. When a new version of the FAST terms is released, this package should be updated. You can tell that an update may be necessary if the current version of the FAST vocabulary was released after the date used as the version number for this package.
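The date comparison described above can be sketched in a few lines. This assumes the package version is a dotted year.month.day string such as `2023.7.5` (the version listed in this package's metadata); the function names and the FAST release date used in the example are hypothetical.

```python
from datetime import date


def version_to_date(version: str) -> date:
    """Convert a dotted calendar version such as '2023.7.5' to a date."""
    year, month, day = (int(part) for part in version.split("."))
    return date(year, month, day)


def update_may_be_needed(package_version: str, fast_release: date) -> bool:
    """Return True if the FAST release is newer than the package version date."""
    return fast_release > version_to_date(package_version)


# Hypothetical FAST release date, for illustration only.
needs_update = update_may_be_needed("2023.7.5", date(2024, 1, 15))
print(needs_update)
```

Zero-padded components such as `2021.06.18` are handled as well, since `int("06")` is `6`.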
If you would like to contribute an updated version, first create a fork of the repository (https://github.com/MESH-Research/invenio-subjects-fast), clone it locally, and install the local version of the package in development mode. Then, from the folder where the invenio-subjects-fast package was installed, run

    pipenv run invenio-subjects-fast download

followed by

    pipenv run invenio-subjects-fast convert

Run the automated tests with the test-runner script:

    bash run-tests.sh

Once the tests pass, bump the version number in `invenio_subjects_fast/__init__.py` and in `pyproject.toml` to the current date in the package's dotted calendar format (e.g., `2023.7.5`) and submit a pull request to the main invenio-subjects-fast repository.
### Versions
This repository follows [calendar versioning](https://calver.org/):
`2023.7.5`, for example, is both a valid semantic version and an indicator of the date corresponding to the loaded FAST terms.
## Acknowledgements
Thanks to Guillaume Viger and Northwestern University for the invenio-subjects-mesh package, which provided the framework for this package and parts of this README text.
Raw data
{
"_id": null,
"home_page": "https://github.com/MESH-Research/invenio-subjects-fast",
"name": "invenio-subjects-fast",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": "",
"keywords": "invenio,inveniordm,subjects,FAST",
"author": "MESH Research",
"author_email": "MESH Research <scottia4@msu.edu>",
"download_url": "https://files.pythonhosted.org/packages/08/8e/b50606bb6e7e5a715367479834b245de5da6cd11f731a8ab314c910e71e5/invenio-subjects-fast-2023.7.5.tar.gz",
"platform": "any",
"description": "# invenio-subjects-fast\n\nFAST subject vocabulary for the InvenioRDM repository system.\n\nInstall this extension to use the FAST (Faceted Application of Subject Terminology) subject vocabulary in your InvenioRDM instance. FAST is developed by OCLC Research (https://www.oclc.org/research/areas/data-science/fast.html) and adapts the Library of Congress Subject Headings to use a faceted structure, allowing flexible and efficient tagging and searching. FAST is made available under the Open Data Commons Attribution License (https://www.oclc.org/research/areas/data-science/fast/odcby.html).\n\nSome of the facets in FAST are extremely large. (Well over a million terms.) So this package provides the nine facets of the FAST vocabulary in Invenio as nine separate subject vocabularies. Each term's id is the full URL for the term (e.g., http://id.worldcat.org/fast/1204165).\n\nThe invenio-subjects-fast package comes with a preconfigured set of jsonl files ready to be loaded as fixtures into InvenioRDM. It also includes scripts to download the raw .marcxml files from the FAST project and convert them into jsonl vocabulary files. This download and conversion process will only be necessary, though, when the FAST terms are updated.\n\n## Installation\n\nFrom your InvenioRDM instance directory:\n\n pipenv install invenio-subjects-fast\n\nThis will add the package to your Pipfile and install it in your InvenioRDM instance's virtual environment.\n\n## Usage\n\nThe package will automatically provide the entry points for InvenioRDM to register the vocabulary as a subject scheme. If you are installing the vocabulary in an existing InvenioRDM instance, though, you will have to tell Invenio to create vocabulary fixtures from the package files:\n\n pipenv run invenio rdm-records fixtures\n\n**Note that this fixture creation will take quite a few minutes**, and on lower-powered processors may take more than half an hour. 
During this time the terminal process will simply read \"Creating required fixtures...\" This should eventually be followed by \"Created required fixtures!\" But the initial loading process for such a large set of vocabulary files is very slow.\n\n**The vocabulary terms will not be available for some time** even after the fixtures have been created and you receive the \"Created required fixtures!\" message. This is because the indexing of each vocabulary term is delegated to a RabbitMQ task to be performed in due course by a celery worker. It **may take as long as several hours** for celery to complete all of these queued tasks. Once the queue has completed, though, the FAST terms will appear as suggestions in the subject field of the deposit form.\n\n## Vocabulary file format\n\nThe official InvenioRDM documentation recommends yaml files for custom vocabularies. This file format is quite slow, though, for InvenioRDM to ingest. So this module instead provides jsonl formatted files.\n\n## Updating the vocabulary\n\nThe invenio-subjects-fast package includes a preconfigured set of Invenio vocabulary files in jsonl format. You can, however, download and compile updated FAST source files for yourself. First, from your Invenio instance directory, run\n\n pipenv run invenio-subjects-fast download\n\nThis will download the nine separate vocabulary facets from the FAST project's download page as marcxml files (https://www.oclc.org/research/areas/data-science/fast/download.html).\n\nTo convert these to Invenio's subject jsonl format, run\n\n pipenv run invenio-subjects-fast convert\n\n\n## Issues with loading updated vocabularies into InvenioRDM\n\n**Note that at the time of writing, InvenioRDM has no capacity to update installed fixtures without either recreating the database with `invenio-cli services setup --force` or manually updating the database. It is hoped that this situation will be fixed before the next update is made to this package. 
Be aware, though, that once installed this vocabulary cannot easily be updated at the moment.**\n\n\n## Updating this package\n\n\nThe FAST vocabulary is updated every 6 months. When a new version of the FAST terms is released, this package should be updated. You can tell that an update may be necessary if the current version of the FAST vocabulary was released after the date used as the version number for this package.\n\nIf you would like to contribute an updated version, first create a fork of the (https://github.com/MESH-Research/invenio-subjects-fast), clone it locally, and install the local version of the package in development mode. Then from the folder where the invenio-subjects-fast package was installed, run\n\n pipenv run invenio-subjects-fast download\n\nfollowed by\n\n pipenv run invenio-subjects-fast convert\n\nRun the automated tests with the test-runner script:\n\n bash run-tests.sh\n\nOnce the tests pass, bump the version number in `invenio_subjects_mesh/__init__.py` and in `pyproject.toml` to the current date (YYYY-MM-DD) and submit a pull request to the main invenio-subjects-fast repository.\n\n\n### Versions\n\nThis repository follows [calendar versioning](https://calver.org/):\n\n`2021.06.18` is both a valid semantic version and an indicator of the year-month corresponding to the loaded FAST terms.\n\n\n## Acknowledgements\n\nThanks to Guillaume Viger and Northwestern University for the invenio-subjects-mesh package which provided the framework for this package and parts of this README text.\n",
"bugtrack_url": null,
"license": "MIT License Copyright (C) 2023 MESH Research Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.",
"summary": "Provides the FAST faceted subject vocabulary for InvenioRDM",
"version": "2023.7.5",
"project_urls": {
"Homepage": "https://github.com/MESH-Research/invenio-subjects-fast"
},
"split_keywords": [
"invenio",
"inveniordm",
"subjects",
"fast"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "19bb61561fbfd8070d6783fa758e633e4c08a2ec4103e9d3fc53d3a7adaa1405",
"md5": "5deab71f87106bd065921b92636058bf",
"sha256": "b725cf430bcc806c4516be95e4c76b33385a1c1a8aa1d06c8fbf3feb43643be3"
},
"downloads": -1,
"filename": "invenio_subjects_fast-2023.7.5-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "5deab71f87106bd065921b92636058bf",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": ">=3.9",
"size": 33178841,
"upload_time": "2023-07-05T13:47:15",
"upload_time_iso_8601": "2023-07-05T13:47:15.399824Z",
"url": "https://files.pythonhosted.org/packages/19/bb/61561fbfd8070d6783fa758e633e4c08a2ec4103e9d3fc53d3a7adaa1405/invenio_subjects_fast-2023.7.5-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "088eb50606bb6e7e5a715367479834b245de5da6cd11f731a8ab314c910e71e5",
"md5": "539476a7ebf3d21d281cdfdbbafa4247",
"sha256": "ae8e37e7138b13954b0236c17e148bac029425d4b4affe1a1e01277deeb0487d"
},
"downloads": -1,
"filename": "invenio-subjects-fast-2023.7.5.tar.gz",
"has_sig": false,
"md5_digest": "539476a7ebf3d21d281cdfdbbafa4247",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 32152677,
"upload_time": "2023-07-05T13:47:28",
"upload_time_iso_8601": "2023-07-05T13:47:28.961545Z",
"url": "https://files.pythonhosted.org/packages/08/8e/b50606bb6e7e5a715367479834b245de5da6cd11f731a8ab314c910e71e5/invenio-subjects-fast-2023.7.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-07-05 13:47:28",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "MESH-Research",
"github_project": "invenio-subjects-fast",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "invenio-subjects-fast"
}