retraction-watch-import


* Name: retraction-watch-import
* Version: 0.0.4
* Home page: https://gitlab.com/crossref/labs/retraction-watch-import
* Summary: A tool to import Retraction Watch data
* Upload time: 2023-09-21 15:37:39
* Author: Martin Paul Eve
* Requires Python: >=3.8
* Keywords: academic, retractions, data, import, crossref
# Retraction Watch Database Importer
This Airflow script imports the Retraction Watch database into the annotations system of the Crossref Labs API.

![license](https://img.shields.io/gitlab/license/crossref/labs/retraction-watch-import) ![activity](https://img.shields.io/gitlab/last-commit/crossref/labs/retraction-watch-import)

![Airflow](https://img.shields.io/badge/Airflow-017CEE?style=for-the-badge&logo=Apache%20Airflow&logoColor=white) ![AWS](https://img.shields.io/badge/AWS-%23FF9900.svg?style=for-the-badge&logo=amazon-aws&logoColor=white) ![Linux](https://img.shields.io/badge/Linux-FCC624?style=for-the-badge&logo=linux&logoColor=black) ![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54)

## Input Format
The script expects an S3 folder that contains CSV files with Retraction Watch data.

Each CSV file should have the following headings, with exactly this capitalization:
* `DOI`
* `RetractionDOI`
* `Reason`
* `RetractionNature`
* `Notes`
* `URLS`

The first row of each CSV file must contain the headings. A DOI may have multiple entries (e.g. both an expression of concern and a retraction), but only one entry of each type is imported per DOI: a DOI cannot have two retractions or two expressions of concern.
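The input rules above can be sketched in Python. This is a minimal illustration, not the importer's actual code: `load_entries` is a hypothetical name, the CSV is read from a string rather than from S3, and when a DOI has several rows of the same `RetractionNature` the sketch assumes the first row wins.

```python
import csv
import io

# Headings required by the README, with exact capitalization.
REQUIRED_HEADINGS = ["DOI", "RetractionDOI", "Reason", "RetractionNature", "Notes", "URLS"]


def load_entries(csv_text):
    """Parse Retraction Watch CSV text, keeping one entry per (DOI, RetractionNature).

    Hypothetical helper. Assumption: if a DOI has several rows of the same
    nature, the first row is kept and later ones are ignored.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # DictReader treats the first row as the headings; check all are present.
    missing = [h for h in REQUIRED_HEADINGS if h not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing CSV headings: {missing}")

    entries = {}
    for row in reader:
        key = (row["DOI"], row["RetractionNature"])
        # setdefault stores only the first row seen for each DOI/nature pair.
        entries.setdefault(key, row)
    return list(entries.values())
```

For example, a file with two `Retraction` rows and one `Expression of concern` row for the same DOI yields two entries: the first retraction and the expression of concern.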

## Idempotency
The script is idempotent: running it multiple times imports only new data, so repeated runs over the same input should leave the results unchanged.
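One common way to get this behaviour is to key every record and skip keys that are already stored. The sketch below assumes the key is the DOI plus `RetractionNature` (matching the one-entry-per-type rule) and uses a plain dict as a hypothetical stand-in for the Crossref Labs annotations backend; `import_new` is an illustrative name, not part of the importer.

```python
def import_new(store, entries):
    """Add only entries not already in `store`; return how many were added.

    `store` is a hypothetical stand-in for the annotations backend: a dict
    keyed by (DOI, RetractionNature).
    """
    added = 0
    for entry in entries:
        key = (entry["DOI"], entry["RetractionNature"])
        if key in store:
            continue  # imported on an earlier run: skip so re-runs change nothing
        store[key] = entry
        added += 1
    return added
```

Calling `import_new` a second time with the same entries adds nothing and leaves the store unchanged, which is the idempotency property described above.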

## Archiving
After processing an input file, the script moves it to an archive folder in the same S3 bucket.
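S3 has no native move or rename operation, so moving a file is normally done as a copy followed by a delete. The sketch below assumes a boto3 S3 client (`boto3.client("s3")`) and an `archive` subfolder placed next to the input file; the folder name, its location, and the helper names `archive_key` and `archive_object` are assumptions, not taken from the importer.

```python
import posixpath


def archive_key(key, archive_folder="archive"):
    """Map 'incoming/rw.csv' to 'incoming/archive/rw.csv' (assumed layout)."""
    folder, name = posixpath.split(key)
    return posixpath.join(folder, archive_folder, name)


def archive_object(s3_client, bucket, key, archive_folder="archive"):
    """'Move' an object within one bucket by copying it, then deleting the original."""
    new_key = archive_key(key, archive_folder)
    s3_client.copy_object(
        Bucket=bucket,
        CopySource={"Bucket": bucket, "Key": key},
        Key=new_key,
    )
    # Delete the original only after the copy call has returned without raising.
    s3_client.delete_object(Bucket=bucket, Key=key)
    return new_key
```

With a hypothetical bucket, `archive_object(boto3.client("s3"), "example-bucket", "incoming/rw.csv")` would leave the file at `incoming/archive/rw.csv`.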

## Periodic Runs and Missing Input Files
The script is designed to run periodically. If it finds no input files, it raises an exception; this is intentional.
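In an Airflow task, an uncaught exception marks the task run as failed, so raising on an empty input folder makes a run with nothing to import show up as a failure rather than a silent success. A minimal sketch of the check, assuming a boto3 S3 client; `find_input_files` and `NoInputFilesError` are hypothetical names. It passes `Delimiter="/"` so that only files directly under the prefix are listed, which keeps an archive subfolder beneath the prefix (if there is one) from being re-read.

```python
class NoInputFilesError(Exception):
    """Hypothetical exception raised when the input folder holds no CSV files."""


def find_input_files(s3_client, bucket, prefix):
    """Return the CSV keys directly under `prefix`, raising if there are none.

    list_objects_v2 omits "Contents" when nothing matches. Only the first page
    (up to 1,000 keys) is read; a paginator would be needed for larger folders.
    """
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, Delimiter="/")
    keys = [obj["Key"] for obj in response.get("Contents", []) if obj["Key"].endswith(".csv")]
    if not keys:
        # Fail loudly so an empty periodic run is visible as a failed task.
        raise NoInputFilesError(f"no CSV files found in s3://{bucket}/{prefix}")
    return keys
```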

© Crossref 2023
