bib-dedupe

Name: bib-dedupe
Version: 0.10.0
Home page: https://github.com/CoLRev-Environment/bib-dedupe
Summary: Identify and merge duplicates in bibliographic records
Upload time: 2024-11-05 06:03:24
Author: Gerit Wagner (gerit.wagner@uni-bamberg.de)
Requires Python: <4.0,>=3.10
License: MIT
Keywords: de-duplication, duplicate, meta-analysis, research, reproducible research, open science, literature, literature review, systematic review, systematic literature review

<div align="center">

# BibDedupe

<!-- [![License](https://img.shields.io/github/license/CoLRev-Ecosystem/bib-dedupe.svg)](https://github.com/CoLRev-Environment/bib-dedupe/releases/) -->
[![status](https://joss.theoj.org/papers/b954027d06d602c106430e275fe72130/status.svg)](https://joss.theoj.org/papers/b954027d06d602c106430e275fe72130)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/bib-dedupe)<br>
[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)
[![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/CoLRev-Environment/bib-dedupe/.github%2Fworkflows%2Ftests.yml?label=tests)](https://github.com/CoLRev-Environment/bib-dedupe/actions/workflows/tests.yml)
[![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/CoLRev-Environment/bib-dedupe/.github%2Fworkflows%2Fdocs.yml?label=docs)](https://github.com/CoLRev-Environment/bib-dedupe/actions/workflows/docs.yml)
[![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/CoLRev-Environment/bib-dedupe/.github%2Fworkflows%2Fevaluate.yml?label=continuous%20evaluation)](https://github.com/CoLRev-Environment/bib-dedupe/actions/workflows/evaluate.yml)

</div>

## Overview

BibDedupe is an open-source **Python library for deduplication of bibliographic records**, tailored for literature reviews.
Unlike traditional deduplication methods, BibDedupe focuses on entity resolution, linking duplicate records instead of simply deleting them.
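To illustrate how linking differs from deleting, the sketch below groups records into entities from a list of pairwise duplicate links (connected components via union-find). It then merges each group while recording which original records it came from. This is a hypothetical illustration, not BibDedupe's API: `clusters_from_links`, `merge_cluster`, the `origin` field, and the sample records are all assumptions made for this example.

```python
from collections import defaultdict


def clusters_from_links(record_ids, links):
    """Group record IDs into entities: connected components of the duplicate links."""
    parent = {rid: rid for rid in record_ids}

    def find(rid):
        # Walk up to the representative, halving the path as we go.
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]
            rid = parent[rid]
        return rid

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # link the two entities; no record is removed

    groups = defaultdict(list)
    for rid in record_ids:
        groups[find(rid)].append(rid)
    return sorted(sorted(group) for group in groups.values())


def merge_cluster(records, cluster):
    """Merge one entity: keep the first non-empty value per field and record provenance."""
    merged = {"origin": list(cluster)}  # hypothetical provenance field
    for rid in cluster:
        for field, value in records[rid].items():
            if value and field not in merged:
                merged[field] = value
    return merged


# Hypothetical sample data; links would normally come from a matching step.
records = {
    "r1": {"title": "Deep learning", "year": "2020", "doi": ""},
    "r2": {"title": "Deep Learning", "year": "2020", "doi": "10.1000/example"},
    "r3": {"title": "Another paper", "year": "2021", "doi": ""},
}
links = [("r1", "r2")]

for cluster in clusters_from_links(list(records), links):
    print(merge_cluster(records, cluster))
```

Because the original records and the list of links are both kept, a merge can be undone by removing a link and recomputing the clusters. For example, `clusters_from_links(["r1", "r2"], [])` returns `[["r1"], ["r2"]]`.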

## Features

- **Automated Duplicate Linking with Zero False Positives**: BibDedupe automates duplicate linking and is designed to avoid false positives, i.e., records wrongly linked as duplicates.
- **Preprocessing Approach**: BibDedupe's preprocessing reflects how errors and variants arise in academic databases, such as reformatted author names, abbreviated journal names, or translations.
- **Entity Resolution**: Rather than simply deleting duplicates, BibDedupe links them to resolve the underlying entity and merges their data. This allows for validation and undo operations.
- **Programmatic Access**: BibDedupe provides programmatic access, so it can be integrated into existing research workflows, scripts, and applications.
- **Transparent and Reproducible Rules**: BibDedupe's blocking and matching rules are transparent, which makes deduplication results easy to reproduce.
- **Continuous Benchmarking**: Continuous integration tests on GitHub Actions benchmark the library on an ongoing basis, tracking its reliability and performance across datasets.
- **Efficient and Parallel Computation**: BibDedupe runs computations in parallel and relies on efficient data structures and functions to keep deduplication fast.
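The preprocessing, blocking, and matching ideas listed above can be illustrated with a small, self-contained sketch. It is not BibDedupe's implementation. The abbreviation table, the blocking key (first-author surname plus year), the 0.9 title-similarity threshold, the sample records, and all function names are hypothetical choices. The standard-library `difflib.SequenceMatcher` stands in for whatever similarity measures the library actually uses.

```python
import re
import unicodedata
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical abbreviation table; a real system would need a much larger list.
JOURNAL_ABBREVIATIONS = {"mis q": "mis quarterly"}


def normalize_text(value):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    value = unicodedata.normalize("NFKD", value)
    value = "".join(ch for ch in value if not unicodedata.combining(ch))
    value = re.sub(r"[^a-z0-9 ]", " ", value.lower())
    return " ".join(value.split())


def first_author_surname(authors):
    """Return the first author's surname from 'Surname, Given' or 'Given Surname'."""
    first = authors.split(" and ")[0].strip()
    if "," in first:
        surname = first.split(",")[0]
    else:
        tokens = first.split()
        surname = tokens[-1] if tokens else ""
    return normalize_text(surname)


def prep(record):
    """Preprocess a raw record so that formatting variants compare as equal."""
    journal = normalize_text(record.get("journal", ""))
    return {
        "id": record["id"],
        "title": normalize_text(record.get("title", "")),
        "author": first_author_surname(record.get("author", "")),
        "year": record.get("year", "").strip(),
        "journal": JOURNAL_ABBREVIATIONS.get(journal, journal),
    }


def block(records):
    """Yield candidate pairs that share a blocking key (first-author surname, year)."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[(rec["author"], rec["year"])].append(rec)
    for members in blocks.values():
        yield from combinations(members, 2)


def is_duplicate(a, b, threshold=0.9):
    """Conservative matching rule: near-identical titles and no conflicting journal."""
    title_similarity = SequenceMatcher(None, a["title"], b["title"]).ratio()
    journals_agree = not a["journal"] or not b["journal"] or a["journal"] == b["journal"]
    return title_similarity >= threshold and journals_agree


def find_duplicate_pairs(raw_records):
    """Preprocess, block, and match; return the IDs of linked pairs."""
    prepared = [prep(r) for r in raw_records]
    return [(a["id"], b["id"]) for a, b in block(prepared) if is_duplicate(a, b)]


raw = [
    {"id": "r1", "title": "Bib-Dedupe: Linking Duplicates", "author": "Wagner, Gerit",
     "year": "2024", "journal": "MIS Quarterly"},
    {"id": "r2", "title": "Bib-dedupe - linking duplicates", "author": "G. Wagner",
     "year": "2024", "journal": "MIS Q."},
    {"id": "r3", "title": "A different paper", "author": "Wagner, Gerit",
     "year": "2024", "journal": "MIS Quarterly"},
    {"id": "r4", "title": "Bib-Dedupe: Linking Duplicates", "author": "Doe, Jane",
     "year": "2023", "journal": ""},
]
print(find_duplicate_pairs(raw))
```

Blocking restricts pairwise comparisons to records that share a key, which keeps rule-based matching tractable on large result sets. The trade-off is that duplicates with different keys (for example, one record with a wrong year) are never compared.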

## Documentation

Explore the [official documentation](https://colrev-environment.github.io/bib-dedupe/) for comprehensive information on installation, usage, and customization of BibDedupe.

## Citation

If you use BibDedupe in your research, please cite it as follows:

Wagner, G. (2024) BibDedupe - An open-source Python library for deduplication of bibliographic records. *Journal of Open Source Software*, 9(97), 6318, [https://doi.org/10.21105/joss.06318](https://doi.org/10.21105/joss.06318).

## Contribution Guidelines

We welcome contributions from the community to enhance and expand BibDedupe. If you would like to contribute, please follow our [contribution guidelines](CONTRIBUTING.md).

## License

BibDedupe is released under the [MIT License](LICENSE), allowing free and open use and modification.

## Contact

For any questions, issues, or feedback, please open an [issue](https://github.com/CoLRev-Environment/bib-dedupe/issues) on our GitHub repository.

Happy deduplicating with BibDedupe!

            
