<div align="center">
# BibDedupe
<!-- [![License](https://img.shields.io/github/license/CoLRev-Ecosystem/bib-dedupe.svg)](https://github.com/CoLRev-Environment/bib-dedupe/releases/) -->
[![status](https://joss.theoj.org/papers/b954027d06d602c106430e275fe72130/status.svg)](https://joss.theoj.org/papers/b954027d06d602c106430e275fe72130)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/bib-dedupe)<br>
[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)
[![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/CoLRev-Environment/bib-dedupe/.github%2Fworkflows%2Ftests.yml?label=tests)](https://github.com/CoLRev-Environment/bib-dedupe/actions/workflows/tests.yml)
[![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/CoLRev-Environment/bib-dedupe/.github%2Fworkflows%2Fdocs.yml?label=docs)](https://github.com/CoLRev-Environment/bib-dedupe/actions/workflows/docs.yml)
[![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/CoLRev-Environment/bib-dedupe/.github%2Fworkflows%2Fevaluate.yml?label=continuous%20evaluation)](https://github.com/CoLRev-Environment/bib-dedupe/actions/workflows/evaluate.yml)
</div>
## Overview
BibDedupe is an open-source **Python library for deduplication of bibliographic records**, tailored for literature reviews.
Unlike traditional deduplication methods, BibDedupe focuses on entity resolution, linking duplicate records instead of simply deleting them.
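
To illustrate what linking instead of deleting can look like, here is a minimal, self-contained sketch in plain Python. It does not use BibDedupe's API: the function `link_duplicates`, the record IDs, and the duplicate pairs are hypothetical, and the union-find grouping is just one common way to turn pairwise duplicate decisions into clusters.

```python
from collections import defaultdict


def link_duplicates(record_ids, duplicate_pairs):
    """Group record IDs into clusters instead of deleting any of them."""
    # Every record starts as its own cluster representative.
    parent = {rid: rid for rid in record_ids}

    def find(rid):
        # Follow parent pointers to the representative (with path halving).
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]
            rid = parent[rid]
        return rid

    for a, b in duplicate_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller ID as representative so results are deterministic.
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep

    clusters = defaultdict(list)
    for rid in record_ids:
        clusters[find(rid)].append(rid)
    return dict(clusters)


# Hypothetical input: four records, two pairwise duplicate decisions.
records = ["r1", "r2", "r3", "r4"]
pairs = [("r1", "r3"), ("r3", "r4")]
links = link_duplicates(records, pairs)
print(links)  # {'r1': ['r1', 'r3', 'r4'], 'r2': ['r2']}
```

Because every original ID survives inside its cluster, a questionable link can be inspected and reversed by dropping the corresponding pair and rebuilding the clusters, which is not possible once a record has been deleted.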
## Features
- **Automated Duplicate Linking with Zero False Positives**: BibDedupe automates the duplicate linking process with a focus on eliminating false positives.
- **Preprocessing Approach**: BibDedupe uses a preprocessing approach that reflects the error-generating processes specific to academic databases, such as author re-formatting, journal abbreviation, or translation.
- **Entity Resolution**: BibDedupe does not simply delete duplicates; it links them to resolve the underlying entity and integrates their data. This allows for validation and undo operations.
- **Programmatic Access**: BibDedupe is designed for seamless integration into existing research workflows, providing programmatic access for easy incorporation into scripts and applications.
- **Transparent and Reproducible Rules**: BibDedupe's blocking and matching rules are transparent, so deduplication results can be inspected and reproduced.
- **Continuous Benchmarking**: Continuous integration tests running on GitHub Actions ensure ongoing benchmarking, maintaining the library's reliability and performance across datasets.
- **Efficient and Parallel Computation**: BibDedupe implements computations efficiently and in parallel, using appropriate data structures and functions for optimal performance.
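
The preprocessing, blocking, and matching steps named above can be sketched in a few lines of plain Python. This is an illustrative toy under stated assumptions, not BibDedupe's actual rules or API: the field names (`ID`, `author`, `title`, `year`), the helper functions, and the sample records are all hypothetical, and the matching rule (identical normalized titles) is far simpler than what a real library would use.

```python
import re
import unicodedata
from collections import defaultdict
from itertools import combinations


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def first_author_surname(authors):
    """Handle both 'Surname, Given' and 'Given Surname' for the first author."""
    first = re.split(r"\s+and\s+|;", authors)[0].strip()
    if "," in first:
        surname = first.split(",")[0]
    else:
        parts = first.split()
        surname = parts[-1] if parts else ""
    return normalize(surname)


def candidate_pairs(records):
    """Blocking: only compare records sharing first-author surname and year."""
    blocks = defaultdict(list)
    for rec in records:
        key = (first_author_surname(rec["author"]), rec["year"])
        blocks[key].append(rec["ID"])
    for ids in blocks.values():
        yield from combinations(sorted(ids), 2)


def is_duplicate(rec_a, rec_b):
    """Matching: a deliberately strict toy rule on normalized titles."""
    return normalize(rec_a["title"]) == normalize(rec_b["title"])


# Hypothetical records showing author re-formatting and title variation.
records = [
    {"ID": "r1", "author": "Müller, Anna and Smith, John", "year": "2024",
     "title": "Deduplication of Bibliographic Records: A Review"},
    {"ID": "r2", "author": "Anna Muller and John Smith", "year": "2024",
     "title": "Deduplication of bibliographic records - a review."},
    {"ID": "r3", "author": "Muller, A.", "year": "2024",
     "title": "Another topic entirely"},
    {"ID": "r4", "author": "Smith, John", "year": "2024",
     "title": "A fourth, unrelated paper"},
]
by_id = {rec["ID"]: rec for rec in records}
duplicates = [
    (a, b) for a, b in candidate_pairs(records)
    if is_duplicate(by_id[a], by_id[b])
]
print(duplicates)  # [('r1', 'r2')]
```

Blocking keeps the number of comparisons small (here, `r4` is never compared with the others), while the explicit rule in `is_duplicate` makes each decision easy to trace.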
## Documentation
Explore the [official documentation](https://colrev-environment.github.io/bib-dedupe/) for comprehensive information on installation, usage, and customization of BibDedupe.
## Citation
If you use BibDedupe in your research, please cite it as follows:
Wagner, G. (2024) BibDedupe - An open-source Python library for deduplication of bibliographic records. Available at https://github.com/CoLRev-Environment/bib-dedupe.
## Contribution Guidelines
We welcome contributions from the community to enhance and expand BibDedupe. If you would like to contribute, please follow our [contribution guidelines](CONTRIBUTING.md).
## License
BibDedupe is released under the [MIT License](LICENSE), allowing free and open use and modification.
## Contact
For any questions, issues, or feedback, please open an [issue](https://github.com/CoLRev-Environment/bib-dedupe/issues) on our GitHub repository.
Happy deduplicating with BibDedupe!
Raw data
{
"_id": null,
"home_page": "https://github.com/CoLRev-Environment/bib-dedupe",
"name": "bib-dedupe",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.9",
"maintainer_email": null,
"keywords": "de-duplication, duplicate, meta-analysis, research, reproducible research, open science, literature, literature review, systematic review, systematic literature review",
"author": "Gerit Wagner",
"author_email": "gerit.wagner@uni-bamberg.de",
"download_url": "https://files.pythonhosted.org/packages/35/76/f354bdf173d1bd76267c3d809c565667cf5eeb7e0bffa559cfd2b44d77ff/bib_dedupe-0.7.5.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Identify and merge duplicates in bibliographic records",
"version": "0.7.5",
"project_urls": {
"Documentation": "https://colrev-environment.github.io/bib-dedupe/",
"Homepage": "https://github.com/CoLRev-Environment/bib-dedupe",
"Repository": "https://github.com/CoLRev-Environment/bib-dedupe"
},
"split_keywords": [
"de-duplication",
" duplicate",
" meta-analysis",
" research",
" reproducible research",
" open science",
" literature",
" literature review",
" systematic review",
" systematic literature review"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "167a891617cd05e69531db738dc58d90661fd67228492fd6d9bbed97d5a4ee9f",
"md5": "f7b73fdb46978b870feb46de820c0521",
"sha256": "c252e17a5c6d416f791e81ff3d2506a2b06fcbbf1bec1983bd8e1533ac80a3f7"
},
"downloads": -1,
"filename": "bib_dedupe-0.7.5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "f7b73fdb46978b870feb46de820c0521",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.9",
"size": 71309,
"upload_time": "2024-05-07T10:51:08",
"upload_time_iso_8601": "2024-05-07T10:51:08.294602Z",
"url": "https://files.pythonhosted.org/packages/16/7a/891617cd05e69531db738dc58d90661fd67228492fd6d9bbed97d5a4ee9f/bib_dedupe-0.7.5-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "3576f354bdf173d1bd76267c3d809c565667cf5eeb7e0bffa559cfd2b44d77ff",
"md5": "42513230b6c11b408bfa3643b5a7cc24",
"sha256": "4cbfa8c2f8e19e2d4900ff068d90bbe104e0ebdc8e68ff35a718d7a7a437dd6a"
},
"downloads": -1,
"filename": "bib_dedupe-0.7.5.tar.gz",
"has_sig": false,
"md5_digest": "42513230b6c11b408bfa3643b5a7cc24",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.9",
"size": 64051,
"upload_time": "2024-05-07T10:51:09",
"upload_time_iso_8601": "2024-05-07T10:51:09.610163Z",
"url": "https://files.pythonhosted.org/packages/35/76/f354bdf173d1bd76267c3d809c565667cf5eeb7e0bffa559cfd2b44d77ff/bib_dedupe-0.7.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-05-07 10:51:09",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "CoLRev-Environment",
"github_project": "bib-dedupe",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "bib-dedupe"
}