cleanbib


| Field | Value |
| --- | --- |
| Name | cleanbib |
| Version | 0.1.0 |
| Summary | A tool for parsing and cleaning BibTex and BibLaTex files for better LaTex bibliography formatting. |
| Upload time | 2025-02-28 00:13:45 |
| Requires Python | >=3.8 |
| Keywords | biblatex, bibtex, latex |
| Home page | None |
| Maintainer | None |
| Docs URL | None |
| Author | None |
| License | None |
| Requirements | No requirements were recorded. |

# BibCleaner

BibCleaner is a Python package designed to parse, clean, and format .bib files for better LaTeX bibliography management. It enforces consistent formatting, validates DOIs, and helps detect missing or incorrect fields. It also eases integration between citation management tools such as Zotero and LaTeX editors.

## Features

- Parses .bib files and structures them properly
- Cleans and formats author names (handles capitalization and prefixes such as "van" and "de")
- Ensures proper title capitalization while preserving LaTeX commands
- Customizes which fields are kept and removes duplicate authors
- Detects and fixes formatting issues in pages, DOIs, dates, etc.
- Validates DOIs and warns if they are broken
- Auto-generates citation keys based on author, title, and year
- Categorizes warnings:
  - Critical Issues: Missing author, title, year, etc.
  - Soft Warnings: Broken DOI, minor inconsistencies
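
To make the name and title features above concrete, here is a minimal sketch of what such normalization could look like. It is not BibCleaner's implementation: the function names `format_author` and `format_title`, the particle list, and the small-word list are all assumptions chosen for illustration, and the sketch ignores cases such as hyphenated or apostrophized names.

```python
# Name particles kept lowercase (illustrative assumption, not the package's list).
PARTICLES = {"van", "von", "de", "der", "den", "la", "le", "di", "da"}
# Short words kept lowercase inside a title (also an assumption).
SMALL_WORDS = {"a", "an", "and", "as", "at", "by", "for", "in", "of", "on", "or", "the", "to"}


def format_author(name: str) -> str:
    """Capitalize each name part, keeping particles such as "van" lowercase."""
    formatted = []
    for i, part in enumerate(name.split()):
        lower = part.lower()
        if i > 0 and lower in PARTICLES:
            formatted.append(lower)
        else:
            formatted.append(lower.capitalize())
    return " ".join(formatted)


def format_title(title: str) -> str:
    """Title-case plain words; leave words containing LaTeX markup untouched."""
    out = []
    for i, word in enumerate(title.split()):
        if any(ch in word for ch in "\\{}$"):
            out.append(word)  # LaTeX command, brace group, or math: keep verbatim
        elif i > 0 and word.lower() in SMALL_WORDS:
            out.append(word.lower())
        else:
            out.append(word[:1].upper() + word[1:].lower())
    return " ".join(out)


print(format_author("LUDWIG VAN BEETHOVEN"))
print(format_title("the theory of relativity"))
```

Skipping any word that contains a backslash, brace, or dollar sign is a deliberately conservative choice: it may leave some plain text uncapitalized, but it never rewrites a LaTeX command.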

## Installation

Install the package using pip:
```sh
pip install cleanbib
```

Or, clone the repository and install it manually:
```sh
git clone https://github.com/harryziyuhe/cleanbib.git
cd cleanbib
pip install .
```

## Usage

### Basic Usage (Python)
```python
from bibcleaner import BibCleaner

# Load the bibliography, clean it, and write the result to a new file.
cleaner = BibCleaner("references.bib")
cleaner.process()
cleaner.save_as_bib("cleaned_references.bib")
```

## Example Input & Output

### Example .bib File (Before Cleaning)
```bibtex
@article{einstein1923,
  author = "albert einstein",
  title = "the theory of relativity",
  journal = "journal of physics",
  year = "1923",
  doi = "10.1000/xyz123"
}
```

### Cleaned Output
```bibtex
@article{Einstein_Theory_1923,
  author = {Albert Einstein},
  title = {The Theory of Relativity},
  journal = {Journal of Physics},
  year = {1923},
  doi = {10.1000/xyz123}
}
```
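
The key in the cleaned entry above, `Einstein_Theory_1923`, appears to combine the first author's surname, the first significant title word, and the year. The sketch below reproduces that pattern under assumptions: `make_citation_key` and the stop-word list are hypothetical, and the package's actual rules may differ.

```python
import re

# Title words skipped when picking the keyword (illustrative assumption).
STOP_WORDS = {"a", "an", "the", "on", "of", "in", "and", "for", "to"}


def make_citation_key(author: str, title: str, year: str) -> str:
    """Build a key of the form Surname_Keyword_Year from raw field values."""
    # BibTeX separates multiple authors with " and "; use the first one.
    first_author = author.split(" and ")[0].strip()
    if "," in first_author:
        # "Surname, Given" form
        surname = first_author.split(",")[0].strip()
    else:
        # "Given Surname" form
        parts = first_author.split()
        surname = parts[-1] if parts else "Anon"
    surname = re.sub(r"[^A-Za-z]", "", surname) or "Anon"

    # First alphabetic title word that is not a stop word.
    words = re.findall(r"[A-Za-z]+", title)
    keyword = next((w for w in words if w.lower() not in STOP_WORDS), "Untitled")

    return f"{surname.capitalize()}_{keyword.capitalize()}_{year}"


print(make_citation_key("albert einstein", "the theory of relativity", "1923"))
# Einstein_Theory_1923
```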

### Warning Output
```
Critical Issues Found:
  Missing required field 'author' in @article{sample2024}

Soft Warnings:
  DOI '10.1000/broken' in @article{sample2023} may be broken (HTTP 404).
```
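
As a rough illustration of how warnings like these could be produced, the sketch below splits checks into critical (missing required fields) and soft (suspicious DOI). Everything here is an assumption rather than BibCleaner's code: the `REQUIRED` table, the DOI pattern, and the helper names `check_entry` and `doi_resolves` are hypothetical. The online check uses only the standard library and needs network access; some resolvers may reject `HEAD` requests, so a `False` result is a hint, not proof.

```python
import re
import urllib.request

# Required fields per entry type (illustrative assumption).
REQUIRED = {
    "article": ["author", "title", "journal", "year"],
    "book": ["author", "title", "publisher", "year"],
}
# Loose syntactic check: "10.", a 4-9 digit registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def check_entry(entry_type, key, fields):
    """Return (critical, soft) lists of warning strings for one entry."""
    critical, soft = [], []
    for name in REQUIRED.get(entry_type, ["author", "title", "year"]):
        if not fields.get(name, "").strip():
            critical.append(
                f"Missing required field '{name}' in @{entry_type}{{{key}}}"
            )
    doi = fields.get("doi", "").strip()
    if doi and not DOI_PATTERN.match(doi):
        soft.append(f"DOI '{doi}' in @{entry_type}{{{key}}} is not well formed.")
    return critical, soft


def doi_resolves(doi, timeout=5.0):
    """Ask doi.org whether the DOI resolves (requires network access)."""
    request = urllib.request.Request(f"https://doi.org/{doi}", method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except OSError:  # covers URLError, HTTPError, and timeouts
        return False


critical, soft = check_entry(
    "article", "sample2024", {"title": "T", "journal": "J", "year": "2024"}
)
print(critical)
```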

## Advanced Features

### Cleaning Options
You can specify which fields to keep for each entry type when cleaning. Example:
```python
from bibcleaner import BibCleaner

# Map each entry type to the fields that should be kept.
keep_fields = {
    "article": ["author", "title", "journal", "year"],
    "book": ["author", "title", "publisher", "year"]
}
cleaner = BibCleaner("references.bib", keep_fields)
cleaner.process()
```
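
Conceptually, a `keep_fields` mapping like the one above acts as a per-type whitelist. The following sketch shows one way such filtering could work on a single entry; `filter_fields` is a hypothetical helper, and leaving unlisted entry types untouched is an assumption about the desired behavior, not something the package documents.

```python
def filter_fields(entry_type, fields, keep_fields):
    """Return a copy of `fields` restricted to the names allowed for this type."""
    allowed = keep_fields.get(entry_type)
    if allowed is None:
        # Assumption: entry types without a rule are passed through unchanged.
        return dict(fields)
    return {name: value for name, value in fields.items() if name in allowed}


keep_fields = {
    "article": ["author", "title", "journal", "year"],
    "book": ["author", "title", "publisher", "year"],
}
entry = {
    "author": "Albert Einstein",
    "title": "The Theory of Relativity",
    "journal": "Journal of Physics",
    "year": "1923",
    "doi": "10.1000/xyz123",
}
print(filter_fields("article", entry, keep_fields))
```

With this mapping, the `doi` field is dropped from the article because it is not in the list for that type.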

### Export to JSON
```python
cleaner.save_as_json("cleaned_references.json")
```

## License

This project is licensed under the MIT License.  
See [LICENSE](LICENSE) for details.

## Contact

For suggestions or issues, feel free to open an issue on GitHub or contact zih028@ucsd.edu
            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "cleanbib",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "BibLaTex, BibTex, LaTex",
    "author": null,
    "author_email": "Harry He <zih028@ucsd.edu>",
    "download_url": "https://files.pythonhosted.org/packages/23/28/8adad654dcde1367235dcfc11bf194fb7831ba21013991e36656c5827c3b/cleanbib-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A tool for parsing and cleaning BibTex and BibLaTex files for better LaTex bibliography formatting.",
    "version": "0.1.0",
    "project_urls": {
        "Documentation": "https://github.com/harryziyuhe/cleanbib#readme",
        "Issues": "https://github.com/harryziyuhe/cleanbib/issues",
        "Source": "https://github.com/harryziyuhe/cleanbib"
    },
    "split_keywords": [
        "biblatex",
        " bibtex",
        " latex"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "af9857459be4fa28d0f7fcc78713e4c8ff8f78e34fff8fe6ca71dd88e7bc283b",
                "md5": "1217104aa4d81428e869f151acd3b063",
                "sha256": "e6fd85ff28666bead22ec84bcd15bf9a68dc782c2e118401df169dae96a708f5"
            },
            "downloads": -1,
            "filename": "cleanbib-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1217104aa4d81428e869f151acd3b063",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 8134,
            "upload_time": "2025-02-28T00:13:43",
            "upload_time_iso_8601": "2025-02-28T00:13:43.714600Z",
            "url": "https://files.pythonhosted.org/packages/af/98/57459be4fa28d0f7fcc78713e4c8ff8f78e34fff8fe6ca71dd88e7bc283b/cleanbib-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "23288adad654dcde1367235dcfc11bf194fb7831ba21013991e36656c5827c3b",
                "md5": "e54e512dd3e8ac85d209ee3e6f849266",
                "sha256": "476670d65fe5dcc950d23c0249aecbc254e91561a3ee6712bd8185d19dc89cb6"
            },
            "downloads": -1,
            "filename": "cleanbib-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "e54e512dd3e8ac85d209ee3e6f849266",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 10378,
            "upload_time": "2025-02-28T00:13:45",
            "upload_time_iso_8601": "2025-02-28T00:13:45.561553Z",
            "url": "https://files.pythonhosted.org/packages/23/28/8adad654dcde1367235dcfc11bf194fb7831ba21013991e36656c5827c3b/cleanbib-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-28 00:13:45",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "harryziyuhe",
    "github_project": "cleanbib#readme",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "cleanbib"
}
        