bibtexautocomplete


| Field | Value |
| --- | --- |
| Name | bibtexautocomplete |
| Version | 1.3.2 |
| Summary | Script to autocomplete bibtex files by polling online databases |
| Upload time | 2024-04-12 21:41:52 |
| Requires Python | >=3.8 |
| License | MIT License, Copyright (c) 2022-2024 Dorian Lesbre |
| Keywords | bibtex, biblatex, latex, autocomplete, btac |

# Bibtex Autocomplete

[![PyPI version][version-shield]][pypi-link]
[![PyPI pyversions][pyversion-shield]][pypi-link]
[![License][license-shield]](https://choosealicense.com/licenses/mit/)
[![PyPI status][status-shield]][pypi-link]
[![Downloads][download-shield]](https://pepy.tech/project/bibtexautocomplete)

[![Maintenance][maintain-shield]][commit-link]
[![Commit][commit-shield]][commit-link]
[![actions][pipeline-shield]][pipeline-link]
[![issues][issues-shield]][issues-link]
[![pull requests][pr-shield]][pr-link]

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.10207744.svg)](https://doi.org/10.5281/zenodo.10207744)

[version-shield]:   https://img.shields.io/pypi/v/bibtexautocomplete.svg
[pyversion-shield]: https://img.shields.io/pypi/pyversions/bibtexautocomplete.svg
[license-shield]:   https://img.shields.io/pypi/l/bibtexautocomplete.svg
[status-shield]:    https://img.shields.io/pypi/status/bibtexautocomplete.svg
[download-shield]:  https://static.pepy.tech/badge/bibtexautocomplete
[pypi-link]: https://pypi.python.org/pypi/bibtexautocomplete/

[maintain-shield]: https://img.shields.io/badge/Maintained%3F-yes-brightgreen.svg
[commit-shield]: https://img.shields.io/github/last-commit/dlesbre/bibtex-autocomplete
[commit-link]: https://github.com/dlesbre/bibtex-autocomplete/graphs/commit-activity

[pipeline-shield]: https://img.shields.io/github/actions/workflow/status/dlesbre/bibtex-autocomplete/python-app.yml?branch=master&label=tests
[pipeline-link]: https://github.com/dlesbre/bibtex-autocomplete/actions/workflows/python-app.yml

[issues-shield]: https://img.shields.io/github/issues/dlesbre/bibtex-autocomplete
[issues-link]: https://github.com/dlesbre/bibtex-autocomplete/issues

[pr-shield]: https://img.shields.io/github/issues-pr/dlesbre/bibtex-autocomplete
[pr-link]: https://github.com/dlesbre/bibtex-autocomplete/pulls

**bibtex-autocomplete** or **btac** is a simple script to autocomplete BibTeX
bibliographies. It reads a BibTeX file and looks online for any additional data
to add to each entry. It can quickly generate entries from minimal data (a lone
title is often sufficient to generate a full entry). You can also use it to only
add specific fields (like DOIs, or ISSN) to a manually curated bib file.

It is designed to be as simple to use as possible: just give it a bib file and
let **btac** work its magic! It combines multiple sources and runs consistency
and normalization checks on the added fields (check that URLs lead to a valid
webpage, that DOIs exist at https://dx.doi.org/).

It attempts to complete a BibTeX file by querying the following domains:
- [openalex.org](https://openalex.org/): ~240 million entries
- [www.crossref.org](https://www.crossref.org/): ~150 million entries
- [arxiv.org](https://arxiv.org/): open access archive, ~2.4 million entries
- [semanticscholar.org](https://www.semanticscholar.org/): ~215 million entries
- [unpaywall.org](https://unpaywall.org/): database of open access articles, ~48 million entries
- [dblp.org](https://dblp.org): computer science, ~7 million entries
- [researchr.org](https://researchr.org/): computer science
- [inspirehep.net](https://inspirehep.net/): high-energy physics, ~1.5 million entries

Big thanks to all of them for allowing open, easy and well-documented access to
their databases. This project wouldn't be possible without them. You can easily
narrow down the list of sources if some aren't relevant using command line options.

### Contents

- [New in version 1.3](#new-in-version-13)
- [Demo](#demo)
- [Quick overview](#quick-overview)
- [Installation](#installation)
  - [Dependencies](#dependencies)
- [Usage](#usage)
- [Command line arguments](#command-line-arguments)
  - [Specifying output](#specifying-output)
  - [Query filtering](#query-filtering)
  - [New field formatting](#new-field-formatting)
  - [Global output formatting](#global-output-formatting)
  - [Optional flags](#optional-flags)
- [Credit and license](#credit-and-license)

## New in version 1.3

Added OpenAlex and Inspire HEP as sources. Switched to a majority vote between sources
to find new fields, along with smart field normalization and comparison. And of course,
bug fixes!

See the [changelog](https://github.com/dlesbre/bibtex-autocomplete/blob/master/CHANGELOG.md) for full details.

## Demo

![demo.svg](https://raw.githubusercontent.com/dlesbre/bibtex-autocomplete/2d1a01f5ec94c8af9c2f3c1a810eca51bb4cce74/imgs/demo.svg)

## Quick overview

**How does it find matches?**

`btac` queries the websites using the entry DOI (if known) or its title. So
entries that don't have one of those two fields *will not* be completed.
- DOIs are only used if they can be recognized, so the `doi` field should
  contain "10.xxxx/yyyy" or a URL ending with it.
- Titles should be the full title. They are compared excluding case and
  punctuation, but titles with missing words will not match.
- If one or more authors are present, entries with no common authors will not
  match. Authors are compared using lower case last names only. Be sure to use
  one of the correct BibTeX formats for the author field:
  ```bibtex
  author = {First Last and Last, First and First von Last}
  ```
  (see
  [https://www.bibtex.com/f/author-field/](https://www.bibtex.com/f/author-field/)
  for full details)
- If the year is known, entries with different years will also not match.
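
The DOI and title rules above can be illustrated with a short sketch. This is not `btac`'s actual code: the regular expression and the helper names (`extract_doi`, `normalize_title`, `titles_match`) are assumptions made for illustration only.

```python
import re
import string
from typing import Optional

# Hypothetical pattern: "10.", a registrant code, "/", then a suffix at the end.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+$")


def extract_doi(field: str) -> Optional[str]:
    """Return the DOI found at the end of a raw `doi` field or URL, else None."""
    match = DOI_PATTERN.search(field.strip())
    return match.group(0).lower() if match else None


def normalize_title(title: str) -> str:
    """Drop BibTeX braces, lowercase, turn punctuation into spaces."""
    title = title.replace("{", "").replace("}", "")
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return " ".join(title.lower().translate(table).split())


def titles_match(first: str, second: str) -> bool:
    """Titles match only if they are equal once case and punctuation are ignored."""
    return normalize_title(first) == normalize_title(second)
```

With these helpers, `titles_match("Deep Learning: A Survey!", "deep learning a survey")` holds, while a shortened title such as `"Deep Learning"` does not match.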

**Disclaimers**

- There is no guarantee that the script will find matches for your entries, or
  that the websites will have any data to add to your entries (or even that the
  website data is correct, but that's not for me to say...).

- The script is designed to minimize the chance of false positives, that is,
  adding data from another similar-ish entry to your entry. If you find any such
  false positives, please report them using the [issue
  tracker](https://github.com/dlesbre/bibtex-autocomplete/issues).

**How are entries completed?**

Once responses from all websites have been received, the script picks the value
of each new field by performing a majority vote among the sources. To do so it
uses smart normalization and merging tactics for each field:
- Authors (and editors) match if they have the same last names and, if both first
  names are present, the first name of one is equal to or an abbreviation of the
  other. Author lists match if their intersection is non-empty.
- ISSN and ISBN are normalized and their check digits verified. ISBN are
  converted to their 13 digit representation.
- URL and DOI are checked for valid format, and further validated by querying
  them online to ensure they exist
- Many fields match with abbreviation detection (journal, institution, booktitle,
  organization, publisher, school and series). So `ACM` will match
  `Association for Computing Machinery`
- Pages are normalized to use `--` as separator
- All other fields are compared excluding case and punctuation.
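
As a rough illustration of the normalization and voting steps listed above, here is a minimal sketch. The function names and the tie-breaking rule (the first source listed wins) are assumptions, not `btac`'s real implementation; the ISBN arithmetic follows the standard check-digit rules.

```python
import re
import string
from collections import Counter
from typing import Callable, Iterable, Optional


def isbn10_to_13(isbn10: str) -> Optional[str]:
    """Convert a valid ISBN-10 to ISBN-13; return None if the check digit fails."""
    digits = isbn10.replace("-", "").replace(" ", "").upper()
    if not re.fullmatch(r"\d{9}[\dX]", digits):
        return None
    # ISBN-10 check: weighted sum (weights 10 down to 1) must be divisible by 11.
    values = [10 if char == "X" else int(char) for char in digits]
    if sum(v * w for v, w in zip(values, range(10, 0, -1))) % 11 != 0:
        return None
    body = "978" + digits[:9]
    # ISBN-13 check digit: weights alternate 1, 3 over the first 12 digits.
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(body))
    return body + str((10 - total % 10) % 10)


def normalize_pages(pages: str) -> str:
    """Use `--` as the page range separator, whatever dash was used."""
    return re.sub(r"\s*[-\u2013\u2014]+\s*", "--", pages.strip())


def normalize_text(value: str) -> str:
    """Compare generic fields excluding case and punctuation."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(value.lower().translate(table).split())


def majority_vote(
    values: Iterable[str], key: Callable[[str], str] = normalize_text
) -> Optional[str]:
    """Return the value whose normalized form is proposed by the most sources."""
    candidates = list(values)
    if not candidates:
        return None
    winner, _count = Counter(key(v) for v in candidates).most_common(1)[0]
    # Keep the spelling of the first source that proposed the winning value.
    return next(v for v in candidates if key(v) == winner)
```

For instance, three sources answering `"Nature"`, `"nature."` and `"Science"` for the journal field would yield `"Nature"` under this sketch.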

The script will not overwrite any user given non-empty fields, unless the
`-f/--force-overwrite` flag is given. If you want to check what fields are
added, you can use `-v/--verbose` to have them printed to stdout (with
source information), or `-p/--prefix` to have the new fields be prefixed with
`BTAC` in the output file.
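
The interaction between the default behavior, `-f` and `-p` can be summarized with a small sketch. The `merge_fields` helper and its parameters are hypothetical and only mirror the rules described in this README.

```python
from typing import Dict

PREFIX = "BTAC"


def merge_fields(
    entry: Dict[str, str],
    new_fields: Dict[str, str],
    force: bool = False,
    prefix: bool = False,
) -> Dict[str, str]:
    """Return a copy of `entry` completed with `new_fields`."""
    result = dict(entry)
    for name, value in new_fields.items():
        already_set = bool(entry.get(name, "").strip())
        if already_set and not force:
            continue  # never touch user-given non-empty fields by default
        # With a prefix the original field is left alone, but an older
        # BTAC-prefixed field with the same name is replaced.
        target = PREFIX + name if prefix else name
        result[target] = value
    return result
```

Under this sketch, combining `prefix=True` with `force=True` writes `BTACtitle` next to an existing `title` instead of replacing it.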

## Installation

`btac` can be installed with [pip](https://pypi.org/project/pip/):

```console
pip install bibtexautocomplete
```

You should now be able to run the script using either command:

```console
btac --version
python3 -m bibtexautocomplete --version
```

**Note:** `pip` no longer allows installing scripts globally in systems with other
package managers (like most Linux distros). You can install the script locally in
a [virtual environment](https://docs.python.org/3/library/venv.html) or globally
using [pipx](https://pipx.pypa.io/stable/):

```console
sudo apt install pipx
pipx install bibtexautocomplete
```

### Dependencies

This package has two dependencies (automatically installed by pip):

- [bibtexparser](https://bibtexparser.readthedocs.io/) (<2.0.0)
- [alive_progress](https://github.com/rsalmei/alive-progress) (>= 3.0.0) for the fancy progress bar

## Usage

The command line tool can be used as follows:
```console
btac [--flags] <input_files>
```

**Examples :**

- `btac my/db.bib` : reads from `./my/db.bib`, writes to `./my/db.btac.bib`.
  A different output file can be specified with `-o`.
- `btac -i db.bib` : reads from `db.bib` and overwrites it (inplace flag).
  Avoid on non backed-up/version-controlled files, I'd hate it if my script
  corrupted your data.
- `btac folder` : reads from all files ending with `.bib` in folder. Excludes
  `.btac.bib` files unless they are the only `.bib` files present. Writes to
  `folder/file.btac.bib` unless inplace flag is set.
- `btac` with no inputs is the same as `btac .`: it reads files from the current working directory.
- `btac -c doi ...` only completes DOI fields and leaves others unchanged.
- `btac -v ...` verbose mode, pretty prints all new fields when done.
  See [this image](https://raw.githubusercontent.com/dlesbre/bibtex-autocomplete/master/imgs/btac-verbose.png) for a preview of verbose output.
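
The folder rule in the examples above (skip `.btac.bib` files unless nothing else is available) can be sketched as follows. `select_bib_files` is a hypothetical helper operating on plain file names, not part of `btac`'s API.

```python
from typing import List


def select_bib_files(names: List[str]) -> List[str]:
    """Pick the `.bib` files to read from a folder listing."""
    bib_files = sorted(name for name in names if name.endswith(".bib"))
    # Previous btac outputs are ignored when original files are present.
    originals = [name for name in bib_files if not name.endswith(".btac.bib")]
    return originals or bib_files
```

In practice the list of names could come from `os.listdir(folder)`.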

**Note:** the [parser](https://pypi.org/project/bibtexparser/) doesn't preserve
format information, so this script will reformat your files. Some [formatting
options](#global-output-formatting) are provided to control output format.

**Slow responses:** Sometimes due to server traffic, a source DB may take significantly longer
to respond and slow `btac`.
- You can increase timeout with `btac ... -t 60` (60s) or `btac ... -t -1` (no timeout)
- You can disable queries to the offender `btac ... -Q <website>`
- You can try again at another time

## Command line arguments

As `btac` has a lot of options, I'd recommend setting up an alias if you use many
of them regularly.

### Specifying output

- `-o --output <file.bib>`

  Write output to given file. Can be used multiple times when also giving
  multiple inputs. Maps inputs to outputs in order. If there are extra inputs,
  uses default name (`old_name.btac.bib`). Ignored in inplace (`-i`) mode.

  For example `btac db1.bib db2.bib db3.bib -o out1.bib -o out2.bib` reads `db1.bib`,
  `db2.bib` and `db3.bib`, and writes their outputs to `out1.bib`, `out2.bib`
  and `db3.btac.bib` respectively.

- `-i --inplace` Modify input files inplace, ignores any specified output files.
  Avoid on non backed-up/version-controlled files, I'd hate it if my script
  corrupted your data.

- `-O --no-output` don't write any output files (except the one specified by `--dump-data`).
  Can be used with `-v/--verbose` mode to only print a list of changes to the terminal.
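
The way `-o` maps inputs to outputs in order can be sketched like this. `default_output` and `map_outputs` are illustrative names, and silently ignoring surplus `-o` values is an assumption of this sketch rather than documented behavior.

```python
from typing import List, Tuple


def default_output(input_path: str) -> str:
    """`db.bib` becomes `db.btac.bib`."""
    stem = input_path[: -len(".bib")] if input_path.endswith(".bib") else input_path
    return stem + ".btac.bib"


def map_outputs(
    inputs: List[str], outputs: List[str], inplace: bool = False
) -> List[Tuple[str, str]]:
    """Pair each input file with the file its result is written to."""
    if inplace:
        # -i ignores any -o and writes back to the input itself.
        return [(path, path) for path in inputs]
    pairs = []
    for index, source in enumerate(inputs):
        # Inputs beyond the last -o fall back to the default name.
        target = outputs[index] if index < len(outputs) else default_output(source)
        pairs.append((source, target))
    return pairs
```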

### Query filtering

- `-q --only-query <site>` or `-Q --dont-query <site>`

  Restrict which websites to query from. `<site>` must be one of: `openalex`,
  `crossref`, `arxiv`, `s2`, `unpaywall`, `dblp`, `researchr`, `inspire`. These arguments
  can be used multiple times, for example to only query Crossref and DBLP use
  `-q crossref -q dblp` or
  `-Q openalex -Q researchr -Q unpaywall -Q arxiv -Q s2 -Q inspire`

- `-e --only-entry <id>` or `-E --exclude-entry <id>`

  Restrict which entries should be autocompleted. `<id>` is the entry ID used in
  your BibTeX file (e.g. `@inproceedings{<id> ... }`). These arguments can also
  be used multiple times to select only/exclude multiple entries

- `-c --only-complete <field>` or `-C --dont-complete <field>`

  Restrict which fields you wish to autocomplete. Field is a BibTeX field (e.g.
  `author`, `doi`,...). So if you only wish to add missing DOIs use `-c doi`.

- `-b --filter-fields-by-entrytype <required|optional|all>` only add fields that correspond to
  the given entry type in bibtex's data model. Disabled by default. `required`
  only adds required fields, `optional` adds required and optional fields, and
  `all` adds required, optional and non-standard fields (doi, issn and isbn).
  A list of required/optional fields by entry type can be found
  [on the tex stackexchange](https://tex.stackexchange.com/questions/239042/where-can-we-find-a-list-of-all-available-bibtex-entries-and-the-available-fiel)

- `-w --overwrite <field>` or `-W --dont-overwrite <field>`

  Force overwriting of the selected fields. With `-W author -W journal`,
  `btac` overwrites all fields except `author` and `journal`. The
  default is to overwrite nothing (only complete absent and blank fields).

  For a more complex example `btac -C doi -w author` means complete all fields
  save DOI, and only overwrite author fields.

  You can also use the `-f` flag to overwrite everything or the `-p` flag to add
  a prefix to new fields, thus avoiding overwrites.

- `-m --mark` and `-M --ignore-mark`

  This is useful to avoid repeated queries if you want to run `btac` many times
  on the same (large) file.

  By default, `btac` ignores any entry with a `BTACqueried` field. `--ignore-mark`
  overrides this behavior.

  When `--mark` is set, `btac` adds a `BTACqueried = {yyyy-mm-dd}` field to each entry
  it queries.
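
The paired "only"/"don't" options and the `BTACqueried` mark can be pictured with the sketch below. The helper names are hypothetical, and letting the "only" list take precedence when both lists are given is an assumption of this sketch.

```python
from datetime import date
from typing import Dict, Iterable, List, Optional

SOURCES = ["openalex", "crossref", "arxiv", "s2", "unpaywall", "dblp", "researchr", "inspire"]
MARK_FIELD = "BTACqueried"


def select(available: Iterable[str], only: Iterable[str] = (), exclude: Iterable[str] = ()) -> List[str]:
    """Keep the `only` items if any are given, otherwise drop the `exclude` items."""
    only, exclude = set(only), set(exclude)
    if only:
        return [item for item in available if item in only]
    return [item for item in available if item not in exclude]


def should_query(entry: Dict[str, str], ignore_mark: bool = False) -> bool:
    """Marked entries are skipped unless --ignore-mark is set."""
    return ignore_mark or MARK_FIELD not in entry


def mark(entry: Dict[str, str], today: Optional[date] = None) -> Dict[str, str]:
    """Return a copy of the entry carrying a yyyy-mm-dd query date."""
    marked = dict(entry)
    marked[MARK_FIELD] = (today or date.today()).isoformat()
    return marked
```

Both command lines from the `-q`/`-Q` example above lead to the same selection here: `select(SOURCES, only=["crossref", "dblp"])` and excluding the six other sources each return `["crossref", "dblp"]`.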

### New field formatting

You can use the following arguments to control how `btac` formats the new fields
- `--fu --escape-unicode` replace unicode symbols with LaTeX escape sequences (for
  example: replace `é` with `{\'e}`). The default is to keep unicode symbols as is.
- `--fp --protect-uppercase <field>` or `--FP --dont-protect-uppercase <field>` or
  `--fpa --protect-all-uppercase`, insert braces around words containing uppercase
  letters in the given fields to ensure bibtex will preserve them. The three
  arguments are mutually exclusive, and the first two can be used multiple times
  to select/deselect multiple fields.
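
To give an idea of what these two options do to a field value, here is a simplified sketch. It is not `btac`'s implementation: it only handles a handful of accents, treats every whitespace-separated token as a word (so attached punctuation ends up inside the braces), and the function names are made up for the example.

```python
import re
import unicodedata

# Combining accent -> LaTeX accent command (small illustrative subset).
ACCENTS = {"\u0301": "'", "\u0300": "`", "\u0302": "^", "\u0308": '"', "\u0303": "~"}


def escape_unicode(value: str) -> str:
    """Replace accented letters with LaTeX escapes, e.g. é with {\\'e}."""
    pieces = []
    for char in value:
        decomposed = unicodedata.normalize("NFD", char)
        if len(decomposed) == 2 and decomposed[1] in ACCENTS:
            pieces.append("{\\" + ACCENTS[decomposed[1]] + decomposed[0] + "}")
        else:
            pieces.append(char)
    return "".join(pieces)


def protect_uppercase(value: str) -> str:
    """Wrap every word containing an uppercase letter in braces."""

    def wrap(match: "re.Match[str]") -> str:
        word = match.group(0)
        return "{" + word + "}" if any(c.isupper() for c in word) else word

    return re.sub(r"\S+", wrap, value)
```

For example, `protect_uppercase("A study of BibTeX")` gives `{A} study of {BibTeX}`.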


### Global output formatting

Unfortunately [bibtexparser](https://pypi.org/project/bibtexparser/) doesn't
preserve format information, so this script will reformat your BibTeX file. Here
are a few options you can use to control the output format:

- `--fa --align-values` pad field names to align all values

  ```bibtex
  @article{Example,
    author = {Someone},
    doi    = {10.xxxx/yyyyy},
  }
  ```

- `--fc --comma-first` use comma first syntax

  ```bibtex
  @article{Example
    , author = {Someone}
    , doi = {10.xxxx/yyyyy}
    ,
  }
  ```

- `--fl --no-trailing-comma` don't add the last trailing comma
- `--fi --indent <space>` space used for indentation, default is a tab.
  Can be specified as a number (number of spaces) or a string with spaces
  and `_`, `t`, and `n` characters to mark space, tabs and newlines.
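
The formatting options above can be mimicked with a short sketch that reproduces the two sample outputs. `parse_indent` and `format_entry` are hypothetical helpers written for this README, not functions exported by `btac`.

```python
from typing import Dict


def parse_indent(spec: str) -> str:
    """Turn an `--fi` value into the actual indentation string."""
    if spec.isdigit():
        return " " * int(spec)  # a number means that many spaces
    mapping = {"_": " ", " ": " ", "t": "\t", "n": "\n"}
    try:
        return "".join(mapping[char] for char in spec)
    except KeyError as error:
        raise ValueError(f"invalid indent character: {error}") from None


def format_entry(
    entry_type: str,
    key: str,
    fields: Dict[str, str],
    indent: str = "\t",
    align: bool = False,
    comma_first: bool = False,
    trailing_comma: bool = True,
) -> str:
    """Render one entry following the chosen style options."""
    width = max((len(name) for name in fields), default=0) if align else 0
    lines = [f"{name.ljust(width)} = {{{value}}}" for name, value in fields.items()]
    if comma_first:
        body = "".join(f"\n{indent}, {line}" for line in lines)
        if trailing_comma:
            body += f"\n{indent},"
        return f"@{entry_type}{{{key}{body}\n}}"
    body = "".join(f"\n{indent}{line}," for line in lines)
    if not trailing_comma and lines:
        body = body[:-1]  # drop the comma after the last field
    return f"@{entry_type}{{{key},{body}\n}}"
```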

### Optional flags

- `-p --prefix` Write new fields with a prefix. The script will add `BTACtitle =
  ...` instead of `title = ...` in the bib file. This can be combined with `-f`
  to safely show info for already present fields.

  Note that this can overwrite existing fields starting with `BTACxxxx`, even
  without the `-f` option.
- `-f --force-overwrite` Overwrite already present fields. The default is to
  overwrite a field only if it is empty or absent
- `-D --diff` only print the new fields in the output file. In this mode, old
  fields are removed and entries with no new fields are deleted. This cannot be
  used with the `-i --inplace` flag for safety reasons. If you really want to overwrite
  your input file (and delete a bunch of data in the process), you can do so
  by specifying it explicitly via the `-o --output` option.

- `-t --timeout <float>` set timeout on request in seconds, default: 20.0 s,
  increase this if you are getting a lot of timeouts. Set it to -1 for no timeout.
- `-S --ignore-ssl` bypass SSL verification. Use this if you encounter the error:
  ```
  [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1129)
  ```
  Another (better) fix for this is to run `pip install --upgrade certifi` to update python's certificates.

- `-d --dump-data <file.json>` writes matching entries to the given JSON file.

  This lets you see duplicate fields from different sources that are otherwise overwritten when merged into a single entry.

  The JSON file will have the following structure:

  ```json
  [
    {
      "entry": "<entry_id>",
      "new-fields": 8,
      "crossref": {
        "query-url": "https://api.crossref.org/...",
        "query-response-time": 0.556,
        "query-response-status": 200,
        "author" : "Lastname, Firstnames and Lastname, Firstnames ...",
        "title" : "super interesting article!",
        "..." : "..."
      },
      "openalex": ...,
      "arxiv": null, // null when no match found
      "unpaywall": ...,
      "dblp": ...,
      "researchr": ...,
      "inspire": ...
    },
    ...
  ]
  ```

- `-v --verbose` verbose mode shows more info. It details entries as they are
  being processed and shows a summary of new fields and their source at the end.
  Using it more than once prints debug info (up to four times).

  Verbose mode looks like this:

  ![verbose-output.png](https://raw.githubusercontent.com/dlesbre/bibtex-autocomplete/master/imgs/btac-verbose.png)
- `-s --silent` hide info and progress bar. Keep showing warnings and errors.
  Use twice to also hide warnings, thrice to also hide errors and four times to
  also hide critical errors, effectively killing all output.
- `-n --no-color` don't use ANSI codes to color and stylize output

- `--version` show version number
- `-h --help` show help
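
A file written with `-d --dump-data` can be inspected with a few lines of Python. The sketch below assumes the structure shown for that option (a list of records with `entry`, `new-fields` and one key per source, `null` when a source found no match); `matched_sources` and `summarize_dump` are hypothetical helper names.

```python
import json
from typing import Dict, List

# Keys of a record that are not source names, per the structure shown above.
META_KEYS = {"entry", "new-fields"}


def matched_sources(record: dict) -> List[str]:
    """List the sources that returned a match for one entry."""
    return [
        name
        for name, data in record.items()
        if name not in META_KEYS and data is not None
    ]


def summarize_dump(path: str) -> Dict[str, List[str]]:
    """Map each entry ID in the dump file to the sources that matched it."""
    with open(path, encoding="utf-8") as handle:
        records = json.load(handle)
    return {record["entry"]: matched_sources(record) for record in records}
```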

## Credit and license

This project was first inspired by the solution provided by
[thando](https://tex.stackexchange.com/users/182467/thando) in this
[TeX stack exchange post](https://tex.stackexchange.com/questions/6810/automatically-adding-doi-fields-to-a-hand-made-bibliography). I worked on it as
part of a course on
[Web data management](https://moodle.r2.enst.fr/moodle/course/view.php?id=142) in
2021-2022 during my master's ([MPRI](https://wikimpri.dptinfo.ens-cachan.fr/doku.php)).

This project is free and open-source. It is distributed under terms of the
[MIT License](https://choosealicense.com/licenses/mit/). See the
[LICENSE](https://github.com/dlesbre/bibtex-autocomplete/blob/master/LICENSE)
file for more information.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "bibtexautocomplete",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "Dorian Lesbre <dorian.lesbre@gmail.com>",
    "keywords": "bibtex, biblatex, latex, autocomplete, btac",
    "author": null,
    "author_email": "Dorian Lesbre <dorian.lesbre@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/46/3e/6db2d834260052c29a321241b33d3f7b208258fe3c390eb68dc712c2c6c4/bibtexautocomplete-1.3.2.tar.gz",
    "platform": null,
    "description": "# Bibtex Autocomplete\n\n[![PyPI version][version-shield]][pypi-link]\n[![PyPI pyversions][pyversion-shield]][pypi-link]\n[![License][license-shield]](https://choosealicense.com/licenses/mit/)\n[![PyPI status][status-shield]][pypi-link]\n[![Downloads][download-shield]](https://pepy.tech/project/bibtexautocomplete)\n\n[![Maintenance][maintain-shield]][commit-link]\n[![Commit][commit-shield]][commit-link]\n[![actions][pipeline-shield]][pipeline-link]\n[![issues][issues-shield]][issues-link]\n[![pull requests][pr-shield]][pr-link]\n\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.10207744.svg)](https://doi.org/10.5281/zenodo.10207744)\n\n[version-shield]:   https://img.shields.io/pypi/v/bibtexautocomplete.svg\n[pyversion-shield]: https://img.shields.io/pypi/pyversions/bibtexautocomplete.svg\n[license-shield]:   https://img.shields.io/pypi/l/bibtexautocomplete.svg\n[status-shield]:    https://img.shields.io/pypi/status/bibtexautocomplete.svg\n[download-shield]:  https://static.pepy.tech/badge/bibtexautocomplete\n[pypi-link]: https://pypi.python.org/pypi/bibtexautocomplete/\n\n[maintain-shield]: https://img.shields.io/badge/Maintained%3F-yes-brightgreen.svg\n[commit-shield]: https://img.shields.io/github/last-commit/dlesbre/bibtex-autocomplete\n[commit-link]: https://github.com/dlesbre/bibtex-autocomplete/graphs/commit-activity\n\n[pipeline-shield]: https://img.shields.io/github/actions/workflow/status/dlesbre/bibtex-autocomplete/python-app.yml?branch=master&label=tests\n[pipeline-link]: https://github.com/dlesbre/bibtex-autocomplete/actions/workflows/python-app.yml\n\n[issues-shield]: https://img.shields.io/github/issues/dlesbre/bibtex-autocomplete\n[issues-link]: https://github.com/dlesbre/bibtex-autocomplete/issues\n\n[pr-shield]: https://img.shields.io/github/issues-pr/dlesbre/bibtex-autocomplete\n[pr-link]: https://github.com/dlesbre/bibtex-autocomplete/pulls\n\n**bibtex-autocomplete** or **btac** is a simple script to autocomplete 
BibTeX\nbibliographies. It reads a BibTeX file and looks online for any additional data\nto add to each entry. It can quickly generate entries from minimal data (a lone\ntitle is often sufficient to generate a full entry). You can also use it to only\nadd specific fields (like DOIs, or ISSN) to a manually curated bib file.\n\nIt is designed to be as simple to use as possible: just give it a bib file and\nlet **btac** work its magic! It combines multiple sources and runs consistency\nand normalization checks on the added fields (check that URLs lead to a valid\nwebpage, that DOIs exist at https://dx.doi.org/).\n\nIt attempts to complete a BibTeX file by querying the following domains:\n- [openalex.org](https://openalex.org/): ~240 million entries\n- [www.crossref.org](https://www.crossref.org/): ~150 million entries\n- [arxiv.org](https://arxiv.org/): open access archive, ~2.4 million entries\n- [semanticscholar.org](https://www.semanticscholar.org/): ~215 million entries\n- [unpaywall.org](https://unpaywall.org/): database of open access articles, ~48 million entries\n- [dblp.org](https://dblp.org): computer science, ~7 million entries\n- [researchr.org](https://researchr.org/): computer science\n- [inspirehep.net](https://inspirehep.net/): high-energy physics, ~1.5 million entries\n\nBig thanks to all of them for allowing open, easy and well-documented access to\ntheir databases. This project wouldn't be possible without them. 
You can easily\nnarrow down the list of sources if some aren't relevant using command line options.\n\n### Contents\n\n- [New in version 1.3](#new-in-version-13)\n- [Demo](#demo)\n- [Quick overview](#quick-overview)\n- [Installation](#installation)\n  - [Dependencies](#dependencies)\n- [Usage](#usage)\n- [Command line arguments](#command-line-arguments)\n  - [Specifying output](#specifying-output)\n  - [Query filtering](#query-filtering)\n  - [New field formatting](#new-field-formatting)\n  - [Global output formatting](#global-output-formatting)\n  - [Optional flags](#optional-flags)\n- [Credit and license](#credit-and-license)\n\n## New in version 1.3\n\nAdded OpenAlex and Inspire HEP as sources. Switched to a majority vote between source\nto find new field, along with smart field normalization and comparison. And of course,\nbug fixes!\n\nSee the [changelog](https://github.com/dlesbre/bibtex-autocomplete/blob/master/CHANGELOG.md) for full details.\n\n## Demo\n\n![demo.svg](https://raw.githubusercontent.com/dlesbre/bibtex-autocomplete/2d1a01f5ec94c8af9c2f3c1a810eca51bb4cce74/imgs/demo.svg)\n\n## Quick overview\n\n**How does it find matches?**\n\n`btac` queries the websites using the entry DOI (if known) or its title. So\nentries that don't have one of those two fields *will not* be completed.\n- DOIs are only used if they can be recognized, so the `doi` field should\n  contain \"10.xxxx/yyyy\" or an URL ending with it.\n- Titles should be the full title. They are compared excluding case and\n  punctuation, but titles with missing words will not match.\n- If one or more authors are present, entries with no common authors will not\n  match. Authors are compared using lower case last names only. 
Be sure to use\n  one of the correct BibTeX formats for the author field:\n  ```bibtex\n  author = {First Last and Last, First and First von Last}\n  ```\n  (see\n  [https://www.bibtex.com/f/author-field/](https://www.bibtex.com/f/author-field/)\n  for full details)\n- If the year is known, entries with different years will also not match.\n\n**Disclaimers**\n\n- There is no guarantee that the script will find matches for your entries, or\n  that the websites will have any data to add to your entries, (or even that the\n  website data is correct, but that's not for me to say...)\n\n- The script is designed to minimize the chance of false positives - that is\n  adding data from another similar-ish entry to your entry. If you find any such\n  false positive please report them using the [issue\n  tracker](https://github.com/dlesbre/bibtex-autocomplete/issues).\n\n**How are entries completed?**\n\nOnce responses from all websites have been found, the script will add fields\nfrom website with the following priority by performing a majority vote among the\nsource. To do so it uses smart normalization and merging tactics for each field:\n- Authors (and editors) match if they have same last names and, if both first\n  names present, the first name of one is equal/an abbreviation of the other.\n  Author list match if their intersection is non-empty.\n- ISSN and ISBN are normalized their check digits verified. ISBN are converted\n  to their 13 digit representation\n- URL and DOI are checked for valid format, and further validated by querying\n  them online to ensure they exist\n- Many fields match with abbreviation detection (journal, institution, booktitle,\n  organization, publisher, school and series). 
So `ACM` will match\n  `Association for Computer Machinery`\n- Pages are normalized to use `--` as separator\n- All other fields are compared excluding case and punctuation.\n\nThe script will not overwrite any user given non-empty fields, unless the\n`-f/--force-overwrite` flag is given. If you want to check what fields are\nadded, you can use `-v/--verbose` to have them printed to stdout (with\nsource information), or `-p/--prefix` to have the new fields be prefixed with\n`BTAC` in the output file.\n\n## Installation\n\nCan be installed with [pip](https://pypi.org/project/pip/) :\n\n```console\npip install bibtexautocomplete\n```\n\nYou should now be able to run the script using either command:\n\n```console\nbtac --version\npython3 -m bibtexautocomplete --version\n```\n\n**Note:** `pip` no longer allows installing scripts globally in systems with other\npackage managers (like most Linux distros). You can install the script locally in\na [virtual environment](https://docs.python.org/3/library/venv.html) or globally\nusing [pipx](https://pipx.pypa.io/stable/):\n\n```console\nsudo apt install pipx\npipx install bibtexautocomplete\n```\n\n### Dependencies\n\nThis package has two dependencies (automatically installed by pip) :\n\n- [bibtexparser](https://bibtexparser.readthedocs.io/) (<2.0.0)\n- [alive_progress](https://github.com/rsalmei/alive-progress) (>= 3.0.0) for the fancy progress bar\n\n## Usage\n\nThe command line tool can be used as follows:\n```console\nbtac [--flags] <input_files>\n```\n\n**Examples :**\n\n- `btac my/db.bib` : reads from `./my/db.bib`, writes to `./my/db.btac.bib`.\n  A different output file can be specified with `-o`.\n- `btac -i db.bib` : reads from `db.bib` and overwrites it (inplace flag).\n  Avoid on non backed-up/version-controlled files, I'd hate it if my script\n  corrupted your data.\n- `btac folder` : reads from all files ending with `.bib` in folder. Excludes\n  `.btac.bib` files unless they are the only `.bib` files present. 
Writes to\n  `folder/file.btac.bib` unless inplace flag is set.\n- `btac` with no inputs is same as `btac .`, reads file from current working directory\n- `btac -c doi ...` only completes DOI fields, leave others unchanged\n- `btac -v ...` verbose mode, pretty prints all new fields when done.\n  See [this image](https://raw.githubusercontent.com/dlesbre/bibtex-autocomplete/master/imgs/btac-verbose.png) for a preview of verbose output.\n\n**Note:** the [parser](https://pypi.org/project/bibtexparser/) doesn't preserve\nformat information, so this script will reformat your files. Some [formatting\noptions](#output-formatting) are provided to control output format.\n\n**Slow responses:** Sometimes due to server traffic, a source DB may take significantly longer\nto respond and slow `btac`.\n- You can increase timeout with `btac ... -t 60` (60s) or `btac ... -t -1` (no timeout)\n- You can disable queries to the offender `btac ... -Q <website>`\n- You can try again at another time\n\n## Command line arguments\n\nAs `btac` has a lot of option I'd recommend setting up an alias if you use a lot\nregularly.\n\n### Specifying output\n\n- `-o --output <file.bib>`\n\n  Write output to given file. Can be used multiple times when also giving\n  multiple inputs. Maps inputs to outputs in order. If there are extra inputs,\n  uses default name (`old_name.btac.bib`). 
Ignored in inplace (`-i`) mode.\n\n  For example `btac db1.bib db2.bib db3.bib -o out1.bib -o out2.bib` reads `db1.bib`,\n  `db2.bib` and `db3.bib`, and write their outputs to `out1.bib`, `out2.bib`\n  and `db3.btac.bib` respectively.\n\n- `-i --inplace` Modify input files inplace, ignores any specified output files.\n  Avoid on non backed-up/version-controlled files, I'd hate it if my script\n  corrupted your data.\n\n- `-O --no-output` don't write any output files (except the one specified by `--dump-data`)\n  can be used with `-v/--verbose` mode to only print a list of changes to the terminal\n\n### Query filtering\n\n- `-q --only-query <site>` or `-Q --dont-query <site>`\n\n  Restrict which websites to query from. `<site>` must be one of: `openalex`,\n  `crossref`, `arxiv`, `s2`, `unpaywall`, `dblp`, `researchr`, `inspire`. These arguments\n  can be used multiple times, for example to only query Crossref and DBLP use\n  `-q crossref -q dblp` or\n  `-Q openalex -Q researchr -Q unpaywall -Q arxiv -Q s2 -Q inspire`\n\n- `-e --only-entry <id>` or `-E --exclude-entry <id>`\n\n  Restrict which entries should be autocompleted. `<id>` is the entry ID used in\n  your BibTeX file (e.g. `@inproceedings{<id> ... }`). These arguments can also\n  be used multiple times to select only/exclude multiple entries\n\n- `-c --only-complete <field>` or `-C --dont-complete <field>`\n\n  Restrict which fields you wish to autocomplete. Field is a BibTeX field (e.g.\n  `author`, `doi`,...). So if you only wish to add missing DOIs use `-c doi`.\n\n- `-b --filter-fields-by-entrytype <required|optional|all>` only add fields that correspond to\n  the given entry type in bibtex's data model. Disabled by default. 
`required`\n  only adds required fields, `optional` adds required and optional fields, and\n  `all` adds required, optional and non-standard fields (doi, issn and isbn).\n  A list of required/optional fields by entry type can be found\n  [on the tex stackexchange](https://tex.stackexchange.com/questions/239042/where-can-we-find-a-list-of-all-available-bibtex-entries-and-the-available-fiel).\n\n- `-w --overwrite <field>` or `-W --dont-overwrite <field>`\n\n  Force overwriting of the selected fields. If using `-W author -W journal`\n  you force overwriting of all fields except `author` and `journal`. The\n  default is to overwrite nothing (only complete absent and blank fields).\n\n  For a more complex example `btac -C doi -w author` means complete all fields\n  save DOI, and only overwrite author fields.\n\n  You can also use the `-f` flag to overwrite everything or the `-p` flag to add\n  a prefix to new fields, thus avoiding overwrites.\n\n- `-m --mark` and `-M --ignore-mark`\n\n  This is useful to avoid repeated queries if you want to run `btac` many times\n  on the same (large) file.\n\n  By default, `btac` ignores any entry with a `BTACqueried` field. `--ignore-mark`\n  overrides this behavior.\n\n  When `--mark` is set, `btac` adds a `BTACqueried = {yyyy-mm-dd}` field to each entry\n  it queries.\n\n### New field formatting\n\nYou can use the following arguments to control how `btac` formats the new fields:\n- `--fu --escape-unicode` replace unicode symbols with LaTeX escape sequences (for\n  example: replace `\u00e9` with `{\\'e}`). The default is to keep unicode symbols as is.\n- `--fp --protect-uppercase <field>` or `--FP --dont-protect-uppercase <field>` or\n  `--fpa --protect-all-uppercase`, insert braces around words containing uppercase\n  letters in the given fields to ensure bibtex will preserve them. 
The three\n  arguments are mutually exclusive, and the first two can be used multiple times\n  to select/deselect multiple fields.\n\n\n### Global output formatting\n\nUnfortunately [bibtexparser](https://pypi.org/project/bibtexparser/) doesn't\npreserve format information, so this script will reformat your BibTeX file. Here\nare a few options you can use to control the output format:\n\n- `--fa --align-values` pad field names to align all values\n\n  ```bibtex\n  @article{Example,\n    author = {Someone},\n    doi    = {10.xxxx/yyyyy},\n  }\n  ```\n\n- `--fc --comma-first` use comma first syntax\n\n  ```bibtex\n  @article{Example\n    , author = {Someone}\n    , doi = {10.xxxx/yyyyy}\n    ,\n  }\n  ```\n\n- `--fl --no-trailing-comma` don't add the last trailing comma\n- `--fi --indent <space>` space used for indentation, default is a tab.\n  Can be specified as a number (number of spaces) or a string with spaces\n  and `_`, `t`, and `n` characters to mark space, tabs and newlines.\n\n### Optional flags\n\n- `-p --prefix` Write new fields with a prefix. The script will add `BTACtitle =\n  ...` instead of `title = ...` in the bib file. This can be combined with `-f`\n  to safely show info for already present fields.\n\n  Note that this can overwrite existing fields starting with `BTACxxxx`, even\n  without the `-f` option.\n- `-f --force-overwrite` Overwrite already present fields. The default is to\n  overwrite a field only if it is empty or absent\n- `-D --diff` only print the new fields in the output file. In this mode, old\n  fields are removed and entries with no new fields are deleted. This cannot be\n  used with the `-i --inplace` flag for safety reasons. 
If you really want to overwrite\n  your input file (and delete a bunch of data in the process), you can do so\n  by specifying it explicitly via the `-o --output` option.\n\n- `-t --timeout <float>` set timeout on request in seconds, default: 20.0 s,\n  increase this if you are getting a lot of timeouts. Set it to -1 for no timeout.\n- `-S --ignore-ssl` bypass SSL verification. Use this if you encounter the error:\n  ```\n  [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1129)\n  ```\n  Another (better) fix for this is to run `pip install --upgrade certifi` to update python's certificates.\n\n- `-d --dump-data <file.json>` writes matching entries to the given JSON file.\n\n  This lets you see duplicate fields from different sources that are otherwise overwritten when merged into a single entry.\n\n  The JSON file will have the following structure:\n\n  ```json\n  [\n    {\n      \"entry\": \"<entry_id>\",\n      \"new-fields\": 8,\n      \"crossref\": {\n        \"query-url\": \"https://api.crossref.org/...\",\n        \"query-response-time\": 0.556,\n        \"query-response-status\": 200,\n        \"author\" : \"Lastname, Firstnames and Lastname, Firstnames ...\",\n        \"title\" : \"super interesting article!\",\n        \"...\" : \"...\"\n      },\n      \"openalex\": ...,\n      \"arxiv\": null, // null when no match found\n      \"unpaywall\": ...,\n      \"dblp\": ...,\n      \"researchr\": ...,\n      \"inspire\": ...\n    },\n    ...\n  ]\n  ```\n\n- `-v --verbose` verbose mode shows more info. It details entries as they are\n  being processed and shows a summary of new fields and their source at the end.\n  Using it more than once prints debug info (up to four times).\n\n  Verbose mode looks like this:\n\n  ![verbose-output.png](https://raw.githubusercontent.com/dlesbre/bibtex-autocomplete/master/imgs/btac-verbose.png)\n- `-s --silent` hide info and progress bar. 
Keep showing warnings and errors.\n  Use twice to also hide warnings, thrice to also hide errors and four times to\n  also hide critical errors, effectively killing all output.\n- `-n --no-color` don't use ANSI codes to color and stylize output\n\n- `--version` show version number\n- `-h --help` show help\n\n## Credit and license\n\nThis project was first inspired by the solution provided by\n[thando](https://tex.stackexchange.com/users/182467/thando) in this\n[TeX stack exchange post](https://tex.stackexchange.com/questions/6810/automatically-adding-doi-fields-to-a-hand-made-bibliography). I worked on it as\npart of a course on\n[Web data management](https://moodle.r2.enst.fr/moodle/course/view.php?id=142) in\n2021-2022 as part of my master's ([MPRI](https://wikimpri.dptinfo.ens-cachan.fr/doku.php)).\n\nThis project is free and open-source. It is distributed under the terms of the\n[MIT License](https://choosealicense.com/licenses/mit/). See the\n[LICENSE](https://github.com/dlesbre/bibtex-autocomplete/blob/master/LICENSE)\nfile for more information.\n",
    "bugtrack_url": null,
    "license": "MIT License  Copyright (c) 2022-2024 Dorian Lesbre  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.  THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ",
    "summary": "Script to autocomplete bibtex files by polling online databases",
    "version": "1.3.2",
    "project_urls": {
        "Changelog": "https://github.com/dlesbre/bibtex-autocomplete/blob/master/CHANGELOG.md",
        "Homepage": "https://github.com/dlesbre/bibtex-autocomplete",
        "Issues": "https://github.com/dlesbre/bibtex-autocomplete/issues",
        "Repository": "https://github.com/dlesbre/bibtex-autocomplete.git"
    },
    "split_keywords": [
        "bibtex",
        " biblatex",
        " latex",
        " autocomplete",
        " btac"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "bfbcf04ea548f4bc4b10cc71bce448f451d46194ff65b93d6bd7310f6f11b655",
                "md5": "9888bc210933eaa27a0ea86ad5d9ce31",
                "sha256": "61274d36345c671b4fc8dbac16525be8c73253dc72b8bb0407c2e0016629af37"
            },
            "downloads": -1,
            "filename": "bibtexautocomplete-1.3.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "9888bc210933eaa27a0ea86ad5d9ce31",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 71831,
            "upload_time": "2024-04-12T21:41:50",
            "upload_time_iso_8601": "2024-04-12T21:41:50.650500Z",
            "url": "https://files.pythonhosted.org/packages/bf/bc/f04ea548f4bc4b10cc71bce448f451d46194ff65b93d6bd7310f6f11b655/bibtexautocomplete-1.3.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "463e6db2d834260052c29a321241b33d3f7b208258fe3c390eb68dc712c2c6c4",
                "md5": "3d757a7c6ef672dd5e0e707e1517f6a4",
                "sha256": "8ae8c74c82f2b03c3bb24e58e9d65bedfb96dbcbaaa46c34502006e7c88c45b0"
            },
            "downloads": -1,
            "filename": "bibtexautocomplete-1.3.2.tar.gz",
            "has_sig": false,
            "md5_digest": "3d757a7c6ef672dd5e0e707e1517f6a4",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 68989,
            "upload_time": "2024-04-12T21:41:52",
            "upload_time_iso_8601": "2024-04-12T21:41:52.598369Z",
            "url": "https://files.pythonhosted.org/packages/46/3e/6db2d834260052c29a321241b33d3f7b208258fe3c390eb68dc712c2c6c4/bibtexautocomplete-1.3.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-12 21:41:52",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "dlesbre",
    "github_project": "bibtex-autocomplete",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "bibtexautocomplete"
}
        