- Name: CheckMyTex
- Version: 0.10.4
- Home page: https://github.com/d-krupke/checkmytex
- Summary: A simple tool for checking complex LaTeX documents, e.g., dissertations.
- Author: Dominik Krupke (krupke@ibr.cs.tu-bs.de)
- Upload time: 2023-06-12 17:26:30
- Requires Python: >=3.7
- License: MIT
- Keywords: latex

# CheckMyTex

A tool to comfortably check complex LaTeX documents, e.g., dissertations, for
common errors. There are already pretty good correction tools for LaTeX, e.g.,
[TeXtidote](https://github.com/sylvainhalle/textidote),
[YaLafi](https://github.com/matze-dd/YaLafi) (of which we use the tex2text
engine), or [LaTeXBuddy](https://gitlab.com/LaTeXBuddy/LaTeXBuddy) (which I
was involved in and from which I copied some things), but they had shortcomings
with complex documents and did not fit my workflow. CheckMyTex builds upon
YaLafi but provides a simple CLI with some additional tricks to handle,
hopefully, any document. The primary difference from its main competitors is
the focus on the CLI and on whitelists.

> :warning: Your terminal needs to support rich output (most terminals do)!

The primary concepts are:

- not just listing problems but also their exact locations,
- working on the whole document, not just individual files (because otherwise
  you may forget to check some files),
- lots of predefined rules such as newcommand substitution, todo removal, etc.,
- simple extension with further checking modules,
- the ability to whitelist found problems and share this whitelist,
- editing the errors directly (in Vim, with an automatic jump to the line), and
- a single, simple command that you can easily run before every commit.

This tool has a fancy HTML output (like other tools), but it is primarily
intended to be used as a CLI:

1. Thanks to colored output, the highlighting works just as well in the CLI as
   in HTML. No need to switch to your browser.
2. The CLI can use your favourite editor (currently, only (n)vim and nano have
   full support) without switching context.

An example output can be seen
[here](https://htmlpreview.github.io/?https://github.com/d-krupke/CheckMyTex/blob/main/example_output.html).
The CLI version looks nearly identical (thanks to
[rich](https://rich.readthedocs.io/en/stable/introduction.html)), but you are
iteratively asked how to deal with each problem.

## What does CheckMyTex currently check for you?

- Spelling errors using aspell or
  [pyspellchecker](https://pypi.org/project/pyspellchecker/)
- Grammar errors using [languagetool](https://languagetool.org/)
- LaTeX-smells using [ChkTeX](https://www.nongnu.org/chktex/)
- Raw numbers instead of siunitx ([simple regex](checkmytex/finding/siunitx.py),
  showing how easily new modules can be added)
- Additional advice from [proselint](https://github.com/amperser/proselint)
- (Correct) usage of cleveref.
- Uniform writing style of NP-hard/complete (this is probably a problem only
  within my community, but it doesn't harm you)
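To illustrate what a uniformity check like the NP-hard/complete rule has to do, here is a minimal, self-contained sketch. It is not CheckMyTex's actual implementation: the pattern `NP_PATTERN` and the helpers `np_spelling_variants` and `is_uniform` are hypothetical. The sketch collects the exact spellings of "NP-hard" and "NP-complete" in a LaTeX string and considers the style uniform if each term is spelled in only one way.

```python
import re
from collections import Counter
from typing import Dict

# Hypothetical pattern: "NP" or "$\mathcal{NP}$", an optional separator
# (hyphen, space, or the LaTeX tie "~"), then "hard" or "complete".
NP_PATTERN = re.compile(
    r"(?<![A-Za-z])(\$\\mathcal\{NP\}\$|NP)([- ~]?)(hard|complete)\b",
    re.IGNORECASE,
)


def np_spelling_variants(source: str) -> Dict[str, Counter]:
    """Count the exact spellings used for NP-hard and NP-complete."""
    variants: Dict[str, Counter] = {"hard": Counter(), "complete": Counter()}
    for match in NP_PATTERN.finditer(source):
        kind = match.group(3).lower()  # "hard" or "complete"
        variants[kind][match.group(0)] += 1
    return variants


def is_uniform(source: str) -> bool:
    """True if each of the two terms is spelled in at most one way."""
    return all(len(c) <= 1 for c in np_spelling_variants(source).values())


print(is_uniform("NP-hard here, NP hard there."))  # False
```

A real checker would additionally report the location of each deviating spelling instead of a single yes/no answer.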

I found this set of tools to be sufficient to find most problems in text and
LaTeX source, and I am constantly surprised by how well it works.

Before grammar and spelling checks are applied, the sources are detexed
(converted to plain text) using [YaLafi](https://github.com/matze-dd/YaLafi).

Further checks may be added in the future. I do a lot of collaborative writing
on papers and am constantly confronted with bad LaTeX that I try to detect
automatically.

## Install

If you have Python 3 installed, you can install CheckMyTex using pip:

```
pip install checkmytex
```

You additionally need to install [languagetool](https://languagetool.org/) and a
LaTeX distribution (which should contain ChkTeX). For better spell checking,
you should also install aspell and the corresponding dictionaries. All of these
should be available via your system's package manager, e.g., `pacman`, `apt`,
or `brew`.

> :warning: This tool currently only supports Unix (Linux and macOS). It may
> work in some Windows configurations, but you will probably encounter
> unexpected behavior due to incompatible system calls.

### Mac

```shell
brew install --cask mactex  # install a tex distribution
brew install languagetool  # install the grammar checker languagetool
brew install aspell  # install the spell checker aspell
```

### Arch Linux

```shell
sudo pacman -S texlive-most languagetool aspell aspell-en
```

## Usage

```bash
checkmytex main.tex
```

CheckMyTex will now guide you through your document and show you all problems,
skipping over good parts. For each problem, you will be asked what to do:

```
[s]kip,[S]kip all,[w]hitelist,[I]gnore all,[n]ext file,[e]dit,[l]ook up,[f]ind,[?]:
```

- _skip_ will skip this concrete problem, but ask you again next time you run
  CheckMyTex.
- _Skip all_ will skip this problem and all identical problems, but ask you
  again on the next run of CheckMyTex.
- _whitelist_ will whitelist the problem and never ask you about it again (for
  this document).
- _Ignore_ will ignore all problems that belong to the same rule, but ask you
  again next time you run CheckMyTex.
- _next file_ will jump to the next file.
- _edit_ will open your `$EDITOR` at the location of the problem. It tries to
  keep track of line changes without reprocessing the document.
- _look up_ will google the problem for you (if available), e.g., to check rare
  technical terms.
- _find_ lets you search for further occurrences with a regular expression. Use
  this, e.g., to ensure a uniform spelling.
- _?_ provides further information about the problem, primarily for debugging
  and fine-tuning.
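The difference between the session-only answers can be modeled with two sets, as in the following sketch. The classes `Problem` and `Session` here are hypothetical simplifications (the field names mirror the `long_id` and `rule` arguments of `Problem` in the checker example below), not CheckMyTex's internals, and the sketch assumes that "identical" problems share a `long_id`.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass(frozen=True)
class Problem:
    long_id: str  # assumed to be equal for identical problems
    rule: str  # the rule that reported the problem


@dataclass
class Session:
    """State of a single run; nothing here survives to the next run."""

    skipped_ids: Set[str] = field(default_factory=set)
    ignored_rules: Set[str] = field(default_factory=set)

    def should_ask(self, problem: Problem) -> bool:
        return (
            problem.long_id not in self.skipped_ids
            and problem.rule not in self.ignored_rules
        )

    def answer(self, problem: Problem, key: str) -> None:
        if key == "S":  # [S]kip all: skip all identical problems in this run
            self.skipped_ids.add(problem.long_id)
        elif key == "I":  # [I]gnore all: skip every problem of this rule in this run
            self.ignored_rules.add(problem.rule)
        # "s" skips only the current problem, so no state has to be stored.
```

Because a new `Session` starts empty, these answers are forgotten on the next run; in this model, only the whitelist would be stored persistently.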

Whitelisted problems are saved by default in `.whitelist.txt` (in the document
root) and are human-readable. You can copy this file to reuse it for other
documents, or change its path with the `-w` argument when calling CheckMyTex.
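A whitelist that is human-readable and easy to share can be as simple as a text file with one entry per line. The following sketch assumes exactly that format, with one problem ID per line and `#` for comments; the actual format of `.whitelist.txt` may differ, and `load_whitelist` and `add_to_whitelist` are hypothetical helpers.

```python
from pathlib import Path
from typing import Set


def load_whitelist(path: Path) -> Set[str]:
    """Read one problem ID per line; a missing file is an empty whitelist."""
    if not path.exists():
        return set()
    entries = set()
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):  # skip blanks and comments
            entries.add(line)
    return entries


def add_to_whitelist(path: Path, problem_id: str) -> None:
    """Append a single ID; assumes IDs contain no line breaks."""
    with path.open("a", encoding="utf-8") as f:
        f.write(problem_id + "\n")
```

Appending one line per entry keeps such a file diff-friendly, so co-authors can review and merge whitelist changes together with the rest of the sources.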

This tool will have problems with some areas of your document. You can exclude
these areas by adding lines with `%%PAUSE-CHECKING` and `%%CONTINUE-CHECKING`.
This may be easier than whitelisting all the problems.
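How such markers can be turned into excluded regions is shown in the following sketch. It assumes that each marker stands alone on its own line and that a `%%PAUSE-CHECKING` without a matching `%%CONTINUE-CHECKING` pauses until the end of the file; the function `paused_line_ranges` is hypothetical, and CheckMyTex may handle these edge cases differently.

```python
from typing import List, Optional, Tuple

PAUSE = "%%PAUSE-CHECKING"
CONTINUE = "%%CONTINUE-CHECKING"


def paused_line_ranges(source: str) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (first, last) line ranges to skip."""
    ranges: List[Tuple[int, int]] = []
    start: Optional[int] = None
    lines = source.splitlines()
    for number, line in enumerate(lines, start=1):
        marker = line.strip()
        if marker == PAUSE and start is None:
            start = number  # a repeated pause inside a paused region is ignored
        elif marker == CONTINUE and start is not None:
            ranges.append((start, number))
            start = None
    if start is not None:  # unterminated pause: skip to the end of the file
        ranges.append((start, len(lines)))
    return ranges


source = "a\n%%PAUSE-CHECKING\nb\n%%CONTINUE-CHECKING\nc"
print(paused_line_ranges(source))  # [(2, 4)]
```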

Checking a 300-page dissertation takes only a few seconds. Better spell
checking would be possible but would drastically increase the runtime.

If you want to process the output with another tool, you can export the result
as JSON using:

```bash
checkmytex --json analysis.json main.tex
```

If you want an HTML document that you can share with your co-authors, use:

```bash
checkmytex --html analysis.html main.tex
```

## Extending CheckMyTex

### Finding problems in the LaTeX document

CheckMyTex already comes with a set of very useful tools and rules to find
potential problems.

```python
from checkmytex.finding import (
    Languagetool,
    AspellChecker,
    CheckSpell,
    UniformNpHard,
    Cleveref,
    Proselint,
    SiUnitx,
)
```

You can also easily create your own rule. For example, a common antipattern in
my community is to exclude many lines of text by defining a `\old{..}`-command.
Let us quickly write a rule that detects this behavior, using a simple regular
expression.

Note that `LatexDocument` provides all the necessary tools to read the source
and compiled LaTeX, as well as to trace a problem back to its origin.

```python
import re
import typing

from checkmytex import LatexDocument
from checkmytex.finding import Checker, Problem


class NoOld(Checker):
    def check(self, document: LatexDocument) -> typing.Iterable[Problem]:
        source = document.get_source()
        # Report every occurrence of "\old{" in the source.
        for match in re.finditer(r"\\old\{", source):
            # Trace the match back to its origin (file and position).
            origin = document.get_simplified_origin_of_source(
                match.start(), match.end()
            )
            context = document.get_source_context(origin)
            long_id = f"NO_OLD:{context}"
            yield Problem(
                origin,
                "Please do not use \\old{! (it confuses highlighting)",
                context=context,
                long_id=long_id,
                tool="CustomNoOld",
                rule="NO_OLD",
            )
```

That's all! Now you can add this rule to the `DocumentAnalyzer` with
`add_checker`. You may want to copy `main.py` and build a custom version that
directly includes this rule.
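Tracing a regular-expression match back to a human-readable location boils down to converting a character offset into a line and column. `LatexDocument` does this across all files of a document; the following sketch only covers a single string, and the helpers `line_starts`, `offset_to_line_col`, and `find_old` are hypothetical. It uses a binary search over the offsets at which lines begin.

```python
import bisect
import re
from typing import List, Tuple


def line_starts(source: str) -> List[int]:
    """Offsets at which each line begins (line 1 starts at offset 0)."""
    return [0] + [i + 1 for i, ch in enumerate(source) if ch == "\n"]


def offset_to_line_col(starts: List[int], offset: int) -> Tuple[int, int]:
    """Convert a 0-based character offset into a 1-based (line, column)."""
    line_index = bisect.bisect_right(starts, offset) - 1
    return line_index + 1, offset - starts[line_index] + 1


def find_old(source: str) -> List[Tuple[int, int]]:
    """Report the (line, column) of every \\old{ occurrence, as NoOld does."""
    starts = line_starts(source)
    return [
        offset_to_line_col(starts, m.start())
        for m in re.finditer(r"\\old\{", source)
    ]


print(find_old("Intro.\n\\old{removed}\nText \\old{x}"))  # [(2, 1), (3, 6)]
```

Computing the line starts once and bisecting afterward keeps each lookup logarithmic, which matters when a checker reports many problems in a long document.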

### Filtering patterns of false positives

Some false positives follow a pattern. For example, author names are usually
not in any dictionary. By default, CheckMyTex already tries to detect whether a
spelling error is actually an author name and removes it automatically. You can
easily write such a filter yourself.

By default, CheckMyTex comes with the following filtering rules:

- Spelling errors in `\includegraphics`-paths.
- Spelling errors in labels.
- Spelling errors of words used in the bibliography. This also removes a lot of
  author names from the problem list.
- Spelling errors of author names before a `\cite`.
- Problems in the whitelist.
- Words used repeatedly in adjacent sentences (currently only the word
  "problem").
- Words containing `\` or `$`. They are usually terms rather than proper words.
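Two of these rules can be sketched in a few lines of plain Python: dropping spelling errors for words that also appear in the bibliography, and dropping words that contain `\` or `$`. This is an illustration under simplifying assumptions, not CheckMyTex's filter code: the helpers `bibliography_words` and `filter_spelling` are hypothetical, misspelled words are plain strings instead of `Problem` objects, and words are compared case-insensitively.

```python
import re
from typing import Iterable, Iterator, Set


def bibliography_words(bib_source: str) -> Set[str]:
    """Collect all alphabetic words (including umlauts) from a .bib file."""
    return {w.lower() for w in re.findall(r"[^\W\d_]+", bib_source)}


def filter_spelling(words: Iterable[str], known: Set[str]) -> Iterator[str]:
    """Yield only misspelled words that no filter rule explains away."""
    for word in words:
        if "\\" in word or "$" in word:
            continue  # contains TeX syntax, so it is likely a term
        if word.lower() in known:
            continue  # appears in the bibliography, e.g., an author name
        yield word


bib = "@article{k23, author = {Dominik Krupke and Jörg Müller}}"
known = bibliography_words(bib)
print(list(filter_spelling(["Krupke", "teh", "\\foo", "x$y"], known)))  # ['teh']
```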

Let us extend these rules: imagine you do not want to see any errors within an
align environment.

```python
import re
import typing

from checkmytex.filtering import Filter
from checkmytex.finding import Problem
from checkmytex import LatexDocument


class FilterAlign(Filter):
    def __init__(self):
        self._ranges = []

    def prepare(self, document: LatexDocument):
        # Find the source spans of all align-environments with a regular expression.
        self._ranges = []  # reset, so that preparing another document starts fresh
        expr = r"\\begin\{align\}.*?\\end\{align\}"
        source = document.get_source()
        for match in re.finditer(expr, source, re.DOTALL):
            self._ranges.append((match.start(), match.end()))

    def filter(self, problems: typing.Iterable[Problem]) -> typing.Iterable[Problem]:
        for p in problems:
            s_span = p.origin.get_source_span()
            if any(r[0] <= s_span[0] < r[1] for r in self._ranges):
                continue  # the problem starts within a previously found align-environment
            yield p
```

Like a checker, this filter can be added to the `DocumentAnalyzer`.
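The `any(...)` in `filter` compares every problem with every align environment. For long documents with many environments, a binary search over the sorted ranges avoids this. The following self-contained sketch shows the idea; `align_ranges` and `RangeIndex` are hypothetical names, and its regular expression also covers `align*` through a backreference, which is an assumption about what you want to exclude.

```python
import bisect
import re
from typing import List, Tuple


def align_ranges(source: str) -> List[Tuple[int, int]]:
    """(start, end) offsets of align and align* environments, in source order."""
    # \1 makes \end{...} match the same environment name as \begin{...}.
    expr = r"\\begin\{(align\*?)\}.*?\\end\{\1\}"
    return [(m.start(), m.end()) for m in re.finditer(expr, source, re.DOTALL)]


class RangeIndex:
    """Answers 'does this offset lie in one of the ranges?' in O(log n).

    Relies on the ranges being sorted and non-overlapping, which holds for
    the matches returned by re.finditer.
    """

    def __init__(self, ranges: List[Tuple[int, int]]):
        self._starts = [start for start, _ in ranges]
        self._ends = [end for _, end in ranges]

    def contains(self, offset: int) -> bool:
        # Index of the last range that starts at or before the offset.
        i = bisect.bisect_right(self._starts, offset) - 1
        return i >= 0 and offset < self._ends[i]
```

In `FilterAlign`, `prepare` would build such an index once, and `filter` would call `contains(s_span[0])` for each problem.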

## Other Languages

Other languages are partially supported: you need to create your own main file
and provide the right language codes for the different tools. An example for
German can be found in [german.py](./examples/german.py).

## Development Status

This tool is still under development but already usable. Just expect some
imperfections. Ideas are welcome.

### TODOs

- Reduce double-whitespace matches. They do not matter in LaTeX. Maybe clean the
  detexed file instead of just disabling the corresponding rules?
- More configuration options. Currently, the best option is to simply build your
  own [main.py](./checkmytex/__main__.py).

## Changes

- 0.10.4: Fixing an exception when `$EDITOR` is not set.
- 0.10.3: Making the project slightly more robust.
- 0.10.2: Making the project PEP-compliant.
- 0.10.1: The interactive mode now also uses rich.
- 0.10.0: Beautiful HTML output using _rich_. Interactive CLI will follow soon.
- 0.9.0: Fundamental refactoring and JSON output.
- 0.8.1: Fixing a problem with text manipulated by commands. All found errors
  should now span only a single line. The solution is ugly and should be
  improved, but for now it works.



            
