# texthooks

- Version: 0.6.8
- Home page: https://github.com/sirosen/texthooks
- Summary: pre-commit fixers and linters for handling text files
- Upload time: 2024-12-02 19:17:31
- Author: Stephen Rosen
- Requires Python: >=3.8

A collection of `pre-commit` hooks for handling text files.

In particular, it provides hooks for handling Unicode characters which may be
undesirable in a repository.

## Usage with pre-commit

To use with `pre-commit`, include this repo and the desired hooks in
`.pre-commit-config.yaml`:

```yaml
- repo: https://github.com/sirosen/texthooks
  rev: 0.6.8
  hooks:
    - id: alphabetize-codeowners
    - id: fix-smartquotes
    - id: fix-ligatures
```

## Standalone Usage

Each hook is usable as a CLI script. Simply

```bash
pip install texthooks
```

and then invoke, e.g.

```bash
fix-smartquotes FILENAME
```

## Hook Summary

| **Hook**                 | **Description**                                  |
| ------------------------ | ------------------------------------------------ |
| `alphabetize-codeowners` | Alphabetize names in CODEOWNERS files.           |
| `fix-smartquotes`        | Replace curly quotes with ASCII quotes.          |
| `fix-spaces`             | Normalize special space markers to ASCII spaces. |
| `fix-ligatures`          | Convert stylistic ligatures to ASCII text.       |
| `forbid-bidi-controls`   | Check for bi-directional text.                   |
| `macro-expand`           | A simple way to write text formatting macros.    |

## Supported Hooks

### `alphabetize-codeowners`

Normalize `CODEOWNERS` files to always list people and teams in the same order
by alphabetizing.

The default hook targets `CODEOWNERS`, `.github/CODEOWNERS`, and
`docs/CODEOWNERS`.

#### Sorts Owners, Not Lines

`alphabetize-codeowners` alphabetizes the lists of *owners* per path.
It does not alphabetize the lines in the file or otherwise sort them.

#### Ignores Comments and Empty Lines

Any comment lines or empty lines should be left unmodified by the hook.

#### Normalizes Whitespace

On the lines which are modified, the hook will normalize the line to have no
leading whitespace, and to separate codeowner names with a single space
character.
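The behavior described above can be sketched in a few lines of Python. This is only an illustration, not the hook's actual implementation: `sort_owners_line` is a hypothetical helper, it expects a line without its trailing newline, and it assumes plain `path owner owner ...` lines with no inline comments, escaped spaces, or dialect-specific syntax.

```python
def sort_owners_line(line: str) -> str:
    """Return a CODEOWNERS line with its owners alphabetized.

    Hypothetical helper for illustration; not texthooks' own code.
    """
    stripped = line.strip()
    # Comment lines and empty lines pass through unmodified.
    if not stripped or stripped.startswith("#"):
        return line
    # The first field is the path; everything after it is an owner.
    path, *owners = stripped.split()
    # Sort case-insensitively so "@Team" and "@team" order consistently.
    owners.sort(key=str.casefold)
    # No leading whitespace, and a single space between fields.
    return " ".join([path, *owners])


print(sort_owners_line("  /docs   @zoe  @Team/docs @alice"))
# prints: /docs @alice @Team/docs @zoe
```

Only the owners within each line are reordered; the lines themselves stay where they are in the file.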

### `fix-smartquotes`

This fixes text copy-pasted from applications which replace straight quotes
with curly quotes.
It does *not* convert corner brackets, braille quotation marks, or angle
quotation marks. Those characters are not typically the result of copy-paste
errors, so they are allowed.

Low quotation marks vary in usage and meaning by language, and some languages
use quotation marks which are facing "outwards" (opposite facing from English).
For the most part, these and exotic characters (such as double-prime quotes)
are ignored.

In files with the offending marks, they are replaced and the run is marked as
failed.

#### Overriding Quotation Characters

Two options are available for specifying exactly which characters will be
replaced. For ease of use, they are specified as hex-encoded unicode
codepoints.

Suppose you wanted to *avoid* replacing the "Heavy single comma quotation
mark ornament" (`275C`) and the "Heavy single turned comma quotation mark
ornament" (`275B`) characters. You could override the single quote codepoints
as follows:

```yaml
- repo: https://github.com/sirosen/texthooks
  rev: 0.6.8
  hooks:
    - id: fix-smartquotes
      # replace default single quote chars with this set:
      # apostrophe, fullwidth apostrophe, left single quote, single high
      # reversed-9 quote, right single quote
      args: ["--single-quote-codepoints", "0027,FF07,2018,201B,2019"]
```
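To make the codepoint list concrete, here is a minimal Python sketch of how a comma-separated string of hex codepoints could be turned into a replacement table. The function names `parse_codepoints` and `replace_single_quotes` are hypothetical and the hook's real parsing may differ; the sketch only shows what the argument value denotes.

```python
def parse_codepoints(spec: str) -> set:
    """Turn a string like "0027,2018" into the set of characters it names."""
    return {chr(int(part, 16)) for part in spec.split(",")}


def replace_single_quotes(text: str, spec: str) -> str:
    """Replace every character named in `spec` with an ASCII apostrophe."""
    table = {ord(ch): "'" for ch in parse_codepoints(spec)}
    return text.translate(table)


spec = "0027,FF07,2018,201B,2019"
sample = "\u2018quoted\u2019 and \u275bornament\u275c"
print(replace_single_quotes(sample, spec))
```

With this list, the curly quotes around `quoted` become ASCII apostrophes, while the heavy ornament characters (`275B` and `275C`) are left alone because they are not in the set.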

### `fix-spaces`

Replace various Unicode space characters with `" "`.

This normalizes No-Break Space and similar characters to ensure that your files
render the same way in all contexts. It does not modify newlines, carriage
returns, or other whitespace characters outside of the Space Separator
category.
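As a rough illustration of the category-based rule, the sketch below uses Python's standard `unicodedata` module to replace every character in the Space Separator category (`Zs`) with an ASCII space. It assumes the default set is exactly `Zs`, which the text above suggests but does not state outright; `normalize_spaces` is a hypothetical name.

```python
import unicodedata


def normalize_spaces(text: str) -> str:
    """Replace Space Separator (Zs) characters with an ASCII space.

    Newlines, carriage returns, and tabs are control characters (Cc),
    not Zs, so they are left untouched.
    """
    return "".join(
        " " if unicodedata.category(ch) == "Zs" else ch for ch in text
    )


# No-Break Space (U+00A0) and Thin Space (U+2009) become plain spaces.
print(repr(normalize_spaces("a\u00a0b\u2009c\n\td")))
# prints: 'a b c\n\td'
```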

#### Overriding Space Characters

An option is available for specifying exactly which characters will be
replaced. For ease of use, they are specified as hex-encoded unicode
codepoints.

Suppose you wanted to *only* replace Thin Space (codepoint `2009`).
You could override the space codepoints as follows:

```yaml
- repo: https://github.com/sirosen/texthooks
  rev: 0.6.8
  hooks:
    - id: fix-spaces
      args: ["--separator-codepoints", "2009"]
```

### `fix-ligatures`

Automatically find and replace ligature characters with their ASCII equivalents.

This replaces ligatures which may be created by programs like LaTeX for
presentation with their strictly-equivalent ASCII counterparts. For example,
`fi` and `ff` may be ligature-ized.

This hook converts these back into ASCII so that tools like `grep` will behave
as expected.
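The idea can be sketched with a small translation table. The mapping below covers only the five Latin f-ligatures (U+FB00 through U+FB04) and is an assumption chosen for illustration; the set of ligatures the hook actually handles may be different.

```python
# Illustrative subset of ligature codepoints; not the hook's actual table.
LIGATURES = {
    "\ufb00": "ff",   # LATIN SMALL LIGATURE FF
    "\ufb01": "fi",   # LATIN SMALL LIGATURE FI
    "\ufb02": "fl",   # LATIN SMALL LIGATURE FL
    "\ufb03": "ffi",  # LATIN SMALL LIGATURE FFI
    "\ufb04": "ffl",  # LATIN SMALL LIGATURE FFL
}
_TABLE = str.maketrans(LIGATURES)


def fix_ligatures(text: str) -> str:
    """Expand each ligature character into its ASCII letters."""
    return text.translate(_TABLE)


print(fix_ligatures("e\ufb03cient \ufb01le"))
# prints: efficient file
```

After the replacement, a search such as `grep file` matches the word again, which it would not do while the text contained the single `fi` ligature character.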

### `forbid-bidi-controls`

This is a checker which forbids the use of Unicode bidirectional text control
characters.

These are directional formatting characters which can be used to construct text
with unexpected or unclear semantics. For example, in programming languages
which allow bidirectional text in statements, `"X" = ייִדיש` can be written
with right-to-left reversal to mean that the variable `ייִדיש` is assigned a
value of `"X"`.
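A check of this kind can be sketched as a scan for the characters that Unicode gives the `Bidi_Control` property. Whether the hook uses exactly this set is an assumption, and `find_bidi_controls` is a hypothetical name. Note that the scan targets the invisible control characters only; ordinary right-to-left letters such as Hebrew are not flagged.

```python
# Characters with the Unicode Bidi_Control property.
BIDI_CONTROLS = frozenset(
    "\u061c\u200e\u200f"              # ALM, LRM, RLM
    "\u202a\u202b\u202c\u202d\u202e"  # LRE, RLE, PDF, LRO, RLO
    "\u2066\u2067\u2068\u2069"        # LRI, RLI, FSI, PDI
)


def find_bidi_controls(text: str) -> list:
    """Return (line, column, codepoint) for each bidi control character."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((lineno, col, f"U+{ord(ch):04X}"))
    return hits


print(find_bidi_controls("ok\nx = \u202e1"))
# prints: [(2, 5, 'U+202E')]
```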

### `macro-expand`

Replace simple "macro" strings in text. This fixer is a no-op if no macro
arguments are supplied. Add `--macro` to arguments to do replacements.

For example, convert `issue:NNN` to an issue link in markdown with the
following sample config:

```yaml
- repo: https://github.com/sirosen/texthooks
  rev: 0.6.8
  hooks:
    - id: macro-expand
      args:
        - "--macro"
        - "issue:"
        - '[texthooks#$VALUE](https://github.com/sirosen/texthooks/issues/$VALUE)'
```
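The effect of that configuration can be approximated in Python as follows. This is a sketch under assumptions: `expand_macro` is a hypothetical helper, and treating the macro value as a run of word characters (`\w+`) is a guess made for illustration, since the exact matching rule is not documented here.

```python
import re


def expand_macro(text: str, prefix: str, fmt: str) -> str:
    """Replace `prefix` followed by a value with `fmt`, filling in $VALUE."""
    # Assumption: the value is a run of word characters after the prefix.
    pattern = re.compile(re.escape(prefix) + r"(\w+)")
    return pattern.sub(lambda m: fmt.replace("$VALUE", m.group(1)), text)


fmt = "[texthooks#$VALUE](https://github.com/sirosen/texthooks/issues/$VALUE)"
print(expand_macro("Fixed in issue:42.", "issue:", fmt))
# prints: Fixed in [texthooks#42](https://github.com/sirosen/texthooks/issues/42).
```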

## CHANGELOG

### Unreleased

<!-- bumpversion-changelog -->

### 0.6.8

- Inline comments on codeowners lines are no longer overlooked and sorted as if
  they were part of the set of path owners. They are now preserved, and their
  leading whitespace is normalized to two spaces.

### 0.6.7

- Support GitLab section headers in alphabetize-codeowners when
  `--dialect=gitlab` is passed.
- Casefold codeowner names for better unicode sorting. Thanks @adam-moss for
  the PR!

### 0.6.6

- Bugfix for empty line handling in `alphabetize-codeowners`

### 0.6.5

- `alphabetize-codeowners` is now case insensitive and normalizes whitespace
  better. Thanks @kurtmckee for the PR!

### 0.6.4

- Add support for Gitea and GitLab CODEOWNERS files to
  `alphabetize-codeowners`. Thanks @adam-moss for the PR!

### 0.6.3

- Reduce length of hook names for `pre-commit.com` requirements

### 0.6.2

- Minor whitespace bugfix for `alphabetize-codeowners`

### 0.6.1

- Bugfix for `alphabetize-codeowners` stripping ending newlines

### 0.6.0

- Add `alphabetize-codeowners` fixer (ported from
  [sirosen/alphabetize-codeowners](https://github.com/sirosen/alphabetize-codeowners))

### 0.5.0

- Fix a bug in fixers when running on Windows which could cause data to be
  written with the wrong encoding
- Add `-v/--verbose` and `-q/--quiet` flags to tune output verbosity

### 0.4.0

- Add `fix-spaces` fixer

### 0.3.1

- Minor fixes to docstrings

### 0.3.0

- Add the macro-expand fixer

### 0.2.2

- Fix a bug in CLI argument handling for all hooks

### 0.2.1

- Fix a typo in `forbid-bidi-controls` entrypoint

### 0.2.0

- Add the `forbid-bidi-controls` hook
- Adjust the handling of file encodings. Files will be read with UTF-8 encoding
  by default in most cases.

### 0.1.0

- Initial release with `fix-ligatures` and `fix-smartquotes` hooks

            
