# flachtex
Tools (e.g. [ChkTeX](https://www.nongnu.org/chktex/),
[YaLafi](https://github.com/matze-dd/YaLafi),
[TeXtidote](https://github.com/sylvainhalle/textidote)) for analyzing
LaTeX-documents often only work on single files, which makes them tedious to use
for complex documents. The purpose of _flachtex_ is to preprocess even
complicated LaTeX-documents so that they can be analyzed as a single document.
Importantly, it also provides a data structure to reverse that process and get
the origin of a specific part, which allows you to trace issues back to their
source. While there are other tools to flatten LaTeX, none of them can deal with
complex imports or trace the output back to its origins.
Notable features of _flachtex_ are:
- Flattening of LaTeX-documents with various rules (`\include`, `\input`,
`\subimport`, `%%FLACHTEX-EXPLICIT-IMPORT[path/to/file]`, ...).
- Any character in the output can be traced back to its origin.
- Remove comments.
- Remove `\todo{...}`.
- Remove highlights of `\usepackage{changes}`. (This substitution is actually
more robust than the one supplied with the package.)
- Substitute commands defined by `\newcommand`.
- A modular design that allows adding further rules.
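
To make the traceability feature more concrete, here is a minimal sketch of the idea: a string that stores one origin per character, so that slicing and concatenation keep the origins intact. The class `TracedText` and its methods are hypothetical names chosen for this illustration; they are not the data structure _flachtex_ ships.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TracedText:
    """Toy stand-in (not flachtex's class): text plus one origin per character."""

    text: str
    origins: List[Tuple[str, int]]  # (source file, offset within that file)

    @classmethod
    def from_file_content(cls, filename: str, content: str) -> "TracedText":
        return cls(content, [(filename, i) for i in range(len(content))])

    def __getitem__(self, s: slice) -> "TracedText":
        # Slicing keeps text and origins aligned.
        return TracedText(self.text[s], self.origins[s])

    def __add__(self, other: "TracedText") -> "TracedText":
        return TracedText(self.text + other.text, self.origins + other.origins)

    def get_origin(self, index: int) -> Tuple[str, int]:
        return self.origins[index]


main = TracedText.from_file_content("main.tex", "A\\input{b}C")
part = TracedText.from_file_content("b.tex", "xyz")
# Replace the 9-character \input{b} command (offsets 1..9) by the file content.
flat = main[:1] + part + main[10:]
print(flat.text)           # AxyzC
print(flat.get_origin(2))  # ('b.tex', 1)
print(flat.get_origin(4))  # ('main.tex', 10)
```

Storing a full origin per character is the simplest possible design; a real implementation would more likely store ranges to save memory.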
## Installation
_flachtex_ is available via pip: `pip install flachtex`.
## Example
Let us look at a quick example that shows the power of the tool. We have a
LaTeX-document consisting of three files.
_main.tex_
```tex
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{amsmath,amssymb,amsfonts,amsthm}
\usepackage{todonotes}
\usepackage{xspace}
\newcommand{\importantterm}{\emph{ImportantTerm}\xspace}
%%FLACHTEX-SKIP-START
Technicalities (e.g., configuration of Journal-template) that we want to skip.
%%FLACHTEX-SKIP-STOP
\begin{document}
\section{Introduction}
\todo[inline]{This TODO will not be shown because we don't want to analyze it.}
Let us use \importantterm here.
% including part_a with 'input' and without extension
\input{./part_a}
% including part_b with 'include' and with extension
\include{./part_b.tex}
\end{document}
```
_part_a.tex_
```tex
\subsection{Part A}
This is Part A. We can also use \importantterm here.
```
_part_b.tex_
```tex
\subsection{Part B}
And Part B.
```
_flachtex_ can create the following output for us that is much easier to
analyze.
```tex
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{amsmath,amssymb,amsfonts,amsthm}
\usepackage{todonotes}
\usepackage{xspace}
\newcommand{\importantterm}{\emph{ImportantTerm}\xspace}
\begin{document}
\section{Introduction}
Let us use \emph{ImportantTerm}\xspace here.
\subsection{Part A}
This is Part A. We can also use \emph{ImportantTerm}\xspace here.
\subsection{Part B}
And Part B.
\end{document}
```
(Currently, _flachtex_ will actually add some redundant empty lines, but those
usually do no harm and can easily be eliminated by some simple
postprocessing.)
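
Such a postprocessing step could look like the following sketch. It works on a plain Python string, so it assumes you no longer need the origin information afterwards; `collapse_blank_lines` is a hypothetical helper, not part of _flachtex_.

```python
import re


def collapse_blank_lines(text: str) -> str:
    """Reduce every run of two or more empty lines to a single empty line."""
    return re.sub(r"\n{3,}", "\n\n", text)


flat = "\\section{Introduction}\n\n\n\nLet us use it.\n"
print(collapse_blank_lines(flat))
```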
## Usage
### CLI
_flachtex_ comes with a simple CLI, if you don't want to use it via Python.
```
usage: flachtex [-h] [--to_json] [--comments] [--attach] [--changes]
                [--changes_prefix] [--todos] [--newcommand]
                path

flachtex: Traceable LaTeX flattening.

positional arguments:
  path              Path to main.tex

options:
  -h, --help        show this help message and exit
  --to_json         Return a json.
  --comments        Remove comments.
  --attach          Attach sources to json.
  --changes         Replace the commands of the changes package.
  --changes_prefix  Use the prefix option in changes.
  --todos           Remove todo-notes.
  --newcommand      Automatically substitute custom commands.
```
### Python
```python
from flachtex import Preprocessor, remove_comments
from flachtex.rules import TodonotesRule
# basic usage
preprocessor = Preprocessor("/path/to/latex_document/")
preprocessor.skip_rules.append(TodonotesRule()) # remove todos
doc = preprocessor.expand_file("main.tex")
# remove the comments (optional)
doc = remove_comments(doc)
# The document can be read as a string (but also contains further information)
print(f"The processed LaTeX-document is {doc}")
# Get the used files
for f, data in preprocessor.structure.items():
    print(
        f"Used file {f} which contains the content '{data['content']}' and includes"
        f" the files {data['includes']}."
    )
# query origin (line and column are zero-based)
origin_file, pos = doc.get_origin_of_line(line=3, col=6)
print(
    f"The seventh character of the fourth line originates from file {origin_file}:{pos}."
)
origin_file, pos = doc.get_origin(5)
print(f"The sixth character originates from file {origin_file}:{pos}.")
```
## Features
### Flatten LaTeX-documents
Currently, _flachtex_ supports file inclusions of the following form:
```
% native includes/inputs
\include{path/file.tex}
\input{path/file.tex}
% subimport
\subimport{path}{file}
\subimport*{path}{file}
% manual import
%%FLACHTEX-EXPLICIT-IMPORT[path/to/file]
%%FLACHTEX-SKIP-START
Complex import logic that cannot be parsed by flachtex.
%%FLACHTEX-SKIP-STOP
```
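
To illustrate how such inclusions can be located, here is a rough sketch based on regular expressions. The patterns and the helper `find_imports` are assumptions made for this example and are not the rules _flachtex_ actually uses; in particular, the sketch ignores commented-out imports and skip blocks.

```python
import posixpath
import re

# Illustrative patterns only; flachtex's own rules may differ.
NATIVE_IMPORT = re.compile(r"\\(?:input|include)\{(?P<path>[^}]+)\}")
SUBIMPORT = re.compile(r"\\subimport\*?\{(?P<dir>[^}]*)\}\{(?P<file>[^}]+)\}")
EXPLICIT_IMPORT = re.compile(r"%%FLACHTEX-EXPLICIT-IMPORT\[(?P<path>[^\]]+)\]")


def find_imports(tex: str):
    """Return (start, end, path) for every import, sorted by position."""
    found = []
    for m in NATIVE_IMPORT.finditer(tex):
        found.append((m.start(), m.end(), m.group("path")))
    for m in SUBIMPORT.finditer(tex):
        path = posixpath.join(m.group("dir"), m.group("file"))
        found.append((m.start(), m.end(), path))
    for m in EXPLICIT_IMPORT.finditer(tex):
        found.append((m.start(), m.end(), m.group("path")))
    return sorted(found)


example = (
    "\\input{./part_a}\n"
    "\\subimport{sub/}{file}\n"
    "%%FLACHTEX-EXPLICIT-IMPORT[path/to/file]\n"
    "\\includegraphics{fig}\n"  # not an import, must not match
)
for start, end, path in find_imports(example):
    print(start, end, path)
```

Note that `\includegraphics` is not matched, because the pattern requires the opening brace directly after `include`.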
### Path Resolution
_flachtex_ will first try to resolve the inclusion relative to the calling file.
If no file is found (also trying with additional ".tex"), it tries the document
folder (cwd) and the folder of the root tex-file. Afterwards, it tries the
parent directories.
If this is not sufficient, try to use the
`%%FLACHTEX-EXPLICIT-IMPORT[path/file.tex]` option.
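
The search order described above can be sketched roughly as follows. This is an approximation under assumptions: the functions `candidate_paths` and `resolve` are hypothetical, and "parent directories" is interpreted here as the parents of the calling file's folder, which the description does not spell out.

```python
from pathlib import Path
from typing import Iterator, Optional


def candidate_paths(name: str, calling_file: Path, root_file: Path) -> Iterator[Path]:
    """Yield candidate locations in the order described above."""
    # 1. folder of the calling file, 2. cwd, 3. folder of the root tex-file
    bases = [calling_file.parent, Path.cwd(), root_file.parent]
    # 4. parent directories (assumption: those of the calling file's folder)
    bases += list(calling_file.parent.parents)
    for base in bases:
        yield base / name
        yield base / (name + ".tex")  # also try with the additional ".tex"


def resolve(name: str, calling_file: Path, root_file: Path) -> Optional[Path]:
    """Return the first existing candidate, or None if nothing is found."""
    for candidate in candidate_paths(name, calling_file, root_file):
        if candidate.is_file():
            return candidate
    return None
```

If `resolve` returns `None` in this sketch, that corresponds to the situation in which you would fall back to the explicit import.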
### Extending the tool
_flachtex_ has a modular structure that allows it to receive additional rules or
replace existing ones. You can find the current rules in
[./flachtex/rules](./flachtex/rules).
It is important that the matches of SkipRules and ImportRules do not overlap.
For efficiency, _flachtex_ will first find all matches and only then include
the files. Overlapping matches would need a complex resolution and may result in
unexpected output. (It would not be too difficult to add some simple resolution
rules instead of simply throwing an exception.)
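
The non-overlap requirement can be illustrated with a small check over match spans. This is only a sketch of the idea; `check_no_overlap` is a hypothetical helper and the exception type is an assumption, not what _flachtex_ raises.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) with end exclusive


def check_no_overlap(spans: List[Span]) -> List[Span]:
    """Return the spans sorted by start; raise ValueError on any overlap."""
    ordered = sorted(spans)
    # After sorting by start, it is enough to compare neighbouring spans.
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start < prev_end:
            raise ValueError(f"Overlapping matches near offset {next_start}")
    return ordered


print(check_no_overlap([(20, 35), (0, 12)]))  # [(0, 12), (20, 35)]
```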
### Usage for cleaning 'changes' of '\usepackage{changes}'
The [changes-package](https://ctan.org/pkg/changes?lang=en) is helpful for
highlighting changes, which is good practice, e.g., when writing journal
papers (which usually have to go through one or two reviewing iterations). These
highlights can of course disturb automatic language checkers, and they have to be
removed in the end. The script that is attached to the original package is
unfortunately not compatible with some usages (e.g., comments can lead it
astray). _flachtex_ is capable of removing the highlights done with _changes_ in
a robust way. There are some nasty ways to trick it, but if you use brackets, it
should work fine and independently of escaped symbols, comments, or line breaks.
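
The following sketch shows one way such a removal can be made robust against escaped symbols, comments, and line breaks: instead of a single regular expression, the arguments are read with a small balanced-brace scanner. It is an illustration under assumptions, not _flachtex_'s implementation; `read_group` and `strip_changes` are hypothetical names, and only `\added`, `\deleted`, and `\replaced` (with an optional `[...]` argument) are handled.

```python
import re

COMMAND = re.compile(r"\\(added|deleted|replaced)(?![A-Za-z])")
N_ARGS = {"added": 1, "deleted": 1, "replaced": 2}


def read_group(text, pos, open_ch="{", close_ch="}"):
    """Return (content, end) of the balanced group that starts at text[pos]."""
    if pos >= len(text) or text[pos] != open_ch:
        raise ValueError(f"expected {open_ch!r} at offset {pos}")
    depth, i = 0, pos
    while i < len(text):
        ch = text[i]
        if ch == "\\":  # skip escaped symbols such as \{ or \%
            i += 2
            continue
        if ch == "%":  # a comment runs to the end of the line
            nl = text.find("\n", i)
            i = len(text) if nl == -1 else nl + 1
            continue
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth == 0:
                return text[pos + 1 : i], i + 1
        i += 1
    raise ValueError("unbalanced group")


def strip_changes(text):
    """Keep added/replacement text, drop deleted text."""
    out, i = [], 0
    while True:
        m = COMMAND.search(text, i)
        if m is None:
            out.append(text[i:])
            return "".join(out)
        out.append(text[i : m.start()])
        pos = m.end()
        if pos < len(text) and text[pos] == "[":  # optional [id=...]
            _, pos = read_group(text, pos, "[", "]")
        args = []
        for _ in range(N_ARGS[m.group(1)]):
            arg, pos = read_group(text, pos)
            args.append(arg)
        if m.group(1) != "deleted":
            out.append(strip_changes(args[0]))  # first argument is the new text
        i = pos


print(strip_changes(r"A \added[id=x]{new \emph{text}} B \replaced{good}{bad}."))
```

The sketch still has blind spots, for example a command that itself appears inside a comment, which is the kind of case a full implementation has to treat separately.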
### Substitution of \newcommand
It is reasonably common to create your own commands with `\newcommand`, e.g.,
for some terms which you may want to change later. If you want to analyze the
tex-document, this can become cumbersome. Thus, _flachtex_ gives you the option
to automatically substitute such commands.
The primary reason I added this functionality to this tool (and not some higher
level tool) is that I also saw that some people define their own `\input`/`\include`
commands, which could not be imported easily without this feature.
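
As a rough illustration of the substitution, the following sketch handles only the simplest case, commands without parameters. The helpers `read_braced`, `collect_commands`, and `substitute` are hypothetical and much simpler than what _flachtex_ does (no arguments, no `\renewcommand`, no handling of comments).

```python
import re


def read_braced(text, pos):
    """Return (content, end) of the balanced {...} group starting at text[pos]."""
    depth, i = 0, pos
    while i < len(text):
        if text[i] == "\\":  # skip escaped characters such as \{
            i += 2
            continue
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[pos + 1 : i], i + 1
        i += 1
    raise ValueError("unbalanced braces")


def collect_commands(tex):
    """Map parameterless command names (e.g. '\\importantterm') to their bodies."""
    commands = {}
    for m in re.finditer(r"\\newcommand\*?\{(\\[A-Za-z]+)\}\{", tex):
        body, _ = read_braced(tex, m.end() - 1)
        commands[m.group(1)] = body
    return commands


def substitute(tex, commands):
    """Replace every use of a command by its body, leaving definitions alone."""
    for name, body in commands.items():
        pattern = re.compile(
            r"(?<!\\newcommand\{)(?<!\\newcommand\*\{)"  # not the definition itself
            + re.escape(name)
            + r"(?![A-Za-z])"  # not a prefix of a longer command name
        )
        tex = pattern.sub(lambda _m, body=body: body, tex)
    return tex


tex = (
    "\\newcommand{\\importantterm}{\\emph{ImportantTerm}\\xspace}\n"
    "Let us use \\importantterm here.\n"
)
print(substitute(tex, collect_commands(tex)))
```

The definition line is kept unchanged, which matches the flattened output shown in the example section.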
## Changelog
- **0.3.12** Made parsing of non-UTF-8 encodings more robust. Some templates
  come with very strange file encodings, and you don't always want to convert
  them to UTF-8 manually.
- **0.3.11** `newcommand` should work reliably with multiple arguments now
(hopefully).
- **0.3.10** Support for `newcommand*` substitution
- **0.3.9**: PEP compliance which may have created problems in environments
without setuptools
- **0.3.8**: Substituting newcommands is no longer enabled by default.
- **0.3.7**: Versions got slightly mixed up. Should be fixed now.
- **0.3.6** bugfix: Using findall instead of finditer.
- **0.3.4** Dealing with `\xspace` in command substitution.
- **0.3.3**
  - `FileFinder` now has a default and allows setting a new root.
  - Command substitution for commands without parameters made more accurate.
  - `from_json` for `TraceableString`
**This tool is still work in progress.**
Raw data
{
"_id": null,
"home_page": "https://github.com/d-krupke/flachtex",
"name": "flachtex",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "",
"keywords": "LaTeX flatten",
"author": "Dominik Krupke",
"author_email": "krupke@ibr.cs.tu-bs.de",
"download_url": "https://files.pythonhosted.org/packages/97/9c/41e75a56951d2d195726bb960114f09554a7dc2ec41ea47f8d225ef6f5c3/flachtex-0.3.12.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A traceable LaTeX flattener.",
"version": "0.3.12",
"project_urls": {
"Homepage": "https://github.com/d-krupke/flachtex"
},
"split_keywords": [
"latex",
"flatten"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "979c41e75a56951d2d195726bb960114f09554a7dc2ec41ea47f8d225ef6f5c3",
"md5": "568e5c1aee63a9bf9f56e00e3ec77ef9",
"sha256": "94157150b4adc89adab5770b60f48b569fdfd5d5150e738dd5d4bf7e1ca6c362"
},
"downloads": -1,
"filename": "flachtex-0.3.12.tar.gz",
"has_sig": false,
"md5_digest": "568e5c1aee63a9bf9f56e00e3ec77ef9",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 21492,
"upload_time": "2023-06-08T11:12:02",
"upload_time_iso_8601": "2023-06-08T11:12:02.531853Z",
"url": "https://files.pythonhosted.org/packages/97/9c/41e75a56951d2d195726bb960114f09554a7dc2ec41ea47f8d225ef6f5c3/flachtex-0.3.12.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-06-08 11:12:02",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "d-krupke",
"github_project": "flachtex",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "flachtex"
}