# Python utilities for Manubot: Manuscripts, open and automated
[![documentation](https://img.shields.io/badge/-Documentation-purple?logo=read-the-docs&logoColor=white&style=for-the-badge)](https://manubot.github.io/manubot/)
[![PyPI](https://img.shields.io/pypi/v/manubot.svg?logo=PyPI&logoColor=white&style=for-the-badge)](https://pypi.org/project/manubot/)
[![GitHub Actions CI Tests Status](https://img.shields.io/github/actions/workflow/status/manubot/manubot/test.yml?branch=main&label=actions&style=for-the-badge&logo=github&logoColor=white)](https://github.com/manubot/manubot/actions)
[![AppVeyor Windows Build Status](https://img.shields.io/appveyor/build/manubot/manubot/main?style=for-the-badge&logo=appveyor&logoColor=white&label=AppVeyor)](https://ci.appveyor.com/project/manubot/manubot/branch/main)
[Manubot](https://manubot.org/ "Manubot homepage") is a workflow and set of tools for the next generation of scholarly publishing.
This repository contains a Python package with several Manubot-related utilities, as described in the [usage section](#usage) below.
Package documentation is available at <https://manubot.github.io/manubot> (auto-generated from the Python source code).
The `manubot cite` command-line interface retrieves and formats bibliographic metadata for user-supplied persistent identifiers like DOIs or PubMed IDs.
The `manubot process` command-line interface prepares scholarly manuscripts for Pandoc consumption.
The `manubot process` command is used by Manubot manuscripts, which are based on the [Rootstock template](https://github.com/manubot/rootstock), to automate several aspects of manuscript generation.
The `manubot ai-revision` command is used to automatically revise a manuscript based on a set of AI-generated suggestions.
See Rootstock's [manuscript usage guide](https://github.com/manubot/rootstock/blob/main/USAGE.md) for more information.
**Note:**
If you want to experience Manubot by editing an existing manuscript, see <https://github.com/manubot/try-manubot>.
If you want to create a new manuscript, see <https://github.com/manubot/rootstock>.
To cite the Manubot project or for more information on its design and history, see:

> **Open collaborative writing with Manubot**<br>
Daniel S. Himmelstein, Vincent Rubinetti, David R. Slochower, Dongbo Hu, Venkat S. Malladi, Casey S. Greene, Anthony Gitter<br>
*PLOS Computational Biology* (2019-06-24) <https://doi.org/c7np><br>
DOI: [10.1371/journal.pcbi.1007128](https://doi.org/10.1371/journal.pcbi.1007128) · PMID: [31233491](https://www.ncbi.nlm.nih.gov/pubmed/31233491) · PMCID: [PMC6611653](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6611653)

The Manubot version of this manuscript is available at <https://greenelab.github.io/meta-review/>.
## Installation
If you are using the `manubot` Python package as part of a manuscript repository, installation of this package is handled through Rootstock's [environment specification](https://github.com/manubot/rootstock/blob/main/build/environment.yml).
For other use cases, this package can be installed via `pip`.
Install the latest release version [from PyPI](https://pypi.org/project/manubot/):
```sh
pip install --upgrade manubot
```
Or install from the source code on [GitHub](https://github.com/manubot/manubot), using the version specified by a commit hash:
```sh
COMMIT=d2160151e52750895571079a6e257beb6e0b1278
pip install --upgrade git+https://github.com/manubot/manubot@$COMMIT
```
The `--upgrade` argument ensures `pip` updates an existing `manubot` installation if present.
Some functions in this package require [Pandoc](https://pandoc.org/),
which must be [installed](https://pandoc.org/installing.html) separately on the system.
The pandoc-manubot-cite filter depends on Pandoc as well as panflute (a Python package).
Users must install a [compatible version of panflute](https://github.com/sergiocorreia/panflute#supported-pandoc-versions) based on their Pandoc version.
For example, on a system with Pandoc 2.9,
install a compatible panflute release with `pip install panflute==1.12.5`.
## Usage
Installing the Python package creates the `manubot` command line program.
Here is the usage information as per `manubot --help`:
<!-- test codeblock contains output of `manubot --help` -->
```
usage: manubot [-h] [--version] {process,cite,webpage,ai-revision} ...

Manubot: the manuscript bot for scholarly writing

options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit

subcommands:
  All operations are done through subcommands:

  {process,cite,webpage,ai-revision}
    process             process manuscript content
    cite                citekey to CSL JSON command line utility
    webpage             deploy Manubot outputs to a webpage directory tree
    ai-revision         revise manuscript content with language models
```
Note that all operations are done through the subcommands described in the sections below.
### Process
The `manubot process` program is the primary interface for using Manubot.
There are two required arguments: `--content-directory` and `--output-directory`, which specify the respective paths to the content and output directories.
The content directory stores the manuscript source files.
Files generated by Manubot are saved to the output directory.
One common setup is to create a directory for a manuscript that contains both the `content` and `output` directories.
Under this setup, you can run Manubot using:
```sh
manubot process \
--skip-citations \
--content-directory=content \
--output-directory=output
```
See `manubot process --help` for documentation of all command line arguments:
<!-- test codeblock contains output of `manubot process --help` -->
```
usage: manubot process [-h] --content-directory CONTENT_DIRECTORY
                       --output-directory OUTPUT_DIRECTORY
                       [--template-variables-path TEMPLATE_VARIABLES_PATH]
                       --skip-citations [--cache-directory CACHE_DIRECTORY]
                       [--clear-requests-cache] [--skip-remote]
                       [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]

Process manuscript content to create outputs for Pandoc consumption. Performs
bibliographic processing and templating.

options:
  -h, --help            show this help message and exit
  --content-directory CONTENT_DIRECTORY
                        Directory where manuscript content files are located.
  --output-directory OUTPUT_DIRECTORY
                        Directory to output files generated by this script.
  --template-variables-path TEMPLATE_VARIABLES_PATH
                        Path or URL of a file containing template variables
                        for jinja2. Serialization format is inferred from the
                        file extension, with support for JSON, YAML, and TOML.
                        If the format cannot be detected, the parser assumes
                        JSON. Specify this argument multiple times to read
                        multiple files. Variables can be applied to a
                        namespace (i.e. stored under a dictionary key) like
                        `--template-variables-path=namespace=path_or_url`.
                        Namespaces must match the regex `[a-zA-
                        Z_][a-zA-Z0-9_]*`.
  --skip-citations      Skip citation and reference processing. Support for
                        citation and reference processing has been moved from
                        `manubot process` to the pandoc-manubot-cite filter.
                        Therefore this argument is now required. If citation-
                        tags.tsv is found in content, these tags will be
                        inserted in the markdown output using the reference-
                        link syntax for citekey aliases. Appends
                        content/manual-references*.* paths to Pandoc's
                        metadata.bibliography field.
  --cache-directory CACHE_DIRECTORY
                        Custom cache directory. If not specified, caches to
                        output-directory.
  --clear-requests-cache
  --skip-remote         Do not add the rootstock repository to the local git
                        repository remotes.
  --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        Set the logging level for stderr logging
```
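
The `namespace=path_or_url` form from the help text above can be illustrated with a short Python sketch. `split_namespace` is a hypothetical helper written for this README, not Manubot's implementation; it only shows how the documented namespace regex could separate an optional namespace from the path or URL.

```python
import re

# Namespace regex copied from the --template-variables-path help text,
# followed by "=" and the remaining path or URL.
NAMESPACE_PATTERN = re.compile(
    r"(?P<namespace>[a-zA-Z_][a-zA-Z0-9_]*)=(?P<path>.+)"
)


def split_namespace(argument):
    """Split 'namespace=path_or_url' into (namespace, path_or_url).

    Returns (None, argument) when no valid namespace prefix is present.
    Hypothetical helper for illustration only.
    """
    match = NAMESPACE_PATTERN.fullmatch(argument)
    if match is None:
        return None, argument
    return match.group("namespace"), match.group("path")


# The paths and URL below are made-up examples.
print(split_namespace("authors=content/authors.yaml"))
print(split_namespace("https://example.org/variables.json"))
```

An argument without a leading `identifier=` part, such as a plain URL, is returned unchanged with no namespace.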
#### Manual references
Manubot has the ability to rely on user-provided reference metadata rather than generating it.
`manubot process` searches the content directory for files containing manually-provided reference metadata that match the glob `manual-references*.*`.
These files are stored in the Pandoc metadata `bibliography` field, such that they can be loaded by `pandoc-manubot-cite`.
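
As a rough illustration of the glob described above, the following Python sketch lists files in a content directory that match `manual-references*.*`. The helper name and the example filenames are assumptions made for this sketch; it mirrors the documented pattern rather than Manubot's own code.

```python
import pathlib
import tempfile


def find_manual_reference_files(content_directory):
    """Return sorted paths in content_directory matching manual-references*.*

    Hypothetical helper; it applies the glob documented in this README.
    """
    directory = pathlib.Path(content_directory)
    return sorted(directory.glob("manual-references*.*"))


# Demonstrate on a temporary content directory with made-up filenames.
with tempfile.TemporaryDirectory() as tmp:
    for name in [
        "manual-references.json",
        "manual-references-extra.bib",
        "metadata.yaml",
    ]:
        pathlib.Path(tmp, name).touch()
    matched_names = [path.name for path in find_manual_reference_files(tmp)]

# metadata.yaml does not match the glob and is therefore excluded.
print(matched_names)
```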
### Cite
`manubot cite` is a command line utility to produce bibliographic metadata for citation keys.
The utility either outputs metadata as [CSL JSON items](http://citeproc-js.readthedocs.io/en/latest/csl-json/markup.html#items) or, when `--format` selects a rendered format such as `markdown` or `html`, produces formatted references using Pandoc.
Citation keys should be in the format `prefix:accession`.
For example, the following command generates Markdown-formatted references for four persistent identifiers:
```shell
manubot cite --format=markdown \
doi:10.1098/rsif.2017.0387 pubmed:29424689 pmc:PMC5640425 arxiv:1806.05726
```
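
To make the `prefix:accession` format concrete, here is a minimal Python sketch that splits the citation keys from the command above. `split_citekey` is a hypothetical helper, not part of the `manubot` API; Manubot's own handling does more, for instance inferring missing prefixes (see `--no-infer-prefix`).

```python
def split_citekey(citekey):
    """Split a 'prefix:accession' citation key into (prefix, accession).

    Hypothetical helper for illustration only. Splits on the first colon,
    so accessions that themselves contain colons stay intact.
    """
    prefix, separator, accession = citekey.partition(":")
    if not separator or not prefix or not accession:
        raise ValueError(f"citekey not in prefix:accession format: {citekey!r}")
    return prefix, accession


citekeys = [
    "doi:10.1098/rsif.2017.0387",
    "pubmed:29424689",
    "pmc:PMC5640425",
    "arxiv:1806.05726",
]
for citekey in citekeys:
    print(split_citekey(citekey))
```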
The following [terminal recording](https://asciinema.org/a/205085?speed=2) demonstrates the main features of `manubot cite` (for a slightly outdated version):
![manubot cite demonstration](media/terminal-recordings/manubot-cite-cast.gif)
Additional usage information is available from `manubot cite --help`:
<!-- test codeblock contains output of `manubot cite --help` -->
```
usage: manubot cite [-h] [--output OUTPUT]
                    [--format {csljson,cslyaml,plain,markdown,docx,html,jats} | --yml | --txt | --md]
                    [--csl CSL] [--bibliography BIBLIOGRAPHY]
                    [--no-infer-prefix] [--allow-invalid-csl-data]
                    [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
                    citekeys [citekeys ...]

Generate bibliographic metadata in CSL JSON format for one or more citation
keys. Optionally, render metadata into formatted references using Pandoc. Text
outputs are UTF-8 encoded.

positional arguments:
  citekeys              One or more (space separated) citation keys to
                        generate bibliographic metadata for.

options:
  -h, --help            show this help message and exit
  --output OUTPUT       Specify a file to write output, otherwise default to
                        stdout.
  --format {csljson,cslyaml,plain,markdown,docx,html,jats}
                        Format to use for output file. csljson and cslyaml
                        output the CSL data. All other choices render the
                        references using Pandoc. If not specified, attempt to
                        infer this from the --output filename extension.
                        Otherwise, default to csljson.
  --yml                 Short for --format=cslyaml.
  --txt                 Short for --format=plain.
  --md                  Short for --format=markdown.
  --csl CSL             URL or path with CSL XML style used to style
                        references (i.e. Pandoc's --csl option). Defaults to
                        Manubot's style.
  --bibliography BIBLIOGRAPHY
                        File to read manual reference metadata. Specify
                        multiple times to load multiple files. Similar to
                        pandoc --bibliography.
  --no-infer-prefix     Do not attempt to infer the prefix for citekeys
                        without a known prefix.
  --allow-invalid-csl-data
                        Allow CSL Items that do not conform to the JSON
                        Schema. Skips CSL pruning.
  --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        Set the logging level for stderr logging
```
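
The `--format` help text describes a fallback order: an explicit format wins, then the `--output` filename extension is consulted, and finally `csljson` is used. The Python sketch below illustrates that order only. The extension mapping is an assumption made for this example and may not match the mapping in Manubot's source; `choose_format` is a hypothetical name.

```python
import pathlib

# Assumed extension-to-format mapping for illustration only;
# the mapping used by Manubot itself may differ.
EXTENSION_TO_FORMAT = {
    ".json": "csljson",
    ".yml": "cslyaml",
    ".yaml": "cslyaml",
    ".txt": "plain",
    ".md": "markdown",
    ".docx": "docx",
    ".html": "html",
    ".xml": "jats",
}


def choose_format(format_option=None, output_path=None):
    """Pick an output format following the order in the --format help text."""
    # 1. An explicitly requested format takes priority.
    if format_option is not None:
        return format_option
    # 2. Otherwise try to infer the format from the output file extension.
    if output_path is not None:
        extension = pathlib.Path(output_path).suffix.lower()
        if extension in EXTENSION_TO_FORMAT:
            return EXTENSION_TO_FORMAT[extension]
    # 3. Fall back to CSL JSON.
    return "csljson"


print(choose_format(output_path="references.md"))
print(choose_format())
```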
### Pandoc filter
This package creates the `pandoc-manubot-cite` Pandoc filter,
providing access to Manubot's cite-by-ID functionality from within a Pandoc workflow.
Options are set via Pandoc metadata fields [listed in the docs](https://manubot.github.io/manubot/reference/manubot/pandoc/cite_filter).
<!-- test codeblock contains output of `pandoc-manubot-cite --help` -->
```
usage: pandoc-manubot-cite [-h] [--input [INPUT]] [--output [OUTPUT]]
                           target_format

Pandoc filter for citation by persistent identifier. Filters are command-line
programs that read and write a JSON-encoded abstract syntax tree for Pandoc.
Unless you are debugging, run this filter as part of a pandoc command by
specifying --filter=pandoc-manubot-cite.

positional arguments:
  target_format      output format of the pandoc command, as per Pandoc's --to
                     option

options:
  -h, --help         show this help message and exit
  --input [INPUT]    path read JSON input (defaults to stdin)
  --output [OUTPUT]  path to write JSON output (defaults to stdout)
```
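
As the help text says, the filter is normally run as part of a `pandoc` command. The Python sketch below assembles one such command. It assumes Pandoc 2.11 or newer, where `--citeproc` is a built-in option, and uses made-up file names; the filter is listed before `--citeproc` so that citations are resolved before references are rendered. `build_pandoc_command` is a hypothetical helper.

```python
import pathlib
import shutil
import subprocess


def build_pandoc_command(input_path, output_path):
    """Return an argument list that runs Pandoc with pandoc-manubot-cite.

    Assumes Pandoc >= 2.11 so that --citeproc is available.
    """
    return [
        "pandoc",
        "--filter=pandoc-manubot-cite",
        "--citeproc",
        f"--output={output_path}",
        str(input_path),
    ]


# manuscript.md and manuscript.html are placeholder file names.
command = build_pandoc_command("manuscript.md", "manuscript.html")
print(" ".join(command))

# Run the command only when both programs and the input file are present.
programs_found = all(shutil.which(name) for name in ["pandoc", "pandoc-manubot-cite"])
if programs_found and pathlib.Path("manuscript.md").exists():
    subprocess.run(command, check=True)
```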
Other Pandoc filters exist that do something similar:
[`pandoc-url2cite`](https://github.com/phiresky/pandoc-url2cite), [`pandoc-url2cite-hs`](https://github.com/Aver1y/pandoc-url2cite-hs), and
[`pwcite`](https://github.com/wikicite/wcite#filter-pwcite).
Of these filters, `pandoc-manubot-cite` currently supports the widest range of persistent identifier types.
We're interested in creating as much compatibility as possible between these filters and their syntaxes.
#### Manual references
Manual references are loaded from the `references` and `bibliography` Pandoc metadata fields.
If a manual reference filename ends with `.json` or `.yaml`, it's assumed to contain CSL Data (i.e. Citation Style Language JSON).
Otherwise, the format is inferred from the extension and converted to CSL JSON using the `pandoc-citeproc --bib2json` [utility](https://github.com/jgm/pandoc-citeproc/blob/master/man/pandoc-citeproc.1.md#convert-mode).
The standard citation key for manual references is inferred from the CSL JSON `id` or `note` field.
When no prefix is provided, such as `doi:`, `url:`, or `raw:`, a `raw:` prefix is automatically added.
If multiple manual reference files load metadata for the same standard citation `id`, precedence is assigned according to descending filename order.
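
The `raw:` prefixing rule above can be sketched in a few lines of Python. The set of known prefixes here is an assumption limited to prefixes that appear in this README, and `add_raw_prefix` is a hypothetical helper rather than Manubot's implementation.

```python
# Assumed subset of prefixes, taken from examples in this README.
# Manubot recognizes its own list of prefixes, which is not reproduced here.
KNOWN_PREFIXES = {"doi", "url", "raw", "pubmed", "pmc", "arxiv"}


def add_raw_prefix(reference_id):
    """Return reference_id unchanged if it has a known prefix, else add 'raw:'."""
    prefix, separator, _ = reference_id.partition(":")
    if separator and prefix.lower() in KNOWN_PREFIXES:
        return reference_id
    return f"raw:{reference_id}"


# The second identifier is a made-up example of an unprefixed manual reference.
print(add_raw_prefix("doi:10.1371/journal.pcbi.1007128"))
print(add_raw_prefix("my-unpublished-dataset"))
```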
### Webpage
The `manubot webpage` command populates a `webpage` directory with Manubot output files.
<!-- test codeblock contains output of `manubot webpage --help` -->
```
usage: manubot webpage [-h] [--checkout [CHECKOUT]] [--version VERSION]
                       [--timestamp] [--no-ots-cache | --ots-cache OTS_CACHE]
                       [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]

Update the webpage directory tree with Manubot output files. This command
should be run from the root directory of a Manubot manuscript that follows the
Rootstock layout, containing `output` and `webpage` directories. HTML and PDF
outputs are copied to the webpage directory, which is structured as static
source files for website hosting.

options:
  -h, --help            show this help message and exit
  --checkout [CHECKOUT]
                        branch to checkout /v directory contents from. For
                        example, --checkout=upstream/gh-pages. --checkout is
                        equivalent to --checkout=gh-pages. If --checkout is
                        ommitted, no checkout is performed.
  --version VERSION     Used to create webpage/v/{version} directory.
                        Generally a commit hash, tag, or 'local'. When
                        omitted, version defaults to the commit hash on CI
                        builds and 'local' elsewhere.
  --timestamp           timestamp versioned manuscripts in webpage/v using
                        OpenTimestamps. Specify this flag to create timestamps
                        for the current HTML and PDF outputs and upgrade any
                        timestamps from past manuscript versions.
  --no-ots-cache        disable the timestamp cache.
  --ots-cache OTS_CACHE
                        location for the timestamp cache (default:
                        ci/cache/ots).
  --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        Set the logging level for stderr logging
```
### AI-assisted academic authoring
The `manubot ai-revision` command uses large language models from [OpenAI](https://openai.com/api/) to automatically revise a manuscript and suggest text improvements.
<!-- test codeblock contains output of `manubot ai-revision --help` -->
```
usage: manubot ai-revision [-h] --content-directory CONTENT_DIRECTORY
                           [--config-directory CONFIG_DIRECTORY]
                           [--model-type MODEL_TYPE]
                           [--model-kwargs key=value [key=value ...]]
                           [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]

Revise manuscript content using AI models to suggest text improvements.

options:
  -h, --help            show this help message and exit
  --content-directory CONTENT_DIRECTORY
                        Directory where manuscript content files are located.
  --config-directory CONFIG_DIRECTORY
                        Directory where AI revision configuration files are
                        located. If unspecified, disables custom
                        configuration.
  --model-type MODEL_TYPE
                        Model type used to revise the manuscript. Default is
                        GPT3CompletionModel. It can be any subclass of
                        manubot_ai_editor.models.ManuscriptRevisionModel
  --model-kwargs key=value [key=value ...]
                        Keyword arguments for the revision model (--model-
                        type), with format key=value.
  --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        Set the logging level for stderr logging
```
The usual call is:
```
manubot ai-revision --content-directory content/
```
The parameters `--model-type` and `--model-kwargs` are used for debugging purposes.
For example, since the tool splits the text into paragraphs, you might want to see if paragraphs were detected correctly.
The tool incurs a cost when using the OpenAI API, so this could be important to check for text with complicated structure.
```
manubot ai-revision \
--content-directory content/ \
--model-type DummyManuscriptRevisionModel \
--model-kwargs add_paragraph_marks=true
```
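
The `key=value` format accepted by `--model-kwargs` can be pictured with the following Python sketch. `parse_model_kwargs` is a hypothetical helper, not the parser used by Manubot, and it keeps every value as a string because this README does not state how values are converted.

```python
def parse_model_kwargs(pairs):
    """Turn ['key=value', ...] into a dictionary of string values.

    Hypothetical helper for illustration; values are left as strings.
    """
    kwargs = {}
    for pair in pairs:
        # Split on the first "=" so that values may contain "=" themselves.
        key, separator, value = pair.partition("=")
        if not separator or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        kwargs[key] = value
    return kwargs


print(parse_model_kwargs(["add_paragraph_marks=true"]))
```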
## Development
### Environment
Create a development environment using:
```shell
conda create --name manubot-dev --channel conda-forge \
python=3.11 pandoc=2.11.3.1
conda activate manubot-dev # assumes conda >= 4.4
pip install --editable ".[webpage,dev]"
```
### Commands
Below are some common commands used for development.
They assume the working directory is set to the repository's root,
and the conda environment is activated.
```shell
# run the test suite
pytest
# install pre-commit git hooks (once per local clone).
# The pre-commit checks declared in .pre-commit-config.yaml will now
# run on changed files during git commits.
pre-commit install
# run the pre-commit checks (required to pass CI)
pre-commit run --all-files
# commit despite failing pre-commit checks (will fail CI)
git commit --no-verify
# regenerate the README codeblocks for --help messages
python manubot/tests/test_readme.py
# generate the docs
portray as_html --overwrite --output_dir=docs
# process the example testing manuscript
manubot process \
--content-directory=manubot/process/tests/manuscripts/example/content \
--output-directory=manubot/process/tests/manuscripts/example/output \
--skip-citations \
--log-level=INFO
```
### Release instructions
[![PyPI](https://img.shields.io/pypi/v/manubot.svg?logo=PyPI&style=for-the-badge)](https://pypi.org/project/manubot/)
This section is only relevant for project maintainers.
GitHub Actions [deploys](.github/workflows/release.yml) releases to [PyPI](https://pypi.org/project/manubot).
To create a new release, bump the `__version__` in [`manubot/__init__.py`](manubot/__init__.py).
Then, set the `TAG` and `OLD_TAG` environment variables:
```shell
TAG=v$(python setup.py --version)
# fetch tags from the upstream remote
# (assumes upstream is the manubot organization remote)
git fetch --tags upstream main
# get previous release tag, can hardcode like OLD_TAG=v0.3.1
OLD_TAG=$(git describe --tags --abbrev=0)
```
The following commands can help draft release notes:
```shell
# check out a branch for a pull request as needed
git checkout -b "release-$TAG"
# create release notes file if it doesn't exist
touch "release-notes/$TAG.md"
# commit list since previous tag
echo $'\n\nCommits\n-------\n' >> "release-notes/$TAG.md"
git log --oneline --decorate=no --reverse $OLD_TAG..HEAD >> "release-notes/$TAG.md"
# commit authors since previous tag
echo $'\n\nCode authors\n------------\n' >> "release-notes/$TAG.md"
git log $OLD_TAG..HEAD --format='%aN <%aE>' | sort --unique >> "release-notes/$TAG.md"
```
After a commit with the above updates is part of `upstream:main`,
for example after a PR is merged,
use the [GitHub interface](https://github.com/manubot/manubot/releases/new) to create a release with the new "Tag version".
Monitor [GitHub Actions](https://github.com/manubot/manubot/actions?query=workflow%3ARelease) and [PyPI](https://pypi.org/project/manubot/#history) for successful deployment of the release.
## Goals & Acknowledgments
Our goal is to create scholarly infrastructure that encourages open science and assists reproducibility.
Accordingly, we hope for the Manubot software and philosophy to be adopted widely, by both academic and commercial entities.
As such, Manubot is free/libre and open source software (see [`LICENSE.md`](LICENSE.md)).
We would like to thank the contributors and funders whose support makes this project possible.
Specifically, Manubot development has been financially supported by:
- the **Alfred P. Sloan Foundation** in [Grant G-2018-11163](https://sloan.org/grant-detail/8501) to [**@dhimmel**](https://github.com/dhimmel).
- the **Gordon & Betty Moore Foundation** ([**@DDD-Moore**](https://github.com/DDD-Moore)) in [Grant GBMF4552](https://www.moore.org/grant-detail?grantId=GBMF4552) to [**@cgreene**](https://github.com/cgreene).
Raw data
{
"_id": null,
"home_page": "https://github.com/manubot/manubot",
"name": "manubot",
"maintainer": "Daniel Himmelstein, Anthony Gitter",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "contact@manubot.org",
"keywords": "manuscript, markdown, pandoc, publishing, references, citations, csl",
"author": null,
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/dd/ef/8318f343e5ef3ccd6c018bce78ab011d60bccbc7d9cfb23b81372f4b8535/manubot-0.6.1.tar.gz",
"platform": null
}
Specify this flag to create timestamps\n for the current HTML and PDF outputs and upgrade any\n timestamps from past manuscript versions.\n --no-ots-cache disable the timestamp cache.\n --ots-cache OTS_CACHE\n location for the timestamp cache (default:\n ci/cache/ots).\n --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}\n Set the logging level for stderr logging\n```\n\n### AI-assisted academic authoring\n\nThe `manubot ai-revision` command uses large language models from [OpenAI](https://openai.com/api/) to automatically revise a manuscript and suggest text improvements.\n\n<!-- test codeblock contains output of `manubot ai-revision --help` -->\n```\nusage: manubot ai-revision [-h] --content-directory CONTENT_DIRECTORY\n [--config-directory CONFIG_DIRECTORY]\n [--model-type MODEL_TYPE]\n [--model-kwargs key=value [key=value ...]]\n [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]\n\nRevise manuscript content using AI models to suggest text improvements.\n\noptions:\n -h, --help show this help message and exit\n --content-directory CONTENT_DIRECTORY\n Directory where manuscript content files are located.\n --config-directory CONFIG_DIRECTORY\n Directory where AI revision configuration files are\n located. If unspecified, disables custom\n configuration.\n --model-type MODEL_TYPE\n Model type used to revise the manuscript. Default is\n GPT3CompletionModel. 
It can be any subclass of\n manubot_ai_editor.models.ManuscriptRevisionModel\n --model-kwargs key=value [key=value ...]\n Keyword arguments for the revision model (--model-\n type), with format key=value.\n --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}\n Set the logging level for stderr logging\n```\n\nThe usual call is:\n\n```\nmanubot ai-revision --content-directory content/\n```\n\nThe parameters `--model-type` and `--model-kwargs` are used for debugging purposes.\nFor example, since the tool splits the text into paragraphs, you might want to see if paragraphs were detected correctly.\nThe tool incurs a cost when using the OpenAI API, so this could be important to check for text with complicated structure.\n\n```\nmanubot ai-revision \\\n --content-directory content/ \\\n --model-type DummyManuscriptRevisionModel \\\n --model-kwargs add_paragraph_marks=true\n```\n\n## Development\n\n### Environment\n\nCreate a development environment using:\n\n```shell\nconda create --name manubot-dev --channel conda-forge \\\n python=3.11 pandoc=2.11.3.1\nconda activate manubot-dev # assumes conda >= 4.4\npip install --editable \".[webpage,dev]\"\n```\n\n### Commands\n\nBelow are some common commands used for development.\nThey assume the working directory is set to the repository's root,\nand the conda environment is activated.\n\n```shell\n# run the test suite\npytest\n\n# install pre-commit git hooks (once per local clone).\n# The pre-commit checks declared in .pre-commit-config.yaml will now\n# run on changed files during git commits.\npre-commit install\n\n# run the pre-commit checks (required to pass CI)\npre-commit run --all-files\n\n# commit despite failing pre-commit checks (will fail CI)\ngit commit --no-verify\n\n# regenerate the README codeblocks for --help messages\npython manubot/tests/test_readme.py\n\n# generate the docs\nportray as_html --overwrite --output_dir=docs\n\n# process the example testing manuscript\nmanubot process \\\n 
--content-directory=manubot/process/tests/manuscripts/example/content \\\n --output-directory=manubot/process/tests/manuscripts/example/output \\\n --skip-citations \\\n --log-level=INFO\n```\n\n### Release instructions\n\n[![PyPI](https://img.shields.io/pypi/v/manubot.svg?logo=PyPI&style=for-the-badge)](https://pypi.org/project/manubot/)\n\nThis section is only relevant for project maintainers.\nGitHub Actions [deploys](.github/workflows/release.yml) releases to [PyPI](https://pypi.org/project/manubot).\n\nTo create a new release, bump the `__version__` in [`manubot/__init__.py`](manubot/__init__.py).\nThen, set the `TAG` and `OLD_TAG` environment variables:\n\n```shell\nTAG=v$(python setup.py --version)\n\n# fetch tags from the upstream remote\n# (assumes upstream is the manubot organization remote)\ngit fetch --tags upstream main\n\n# get previous release tag, can hardcode like OLD_TAG=v0.3.1\nOLD_TAG=$(git describe --tags --abbrev=0)\n```\n\nThe following commands can help draft release notes:\n\n```shell\n# check out a branch for a pull request as needed\ngit checkout -b \"release-$TAG\"\n\n# create release notes file if it doesn't exist\ntouch \"release-notes/$TAG.md\"\n\n# commit list since previous tag\necho $'\\n\\nCommits\\n-------\\n' >> \"release-notes/$TAG.md\"\ngit log --oneline --decorate=no --reverse $OLD_TAG..HEAD >> \"release-notes/$TAG.md\"\n\n# commit authors since previous tag\necho $'\\n\\nCode authors\\n------------\\n' >> \"release-notes/$TAG.md\"\ngit log $OLD_TAG..HEAD --format='%aN <%aE>' | sort --unique >> \"release-notes/$TAG.md\"\n```\n\nAfter a commit with the above updates is part of `upstream:main`,\nfor example after a PR is merged,\nuse the [GitHub interface](https://github.com/manubot/manubot/releases/new) to create a release with the new \"Tag version\".\nMonitor [GitHub Actions](https://github.com/manubot/manubot/actions?query=workflow%3ARelease) and [PyPI](https://pypi.org/project/manubot/#history) for successful deployment of 
the release.\n\n## Goals & Acknowledgments\n\nOur goal is to create scholarly infrastructure that encourages open science and assists reproducibility.\nAccordingly, we hope for the Manubot software and philosophy to be adopted widely, by both academic and commercial entities.\nAs such, Manubot is free/libre and open source software (see [`LICENSE.md`](LICENSE.md)).\n\nWe would like to thank the contributors and funders whose support makes this project possible.\nSpecifically, Manubot development has been financially supported by:\n\n- the **Alfred P. Sloan Foundation** in [Grant G-2018-11163](https://sloan.org/grant-detail/8501) to [**@dhimmel**](https://github.com/dhimmel).\n- the **Gordon & Betty Moore Foundation** ([**@DDD-Moore**](https://github.com/DDD-Moore)) in [Grant GBMF4552](https://www.moore.org/grant-detail?grantId=GBMF4552) to [**@cgreene**](https://github.com/cgreene).\n",
"bugtrack_url": null,
"license": "BSD 3-Clause License",
"summary": "\"Python utilities for Manubot: Manuscripts, open and automated\"",
"version": "0.6.1",
"project_urls": {
"Documentation": "https://manubot.github.io/manubot",
"Homepage": "https://manubot.org",
"Publication": "https://greenelab.github.io/meta-review/",
"Source": "https://github.com/manubot/manubot",
"Tracker": "https://github.com/manubot/manubot/issues"
},
"split_keywords": [
"manuscript",
" markdown",
" pandoc",
" publishing",
" references",
" citations",
" csl"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "70ecddd081e646c2a3148dadd8837e72c5f4e85cedad9aa4063dfde9dac3ea58",
"md5": "9e84935fd6448b7d3893fc934d92cb71",
"sha256": "354eaa18baf039d18c6bb961ac932af9ecb99075b6bea4ce7b2755b85e0bb2a3"
},
"downloads": -1,
"filename": "manubot-0.6.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "9e84935fd6448b7d3893fc934d92cb71",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 182248,
"upload_time": "2024-07-20T15:07:41",
"upload_time_iso_8601": "2024-07-20T15:07:41.952035Z",
"url": "https://files.pythonhosted.org/packages/70/ec/ddd081e646c2a3148dadd8837e72c5f4e85cedad9aa4063dfde9dac3ea58/manubot-0.6.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "ddef8318f343e5ef3ccd6c018bce78ab011d60bccbc7d9cfb23b81372f4b8535",
"md5": "a25589e844a67ca72095616d3651b26e",
"sha256": "de533b96891c32a6378833c097a9331d72f0fe5263f174dbd155bab8f6835aa0"
},
"downloads": -1,
"filename": "manubot-0.6.1.tar.gz",
"has_sig": false,
"md5_digest": "a25589e844a67ca72095616d3651b26e",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 175806,
"upload_time": "2024-07-20T15:07:43",
"upload_time_iso_8601": "2024-07-20T15:07:43.561183Z",
"url": "https://files.pythonhosted.org/packages/dd/ef/8318f343e5ef3ccd6c018bce78ab011d60bccbc7d9cfb23b81372f4b8535/manubot-0.6.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-07-20 15:07:43",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "manubot",
"github_project": "manubot",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"appveyor": true,
"lcname": "manubot"
}