langchain-utils

Name: langchain-utils
Version: 0.8.0
Home page: https://github.com/tddschn/langchain-utils
Summary: Utilities built upon the langchain library
Upload time: 2024-06-05 17:20:17
Author: Teddy Xinyuan Chen
Requires Python: <4.0,>=3.11
License: MIT
Keywords: langchain, utils, LLM, prompts, CLI

# langchain-utils

LangChain Utilities

- [langchain-utils](#langchain-utils)
  - [Prompt generation using LangChain document loaders](#prompt-generation-using-langchain-document-loaders)
    - [Demos](#demos)
    - [`pandocprompt`](#pandocprompt)
    - [`urlprompt`](#urlprompt)
    - [`pdfprompt`](#pdfprompt)
    - [`ytprompt`](#ytprompt)
    - [`textprompt`](#textprompt)
    - [`htmlprompt`](#htmlprompt)
  - [Installation](#installation)
    - [pipx](#pipx)
    - [pip](#pip)
  - [Develop](#develop)

## Prompt generation using LangChain document loaders

Do you find yourself frequently copy-pasting text from the web / PDFs / other documents into ChatGPT?

If yes, these tools are for you!

The generated prompts are optimized for manual pasting into a chat interface (like ChatGPT), either in one go or split into multiple parts to get around context length limits.

They are built from templates like this one:

```python
REPLY_OK_IF_YOU_READ_TEMPLATE = '''
Below is {what}, reply "OK" if you read:

"""
{content}
"""
'''.strip()
```

You can paste it directly into a chat interface like ChatGPT and ask follow-up questions about it.

See [`prompts.py`](./langchain_utils/prompts.py) for other variations.
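
As a minimal sketch (assuming the constant is importable from `langchain_utils.prompts`, as the file layout above suggests), you can also fill the template yourself:

```python
# Hypothetical usage sketch: fill the template manually.
from langchain_utils.prompts import REPLY_OK_IF_YOU_READ_TEMPLATE

prompt = REPLY_OK_IF_YOU_READ_TEMPLATE.format(
    what="the README of a GitHub repo",  # placeholder description
    content="# langchain-utils\n\nLangChain Utilities ...",  # placeholder content
)
print(prompt)
```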

### Demos

- Loading `https://github.com/tddschn/langchain-utils` and copying the generated prompt to the clipboard:

<video src="https://user-images.githubusercontent.com/45612704/231729153-341bd962-28cc-40a3-af8b-91e038ccaf6c.mp4" controls width="100%"></video>

- Loading 3 pages of a PDF file, opening each part for inspection before copying, and optionally merging the 3 pages into 2 prompts that won't exceed `gpt-3.5-turbo`'s context length limit, using LangChain's `TokenTextSplitter`:

<video src="https://user-images.githubusercontent.com/45612704/231731553-63cf3cef-a210-4761-8ca3-dd47bedc3393.mp4" controls width="100%"></video>

### `pandocprompt`

```
$ pandocprompt --help

usage: pandocprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
                    [-P PARTS [PARTS ...]] [-r] [-R]
                    [--print-percentage-non-ascii] [-n] [--out OUT] [-C]
                    [-w WHAT] [-M] [--from PANDOC_FROM_FORMAT]
                    [--to PANDOC_TO_FORMAT]
                    [PATH ...]

Get prompts from arbitrary files. You need to have `pandoc` installed and in
$PATH, it will be used to convert source files to desired (hopefully textual)
format. Common use cases: Getting prompts from EPub books or several TeX
files.

positional arguments:
  PATH                  Paths to the text files, or stdin if not provided
                        (default: None)

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -c, --copy            Copy the prompt to clipboard (default: False)
  -e, --edit            Edit the prompt and copy manually (default: False)
  -m model, --model model
                        Model to use. This only affects the chunk size. Use -S
                        to disable splitting (infinite chunk size). (default:
                        gpt-4-32k)
  -S, --no-split        Do not split the prompt into multiple parts (use this
                        if the model has a really large context size)
                        (default: False)
  -s chunk_size, --chunk-size chunk_size
                        Chunk size when splitting transcript, also used to
                        determine whether to split, defaults to 1/2 of the
                        context length limit of the model (default: None)
  -P PARTS [PARTS ...], --parts PARTS [PARTS ...]
                        Parts to select in the processes list of Documents
                        (default: None)
  -r, --raw             Wraps the content in triple quotes with no extra text
                        (default: False)
  -R, --raw-no-quotes   Output the content only (default: False)
  --print-percentage-non-ascii
                        Print percentage of non-ascii characters (default:
                        False)
  -n, --dry-run         Dry run (default: False)
  --out OUT             Output file (default: None)
  -C, --from-clipboard  Load text from clipboard (default: False)
  -w WHAT, --what WHAT  Initial knowledge you want to insert before the PDF
                        content in the prompt (default: the content of a
                        document)
  -M, --merge           Merge contents of all pages before processing
                        (default: False)
  --from PANDOC_FROM_FORMAT
                        The format that is passed to -f in pandoc (default:
                        None)
  --to PANDOC_TO_FORMAT
                        The format that is passed to -t in pandoc. gfm-
                        raw_html means GitHub Flavored Markdown with raw HTML
                        stripped. (default: gfm-raw_html)

```
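
For example, a hypothetical invocation (`book.epub` is a placeholder file name; `pandoc` must be installed and on `$PATH`) that converts an EPub book and copies the resulting prompt to the clipboard:

```
$ pandocprompt book.epub -c
```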
### `urlprompt`

```
$ urlprompt --help

usage: urlprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
                 [-P PARTS [PARTS ...]] [-r] [-R]
                 [--print-percentage-non-ascii] [-n] [--out OUT] [-w WHAT]
                 [-M] [-j] [-g] [--github-path GITHUB_PATH]
                 [--github-revision GITHUB_REVISION] [--substack]
                 URL

Get a prompt consisting the text content of a webpage

positional arguments:
  URL                   URL to the webpage

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -c, --copy            Copy the prompt to clipboard (default: False)
  -e, --edit            Edit the prompt and copy manually (default: False)
  -m model, --model model
                        Model to use. This only affects the chunk size. Use -S
                        to disable splitting (infinite chunk size). (default:
                        gpt-4-32k)
  -S, --no-split        Do not split the prompt into multiple parts (use this
                        if the model has a really large context size)
                        (default: False)
  -s chunk_size, --chunk-size chunk_size
                        Chunk size when splitting transcript, also used to
                        determine whether to split, defaults to 1/2 of the
                        context length limit of the model (default: None)
  -P PARTS [PARTS ...], --parts PARTS [PARTS ...]
                        Parts to select in the processes list of Documents
                        (default: None)
  -r, --raw             Wraps the content in triple quotes with no extra text
                        (default: False)
  -R, --raw-no-quotes   Output the content only (default: False)
  --print-percentage-non-ascii
                        Print percentage of non-ascii characters (default:
                        False)
  -n, --dry-run         Dry run (default: False)
  --out OUT             Output file (default: None)
  -w WHAT, --what WHAT  Initial knowledge you want to insert before the PDF
                        content in the prompt (default: the content of a
                        webpage)
  -M, --merge           Merge contents of all pages before processing
                        (default: False)
  -j, --javascript      Use JavaScript to render the page (default: False)
  -g, --github          Load the raw file from a GitHub URL (default: False)
  --github-path GITHUB_PATH
                        Path to the GitHub file (default: README.md)
  --github-revision GITHUB_REVISION
                        Revision for the GitHub file (default: master)
  --substack            Load from a Substack URL and convert it to Markdown
                        (default: False)

```
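
For example, a hypothetical invocation that renders a JavaScript-heavy page before extracting its text and copies the prompt to the clipboard:

```
$ urlprompt -j -c https://github.com/tddschn/langchain-utils
```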
### `pdfprompt`

```
$ pdfprompt --help

usage: pdfprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
                 [-P PARTS [PARTS ...]] [-r] [-R]
                 [--print-percentage-non-ascii] [-n] [--out OUT]
                 [-p PAGES [PAGES ...]] [-l PAGE_SLICE] [-M] [-w WHAT] [-o]
                 [-O] [-L OCR_LANGUAGE]
                 PDF Path

Get a prompt consisting the text content of a PDF file

positional arguments:
  PDF Path              Path to the PDF file

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -c, --copy            Copy the prompt to clipboard (default: False)
  -e, --edit            Edit the prompt and copy manually (default: False)
  -m model, --model model
                        Model to use. This only affects the chunk size. Use -S
                        to disable splitting (infinite chunk size). (default:
                        gpt-4-32k)
  -S, --no-split        Do not split the prompt into multiple parts (use this
                        if the model has a really large context size)
                        (default: False)
  -s chunk_size, --chunk-size chunk_size
                        Chunk size when splitting transcript, also used to
                        determine whether to split, defaults to 1/2 of the
                        context length limit of the model (default: None)
  -P PARTS [PARTS ...], --parts PARTS [PARTS ...]
                        Parts to select in the processes list of Documents
                        (default: None)
  -r, --raw             Wraps the content in triple quotes with no extra text
                        (default: False)
  -R, --raw-no-quotes   Output the content only (default: False)
  --print-percentage-non-ascii
                        Print percentage of non-ascii characters (default:
                        False)
  -n, --dry-run         Dry run (default: False)
  --out OUT             Output file (default: None)
  -p PAGES [PAGES ...], --pages PAGES [PAGES ...]
                        Only include specified page numbers (default: None)
  -l PAGE_SLICE, --page-slice PAGE_SLICE
                        Use Python slice syntax to select page numbers (e.g.
                        1:3, 1:10:2, etc.) (default: None)
  -M, --merge           Merge contents of all pages before processing
                        (default: False)
  -w WHAT, --what WHAT  Initial knowledge you want to insert before the PDF
                        content in the prompt (default: the content of a PDF
                        file)
  -o, --fallback-ocr    Use OCR as fallback if no text detected on page,
                        please set TESSDATA_PREFIX environment variable to the
                        path of your tesseract data directory (default: False)
  -O, --force-ocr       Force OCR on all pages (default: False)
  -L OCR_LANGUAGE, --ocr-language OCR_LANGUAGE
                        Language to use for Tesseract OCR (like eng, chi_sim,
                        chi_tra, chi_tra_vert etc.)) (default: eng)

```
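
For example, a hypothetical invocation (`paper.pdf` is a placeholder) that selects pages 1-3, merges them before splitting, and copies the prompt, mirroring the demo above:

```
$ pdfprompt paper.pdf -p 1 2 3 -M -c
```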
### `ytprompt`

```
$ ytprompt --help

usage: ytprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
                [-P PARTS [PARTS ...]] [-r] [-R]
                [--print-percentage-non-ascii] [-n] [--out OUT]
                YouTube URL

Get a prompt consisting Title and Transcript of a YouTube Video

positional arguments:
  YouTube URL           YouTube URL

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -c, --copy            Copy the prompt to clipboard (default: False)
  -e, --edit            Edit the prompt and copy manually (default: False)
  -m model, --model model
                        Model to use. This only affects the chunk size. Use -S
                        to disable splitting (infinite chunk size). (default:
                        gpt-4-32k)
  -S, --no-split        Do not split the prompt into multiple parts (use this
                        if the model has a really large context size)
                        (default: False)
  -s chunk_size, --chunk-size chunk_size
                        Chunk size when splitting transcript, also used to
                        determine whether to split, defaults to 1/2 of the
                        context length limit of the model (default: None)
  -P PARTS [PARTS ...], --parts PARTS [PARTS ...]
                        Parts to select in the processes list of Documents
                        (default: None)
  -r, --raw             Wraps the content in triple quotes with no extra text
                        (default: False)
  -R, --raw-no-quotes   Output the content only (default: False)
  --print-percentage-non-ascii
                        Print percentage of non-ascii characters (default:
                        False)
  -n, --dry-run         Dry run (default: False)
  --out OUT             Output file (default: None)

```
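
For example, a hypothetical invocation (`VIDEO_ID` is a placeholder) that builds a prompt from a video's title and transcript and copies it to the clipboard:

```
$ ytprompt -c 'https://www.youtube.com/watch?v=VIDEO_ID'
```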
### `textprompt`

```
$ textprompt --help

usage: textprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
                  [-P PARTS [PARTS ...]] [-r] [-R]
                  [--print-percentage-non-ascii] [-n] [--out OUT] [-C]
                  [-w WHAT] [-M]
                  [PATH ...]

Get a prompt from text files

positional arguments:
  PATH                  Paths to the text files, or stdin if not provided
                        (default: None)

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -c, --copy            Copy the prompt to clipboard (default: False)
  -e, --edit            Edit the prompt and copy manually (default: False)
  -m model, --model model
                        Model to use. This only affects the chunk size. Use -S
                        to disable splitting (infinite chunk size). (default:
                        gpt-4-32k)
  -S, --no-split        Do not split the prompt into multiple parts (use this
                        if the model has a really large context size)
                        (default: False)
  -s chunk_size, --chunk-size chunk_size
                        Chunk size when splitting transcript, also used to
                        determine whether to split, defaults to 1/2 of the
                        context length limit of the model (default: None)
  -P PARTS [PARTS ...], --parts PARTS [PARTS ...]
                        Parts to select in the processes list of Documents
                        (default: None)
  -r, --raw             Wraps the content in triple quotes with no extra text
                        (default: False)
  -R, --raw-no-quotes   Output the content only (default: False)
  --print-percentage-non-ascii
                        Print percentage of non-ascii characters (default:
                        False)
  -n, --dry-run         Dry run (default: False)
  --out OUT             Output file (default: None)
  -C, --from-clipboard  Load text from clipboard (default: False)
  -w WHAT, --what WHAT  Initial knowledge you want to insert before the PDF
                        content in the prompt (default: the content of a
                        document)
  -M, --merge           Merge contents of all pages before processing
                        (default: False)

```
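
For example, a hypothetical invocation that reads text from the clipboard instead of files and opens the generated prompt in an editor for manual copying:

```
$ textprompt -C -e
```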
### `htmlprompt`

```
$ htmlprompt --help

usage: htmlprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
                  [-P PARTS [PARTS ...]] [-r] [-R]
                  [--print-percentage-non-ascii] [-n] [--out OUT] [-C]
                  [-w WHAT] [-M]
                  [PATH ...]

Get a prompt from html files

positional arguments:
  PATH                  Paths to the html files, or stdin if not provided
                        (default: None)

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -c, --copy            Copy the prompt to clipboard (default: False)
  -e, --edit            Edit the prompt and copy manually (default: False)
  -m model, --model model
                        Model to use. This only affects the chunk size. Use -S
                        to disable splitting (infinite chunk size). (default:
                        gpt-4-32k)
  -S, --no-split        Do not split the prompt into multiple parts (use this
                        if the model has a really large context size)
                        (default: False)
  -s chunk_size, --chunk-size chunk_size
                        Chunk size when splitting transcript, also used to
                        determine whether to split, defaults to 1/2 of the
                        context length limit of the model (default: None)
  -P PARTS [PARTS ...], --parts PARTS [PARTS ...]
                        Parts to select in the processes list of Documents
                        (default: None)
  -r, --raw             Wraps the content in triple quotes with no extra text
                        (default: False)
  -R, --raw-no-quotes   Output the content only (default: False)
  --print-percentage-non-ascii
                        Print percentage of non-ascii characters (default:
                        False)
  -n, --dry-run         Dry run (default: False)
  --out OUT             Output file (default: None)
  -C, --from-clipboard  Load text from clipboard (default: False)
  -w WHAT, --what WHAT  Initial knowledge you want to insert before the PDF
                        content in the prompt (default: the text content of a
                        html file)
  -M, --merge           Merge contents of all pages before processing
                        (default: False)

```
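
For example, a hypothetical invocation (`page.html` is a placeholder) that outputs only the extracted text, with no surrounding prompt text or quotes:

```
$ htmlprompt page.html -R
```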

## Installation

### pipx

This is the recommended installation method.

```
$ pipx install langchain-utils
```

### [pip](https://pypi.org/project/langchain-utils/)

```
$ pip install langchain-utils
```

## Develop

```
$ git clone https://github.com/tddschn/langchain-utils.git
$ cd langchain-utils
$ poetry install
```
            
