# [C Transformers](https://github.com/marella/ctransformers) [![PyPI](https://img.shields.io/pypi/v/ctransformers)](https://pypi.org/project/ctransformers/) [![tests](https://github.com/marella/ctransformers/actions/workflows/tests.yml/badge.svg)](https://github.com/marella/ctransformers/actions/workflows/tests.yml) [![build](https://github.com/marella/ctransformers/actions/workflows/build.yml/badge.svg)](https://github.com/marella/ctransformers/actions/workflows/build.yml)
**Fork of CTransformers to support Langdash functions**
Python bindings for Transformer models implemented in C/C++ using the [GGML](https://github.com/ggerganov/ggml) library.
> Also see [ChatDocs](https://github.com/marella/chatdocs)
- [Supported Models](#supported-models)
- [Installation](#installation)
- [Usage](#usage)
  - [Hugging Face Hub](#hugging-face-hub)
  - [LangChain](#langchain)
  - [GPU](#gpu)
- [Documentation](#documentation)
- [License](#license)
## Supported Models
| Models | Model Type |
| :-------------------- | ----------- |
| GPT-2 | `gpt2` |
| GPT-J, GPT4All-J | `gptj` |
| GPT-NeoX, StableLM | `gpt_neox` |
| LLaMA, LLaMA 2 | `llama` |
| MPT | `mpt` |
| Dolly V2 | `dolly-v2` |
| Replit | `replit` |
| StarCoder, StarChat | `starcoder` |
| Falcon (Experimental) | `falcon` |
## Installation
```sh
pip install ctransformers
```
## Usage
It provides a unified interface for all models:
```py
from ctransformers import AutoModelForCausalLM
llm = AutoModelForCausalLM.from_pretrained('/path/to/ggml-gpt-2.bin', model_type='gpt2')
print(llm('AI is going to'))
```
[Run in Google Colab](https://colab.research.google.com/drive/1GMhYMUAv_TyZkpfvUI1NirM8-9mCXQyL)
If you are getting an `illegal instruction` error, try using `lib='avx'` or `lib='basic'`:
```py
llm = AutoModelForCausalLM.from_pretrained('/path/to/ggml-gpt-2.bin', model_type='gpt2', lib='avx')
```
It provides a generator interface for more control:
```py
tokens = llm.tokenize('AI is going to')
for token in llm.generate(tokens):
    print(llm.detokenize(token))
```
It can be used with a custom or Hugging Face tokenizer:
```py
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('gpt2')
tokens = tokenizer.encode('AI is going to')
for token in llm.generate(tokens):
    print(tokenizer.decode(token))
```
It also provides access to the low-level C API. See [Documentation](#documentation) section below.
### Hugging Face Hub
It can be used with models hosted on the Hub:
```py
llm = AutoModelForCausalLM.from_pretrained('marella/gpt-2-ggml')
```
If a model repo has multiple model files (`.bin` files), specify a model file using:
```py
llm = AutoModelForCausalLM.from_pretrained('marella/gpt-2-ggml', model_file='ggml-model.bin')
```
It can be used with your own models uploaded to the Hub. For a better user experience, upload only one model per repo.
To use it with your own model, add a `config.json` file to your model repo specifying the `model_type`:
```json
{
"model_type": "gpt2"
}
```
You can also specify additional parameters under `task_specific_params.text-generation`.
See [marella/gpt-2-ggml](https://huggingface.co/marella/gpt-2-ggml/blob/main/config.json) for a minimal example and [marella/gpt-2-ggml-example](https://huggingface.co/marella/gpt-2-ggml-example/blob/main/config.json) for a full example.
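For illustration, a `config.json` with extra generation parameters might look like the following sketch (the parameter values here are hypothetical; see the linked repos for real examples):

```json
{
  "model_type": "gpt2",
  "task_specific_params": {
    "text-generation": {
      "top_k": 40,
      "top_p": 0.95,
      "temperature": 0.8
    }
  }
}
```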
### LangChain
It is integrated into LangChain. See [LangChain docs](https://python.langchain.com/docs/ecosystem/integrations/ctransformers).
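A rough sketch of the LangChain integration (import paths may differ across LangChain versions; the linked docs have the current API):

```py
from langchain.llms import CTransformers

llm = CTransformers(model='marella/gpt-2-ggml')

print(llm('AI is going to'))
```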
### GPU
> **Note:** Currently only LLaMA and Falcon models have GPU support.
To run some of the model layers on GPU, set the `gpu_layers` parameter:
```py
llm = AutoModelForCausalLM.from_pretrained('/path/to/ggml-llama.bin', model_type='llama', gpu_layers=50)
```
#### CUDA
Make sure you have installed [CUDA 12](https://developer.nvidia.com/cuda-downloads) and the latest [NVIDIA Drivers](https://www.nvidia.com/download/index.aspx).
<details>
<summary><strong>Show instructions for CUDA 11</strong></summary><br>
To use with CUDA 11, install the `ctransformers` package using:
```sh
CT_CUBLAS=1 pip install ctransformers --no-binary ctransformers
```
On Windows PowerShell run:
```sh
$env:CT_CUBLAS=1
pip install ctransformers --no-binary ctransformers
```
On Windows Command Prompt run:
```sh
set CT_CUBLAS=1
pip install ctransformers --no-binary ctransformers
```
</details>
[Run in Google Colab](https://colab.research.google.com/drive/1Ihn7iPCYiqlTotpkqa1tOhUIpJBrJ1Tp)
#### Metal
To enable Metal support, install the `ctransformers` package using:
```sh
CT_METAL=1 pip install ctransformers --no-binary ctransformers
```
## Documentation
<!-- API_DOCS -->
### Config
| Parameter | Type | Description | Default |
| :------------------- | :---------- | :------------------------------------------------------- | :------ |
| `top_k` | `int` | The top-k value to use for sampling. | `40` |
| `top_p` | `float` | The top-p value to use for sampling. | `0.95` |
| `temperature` | `float` | The temperature to use for sampling. | `0.8` |
| `repetition_penalty` | `float` | The repetition penalty to use for sampling. | `1.1` |
| `last_n_tokens` | `int` | The number of last tokens to use for repetition penalty. | `64` |
| `seed` | `int` | The seed value to use for sampling tokens. | `-1` |
| `max_new_tokens` | `int` | The maximum number of new tokens to generate. | `256` |
| `stop` | `List[str]` | A list of sequences to stop generation when encountered. | `None` |
| `stream` | `bool` | Whether to stream the generated text. | `False` |
| `reset` | `bool` | Whether to reset the model state before generating text. | `True` |
| `batch_size` | `int` | The batch size to use for evaluating tokens. | `8` |
| `threads` | `int` | The number of threads to use for evaluating tokens. | `-1` |
| `context_length` | `int` | The maximum context length to use. | `-1` |
| `gpu_layers` | `int` | The number of layers to run on GPU. | `0` |
> **Note:** Currently only LLaMA, MPT, Falcon models support the `context_length` parameter and only LLaMA, Falcon models support the `gpu_layers` parameter.
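Since `from_pretrained` accepts `**kwargs` (see below), config parameters can typically be passed directly when loading a model. A minimal sketch, assuming the extra keyword arguments are forwarded to the config:

```py
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    '/path/to/ggml-llama.bin',
    model_type='llama',
    context_length=2048,  # assumption: forwarded to the Config object
    gpu_layers=50,
    threads=8,
)
```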
### <kbd>class</kbd> `AutoModelForCausalLM`
---
#### <kbd>classmethod</kbd> `AutoModelForCausalLM.from_pretrained`
```python
from_pretrained(
    model_path_or_repo_id: str,
    model_type: Optional[str] = None,
    model_file: Optional[str] = None,
    config: Optional[ctransformers.hub.AutoConfig] = None,
    lib: Optional[str] = None,
    local_files_only: bool = False,
    **kwargs
) → LLM
```
Loads the language model from a local file or remote repo.
**Args:**
- <b>`model_path_or_repo_id`</b>: The path to a model file or directory or the name of a Hugging Face Hub model repo.
- <b>`model_type`</b>: The model type.
- <b>`model_file`</b>: The name of the model file in repo or directory.
- <b>`config`</b>: `AutoConfig` object.
- <b>`lib`</b>: The path to a shared library or one of `avx2`, `avx`, `basic`.
- <b>`local_files_only`</b>: Whether or not to only look at local files (i.e., do not try to download the model).
**Returns:**
`LLM` object.
### <kbd>class</kbd> `LLM`
### <kbd>method</kbd> `LLM.__init__`
```python
__init__(
    model_path: str,
    model_type: str,
    config: Optional[ctransformers.llm.Config] = None,
    lib: Optional[str] = None
)
```
Loads the language model from a local file.
**Args:**
- <b>`model_path`</b>: The path to a model file.
- <b>`model_type`</b>: The model type.
- <b>`config`</b>: `Config` object.
- <b>`lib`</b>: The path to a shared library or one of `avx2`, `avx`, `basic`.
---
##### <kbd>property</kbd> LLM.config
The config object.
---
##### <kbd>property</kbd> LLM.context_length
The context length of the model.
---
##### <kbd>property</kbd> LLM.embeddings
The input embeddings.
---
##### <kbd>property</kbd> LLM.eos_token_id
The end-of-sequence token.
---
##### <kbd>property</kbd> LLM.logits
The unnormalized log probabilities.
---
##### <kbd>property</kbd> LLM.model_path
The path to the model file.
---
##### <kbd>property</kbd> LLM.model_type
The model type.
---
##### <kbd>property</kbd> LLM.vocab_size
The number of tokens in the vocabulary.
---
#### <kbd>method</kbd> `LLM.detokenize`
```python
detokenize(tokens: Sequence[int], decode: bool = True) → Union[str, bytes]
```
Converts a list of tokens to text.
**Args:**
- <b>`tokens`</b>: The list of tokens.
- <b>`decode`</b>: Whether to decode the text as a UTF-8 string.
**Returns:**
The combined text of all tokens.
---
#### <kbd>method</kbd> `LLM.embed`
```python
embed(
    input: Union[str, Sequence[int]],
    batch_size: Optional[int] = None,
    threads: Optional[int] = None
) → List[float]
```
Computes embeddings for a text or list of tokens.
> **Note:** Currently only LLaMA and Falcon models support embeddings.
**Args:**
- <b>`input`</b>: The input text or list of tokens to get embeddings for.
- <b>`batch_size`</b>: The batch size to use for evaluating tokens. Default: `8`
- <b>`threads`</b>: The number of threads to use for evaluating tokens. Default: `-1`
**Returns:**
The input embeddings.
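A minimal usage sketch of `embed` (recall that only LLaMA and Falcon models support embeddings):

```py
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained('/path/to/ggml-llama.bin', model_type='llama')

embeddings = llm.embed('AI is going to')  # a list of floats
```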
---
#### <kbd>method</kbd> `LLM.eval`
```python
eval(
    tokens: Sequence[int],
    batch_size: Optional[int] = None,
    threads: Optional[int] = None
) → None
```
Evaluates a list of tokens.
**Args:**
- <b>`tokens`</b>: The list of tokens to evaluate.
- <b>`batch_size`</b>: The batch size to use for evaluating tokens. Default: `8`
- <b>`threads`</b>: The number of threads to use for evaluating tokens. Default: `-1`
---
#### <kbd>method</kbd> `LLM.generate`
```python
generate(
    tokens: Sequence[int],
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
    temperature: Optional[float] = None,
    repetition_penalty: Optional[float] = None,
    last_n_tokens: Optional[int] = None,
    seed: Optional[int] = None,
    batch_size: Optional[int] = None,
    threads: Optional[int] = None,
    reset: Optional[bool] = None
) → Generator[int, NoneType, NoneType]
```
Generates new tokens from a list of tokens.
**Args:**
- <b>`tokens`</b>: The list of tokens to generate tokens from.
- <b>`top_k`</b>: The top-k value to use for sampling. Default: `40`
- <b>`top_p`</b>: The top-p value to use for sampling. Default: `0.95`
- <b>`temperature`</b>: The temperature to use for sampling. Default: `0.8`
- <b>`repetition_penalty`</b>: The repetition penalty to use for sampling. Default: `1.1`
- <b>`last_n_tokens`</b>: The number of last tokens to use for repetition penalty. Default: `64`
- <b>`seed`</b>: The seed value to use for sampling tokens. Default: `-1`
- <b>`batch_size`</b>: The batch size to use for evaluating tokens. Default: `8`
- <b>`threads`</b>: The number of threads to use for evaluating tokens. Default: `-1`
- <b>`reset`</b>: Whether to reset the model state before generating text. Default: `True`
**Returns:**
The generated tokens.
---
#### <kbd>method</kbd> `LLM.is_eos_token`
```python
is_eos_token(token: int) → bool
```
Checks if a token is an end-of-sequence token.
**Args:**
- <b>`token`</b>: The token to check.
**Returns:**
`True` if the token is an end-of-sequence token, else `False`.
---
#### <kbd>method</kbd> `LLM.reset`
```python
reset() → None
```
Resets the model state.
---
#### <kbd>method</kbd> `LLM.sample`
```python
sample(
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
    temperature: Optional[float] = None,
    repetition_penalty: Optional[float] = None,
    last_n_tokens: Optional[int] = None,
    seed: Optional[int] = None
) → int
```
Samples a token from the model.
**Args:**
- <b>`top_k`</b>: The top-k value to use for sampling. Default: `40`
- <b>`top_p`</b>: The top-p value to use for sampling. Default: `0.95`
- <b>`temperature`</b>: The temperature to use for sampling. Default: `0.8`
- <b>`repetition_penalty`</b>: The repetition penalty to use for sampling. Default: `1.1`
- <b>`last_n_tokens`</b>: The number of last tokens to use for repetition penalty. Default: `64`
- <b>`seed`</b>: The seed value to use for sampling tokens. Default: `-1`
**Returns:**
The sampled token.
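To illustrate how these low-level methods fit together, here is a minimal hand-rolled generation loop built only from the methods documented here, with `llm` loaded as in the earlier examples. `generate()` and `__call__()` wrap similar logic; the exact state handling between `eval()` calls is an assumption in this sketch:

```py
llm.reset()
tokens = llm.tokenize('AI is going to')
llm.eval(tokens)

generated = []
for _ in range(32):  # cap the number of new tokens
    token = llm.sample()
    if llm.is_eos_token(token):
        break
    generated.append(token)
    llm.eval([token])  # assumption: model state carries over, so only the new token is fed

print(llm.detokenize(generated))
```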
---
#### <kbd>method</kbd> `LLM.tokenize`
```python
tokenize(text: str) → List[int]
```
Converts a text into a list of tokens.
**Args:**
- <b>`text`</b>: The text to tokenize.
**Returns:**
The list of tokens.
---
#### <kbd>method</kbd> `LLM.__call__`
```python
__call__(
    prompt: str,
    max_new_tokens: Optional[int] = None,
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
    temperature: Optional[float] = None,
    repetition_penalty: Optional[float] = None,
    last_n_tokens: Optional[int] = None,
    seed: Optional[int] = None,
    batch_size: Optional[int] = None,
    threads: Optional[int] = None,
    stop: Optional[Sequence[str]] = None,
    stream: Optional[bool] = None,
    reset: Optional[bool] = None
) → Union[str, Generator[str, NoneType, NoneType]]
```
Generates text from a prompt.
**Args:**
- <b>`prompt`</b>: The prompt to generate text from.
- <b>`max_new_tokens`</b>: The maximum number of new tokens to generate. Default: `256`
- <b>`top_k`</b>: The top-k value to use for sampling. Default: `40`
- <b>`top_p`</b>: The top-p value to use for sampling. Default: `0.95`
- <b>`temperature`</b>: The temperature to use for sampling. Default: `0.8`
- <b>`repetition_penalty`</b>: The repetition penalty to use for sampling. Default: `1.1`
- <b>`last_n_tokens`</b>: The number of last tokens to use for repetition penalty. Default: `64`
- <b>`seed`</b>: The seed value to use for sampling tokens. Default: `-1`
- <b>`batch_size`</b>: The batch size to use for evaluating tokens. Default: `8`
- <b>`threads`</b>: The number of threads to use for evaluating tokens. Default: `-1`
- <b>`stop`</b>: A list of sequences to stop generation when encountered. Default: `None`
- <b>`stream`</b>: Whether to stream the generated text. Default: `False`
- <b>`reset`</b>: Whether to reset the model state before generating text. Default: `True`
**Returns:**
The generated text.
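Based on the `stream` parameter above, the generated text can be consumed incrementally; a minimal sketch, reusing `llm` from the earlier examples:

```py
for text in llm('AI is going to', stream=True):
    print(text, end='', flush=True)
```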
<!-- API_DOCS -->
## License
[MIT](https://github.com/marella/ctransformers/blob/main/LICENSE)