# [CTransformers](https://github.com/marella/ctransformers) [![PyPI](https://img.shields.io/pypi/v/ctransformers)](https://pypi.org/project/ctransformers/) [![tests](https://github.com/marella/ctransformers/actions/workflows/tests.yml/badge.svg)](https://github.com/marella/ctransformers/actions/workflows/tests.yml) [![build](https://github.com/marella/ctransformers/actions/workflows/build.yml/badge.svg)](https://github.com/marella/ctransformers/actions/workflows/build.yml)
Python bindings for the Transformer models implemented in C/C++ using the [GGML](https://github.com/ggerganov/ggml) library.
> Also see [ChatDocs](https://github.com/marella/chatdocs)
- [Supported Models](#supported-models)
- [Installation](#installation)
- [Usage](#usage)
- [🤗 Transformers](#transformers)
- [LangChain](#langchain)
- [GPU](#gpu)
- [GPTQ](#gptq)
- [Documentation](#documentation)
- [License](#license)
## Supported Models
| Models | Model Type | CUDA | Metal |
| :------------------ | ------------- | :--: | :---: |
| GPT-2 | `gpt2` | | |
| GPT-J, GPT4All-J | `gptj` | | |
| GPT-NeoX, StableLM | `gpt_neox` | | |
| Falcon | `falcon` | ✅ | |
| LLaMA, LLaMA 2 | `llama` | ✅ | ✅ |
| MPT | `mpt` | ✅ | |
| StarCoder, StarChat | `gpt_bigcode` | ✅ | |
| Dolly V2 | `dolly-v2` | | |
| Replit | `replit` | | |
## Installation
```sh
pip install ctransformers
```
## Usage
It provides a unified interface for all models:
```py
from ctransformers import AutoModelForCausalLM
llm = AutoModelForCausalLM.from_pretrained("/path/to/ggml-model.bin", model_type="gpt2")
print(llm("AI is going to"))
```
[Run in Google Colab](https://colab.research.google.com/drive/1GMhYMUAv_TyZkpfvUI1NirM8-9mCXQyL)
To stream the output, set `stream=True`:
```py
for text in llm("AI is going to", stream=True):
print(text, end="", flush=True)
```
You can load models from the Hugging Face Hub directly:
```py
llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml")
```
If a model repo has multiple model files (`.bin` or `.gguf` files), specify a model file using:
```py
llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml", model_file="ggml-model.bin")
```
<a id="transformers"></a>
### 🤗 Transformers
> **Note:** This is an experimental feature and may change in the future.
To use it with 🤗 Transformers, create the model and tokenizer using:
```py
from ctransformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml", hf=True)
tokenizer = AutoTokenizer.from_pretrained(model)
```
[Run in Google Colab](https://colab.research.google.com/drive/1FVSLfTJ2iBbQ1oU2Rqz0MkpJbaB_5Got)
You can use the 🤗 Transformers text generation pipeline:
```py
from transformers import pipeline
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("AI is going to", max_new_tokens=256))
```
You can use the 🤗 Transformers generation [parameters](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig):
```py
pipe("AI is going to", max_new_tokens=256, do_sample=True, temperature=0.8, repetition_penalty=1.1)
```
You can use 🤗 Transformers tokenizers:
```py
from ctransformers import AutoModelForCausalLM
from transformers import AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml", hf=True) # Load model from GGML model repo.
tokenizer = AutoTokenizer.from_pretrained("gpt2") # Load tokenizer from original model repo.
```
### LangChain
It is integrated into LangChain. See [LangChain docs](https://python.langchain.com/docs/ecosystem/integrations/ctransformers).
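A minimal sketch of loading a model through the LangChain wrapper. The `CTransformers` class name follows the LangChain docs linked above, but treat the exact import path as an assumption, since it has moved between LangChain versions:

```py
# Sketch based on the LangChain integration docs linked above;
# the import path may differ across LangChain versions.
from langchain.llms import CTransformers

llm = CTransformers(model="marella/gpt-2-ggml")

print(llm("AI is going to"))
```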
### GPU
To run some of the model layers on GPU, set the `gpu_layers` parameter:
```py
llm = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7B-GGML", gpu_layers=50)
```
[Run in Google Colab](https://colab.research.google.com/drive/1Ihn7iPCYiqlTotpkqa1tOhUIpJBrJ1Tp)
#### CUDA
Install CUDA libraries using:
```sh
pip install ctransformers[cuda]
```
#### ROCm
To enable ROCm support, install the `ctransformers` package using:
```sh
CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers
```
#### Metal
To enable Metal support, install the `ctransformers` package using:
```sh
CT_METAL=1 pip install ctransformers --no-binary ctransformers
```
### GPTQ
> **Note:** This is an experimental feature and only LLaMA models are supported, via [ExLlama](https://github.com/turboderp/exllama).
Install additional dependencies using:
```sh
pip install ctransformers[gptq]
```
Load a GPTQ model using:
```py
llm = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7B-GPTQ")
```
[Run in Google Colab](https://colab.research.google.com/drive/1SzHslJ4CiycMOgrppqecj4VYCWFnyrN0)
> If the model name or path doesn't contain the word `gptq`, specify `model_type="gptq"`.
GPTQ models can also be used with LangChain. Note that the low-level APIs are not fully supported for GPTQ models.
## Documentation
<!-- API_DOCS -->
### Config
| Parameter | Type | Description | Default |
| :------------------- | :---------- | :-------------------------------------------------------------- | :------ |
| `top_k` | `int` | The top-k value to use for sampling. | `40` |
| `top_p` | `float` | The top-p value to use for sampling. | `0.95` |
| `temperature` | `float` | The temperature to use for sampling. | `0.8` |
| `repetition_penalty` | `float` | The repetition penalty to use for sampling. | `1.1` |
| `last_n_tokens` | `int` | The number of last tokens to use for repetition penalty. | `64` |
| `seed` | `int` | The seed value to use for sampling tokens. | `-1` |
| `max_new_tokens` | `int` | The maximum number of new tokens to generate. | `256` |
| `stop` | `List[str]` | A list of sequences to stop generation when encountered. | `None` |
| `stream` | `bool` | Whether to stream the generated text. | `False` |
| `reset` | `bool` | Whether to reset the model state before generating text. | `True` |
| `batch_size` | `int` | The batch size to use for evaluating tokens in a single prompt. | `8` |
| `threads` | `int` | The number of threads to use for evaluating tokens. | `-1` |
| `context_length` | `int` | The maximum context length to use. | `-1` |
| `gpu_layers` | `int` | The number of layers to run on GPU. | `0` |
> **Note:** Currently only LLaMA, MPT and Falcon models support the `context_length` parameter.
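These config values can be passed as keyword arguments to `AutoModelForCausalLM.from_pretrained` (as the `gpu_layers` example above shows), and the sampling-related ones can be overridden per call. A minimal sketch:

```py
from ctransformers import AutoModelForCausalLM

# Load-time config applies to all generations (context_length assumes a
# LLaMA, MPT or Falcon model, per the note above).
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GGML",
    context_length=2048,
    threads=8,
)

# Sampling parameters can also be overridden per call (see `LLM.__call__` below).
print(llm("AI is going to", max_new_tokens=64, temperature=0.7))
```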
### <kbd>class</kbd> `AutoModelForCausalLM`
---
#### <kbd>classmethod</kbd> `AutoModelForCausalLM.from_pretrained`
```python
from_pretrained(
model_path_or_repo_id: str,
model_type: Optional[str] = None,
model_file: Optional[str] = None,
config: Optional[ctransformers.hub.AutoConfig] = None,
lib: Optional[str] = None,
local_files_only: bool = False,
revision: Optional[str] = None,
hf: bool = False,
**kwargs
) → LLM
```
Loads the language model from a local file or remote repo.
**Args:**
- <b>`model_path_or_repo_id`</b>: The path to a model file or directory or the name of a Hugging Face Hub model repo.
- <b>`model_type`</b>: The model type.
- <b>`model_file`</b>: The name of the model file in repo or directory.
- <b>`config`</b>: `AutoConfig` object.
- <b>`lib`</b>: The path to a shared library or one of `avx2`, `avx`, `basic`.
- <b>`local_files_only`</b>: Whether or not to only look at local files (i.e., do not try to download the model).
- <b>`revision`</b>: The specific model version to use. It can be a branch name, a tag name, or a commit id.
- <b>`hf`</b>: Whether to create a Hugging Face Transformers model.
**Returns:**
`LLM` object.
### <kbd>class</kbd> `LLM`
### <kbd>method</kbd> `LLM.__init__`
```python
__init__(
model_path: str,
model_type: Optional[str] = None,
config: Optional[ctransformers.llm.Config] = None,
lib: Optional[str] = None
)
```
Loads the language model from a local file.
**Args:**
- <b>`model_path`</b>: The path to a model file.
- <b>`model_type`</b>: The model type.
- <b>`config`</b>: `Config` object.
- <b>`lib`</b>: The path to a shared library or one of `avx2`, `avx`, `basic`.
---
##### <kbd>property</kbd> LLM.bos_token_id
The beginning-of-sequence token.
---
##### <kbd>property</kbd> LLM.config
The config object.
---
##### <kbd>property</kbd> LLM.context_length
The context length of the model.
---
##### <kbd>property</kbd> LLM.embeddings
The input embeddings.
---
##### <kbd>property</kbd> LLM.eos_token_id
The end-of-sequence token.
---
##### <kbd>property</kbd> LLM.logits
The unnormalized log probabilities.
---
##### <kbd>property</kbd> LLM.model_path
The path to the model file.
---
##### <kbd>property</kbd> LLM.model_type
The model type.
---
##### <kbd>property</kbd> LLM.pad_token_id
The padding token.
---
##### <kbd>property</kbd> LLM.vocab_size
The number of tokens in the vocabulary.
---
#### <kbd>method</kbd> `LLM.detokenize`
```python
detokenize(tokens: Sequence[int], decode: bool = True) → Union[str, bytes]
```
Converts a list of tokens to text.
**Args:**
- <b>`tokens`</b>: The list of tokens.
- <b>`decode`</b>: Whether to decode the text as a UTF-8 string.
**Returns:**
The combined text of all tokens.
---
#### <kbd>method</kbd> `LLM.embed`
```python
embed(
input: Union[str, Sequence[int]],
batch_size: Optional[int] = None,
threads: Optional[int] = None
) → List[float]
```
Computes embeddings for a text or list of tokens.
> **Note:** Currently only LLaMA and Falcon models support embeddings.
**Args:**
- <b>`input`</b>: The input text or list of tokens to get embeddings for.
- <b>`batch_size`</b>: The batch size to use for evaluating tokens in a single prompt. Default: `8`
- <b>`threads`</b>: The number of threads to use for evaluating tokens. Default: `-1`
**Returns:**
The input embeddings.
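A minimal usage sketch, assuming `llm` was loaded from a LLaMA or Falcon model per the note above:

```py
# `llm` must be a LLaMA or Falcon model for embeddings to be supported.
embeddings = llm.embed("AI is going to")
print(len(embeddings))  # dimensionality of the embedding vector
```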
---
#### <kbd>method</kbd> `LLM.eval`
```python
eval(
tokens: Sequence[int],
batch_size: Optional[int] = None,
threads: Optional[int] = None
) → None
```
Evaluates a list of tokens.
**Args:**
- <b>`tokens`</b>: The list of tokens to evaluate.
- <b>`batch_size`</b>: The batch size to use for evaluating tokens in a single prompt. Default: `8`
- <b>`threads`</b>: The number of threads to use for evaluating tokens. Default: `-1`
---
#### <kbd>method</kbd> `LLM.generate`
```python
generate(
tokens: Sequence[int],
top_k: Optional[int] = None,
top_p: Optional[float] = None,
temperature: Optional[float] = None,
repetition_penalty: Optional[float] = None,
last_n_tokens: Optional[int] = None,
seed: Optional[int] = None,
batch_size: Optional[int] = None,
threads: Optional[int] = None,
reset: Optional[bool] = None
) → Generator[int, NoneType, NoneType]
```
Generates new tokens from a list of tokens.
**Args:**
- <b>`tokens`</b>: The list of tokens to generate tokens from.
- <b>`top_k`</b>: The top-k value to use for sampling. Default: `40`
- <b>`top_p`</b>: The top-p value to use for sampling. Default: `0.95`
- <b>`temperature`</b>: The temperature to use for sampling. Default: `0.8`
- <b>`repetition_penalty`</b>: The repetition penalty to use for sampling. Default: `1.1`
- <b>`last_n_tokens`</b>: The number of last tokens to use for repetition penalty. Default: `64`
- <b>`seed`</b>: The seed value to use for sampling tokens. Default: `-1`
- <b>`batch_size`</b>: The batch size to use for evaluating tokens in a single prompt. Default: `8`
- <b>`threads`</b>: The number of threads to use for evaluating tokens. Default: `-1`
- <b>`reset`</b>: Whether to reset the model state before generating text. Default: `True`
**Returns:**
The generated tokens.
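A minimal sketch of the low-level token loop, combining `tokenize`, `generate`, `detokenize` and `is_eos_token` (all documented on this page); the 64-token cap is an added safeguard, not part of the API:

```py
from itertools import islice

tokens = llm.tokenize("AI is going to")

# Cap at 64 new tokens in case the model never emits an end-of-sequence token.
for token in islice(llm.generate(tokens), 64):
    if llm.is_eos_token(token):
        break
    print(llm.detokenize([token]), end="", flush=True)
```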
---
#### <kbd>method</kbd> `LLM.is_eos_token`
```python
is_eos_token(token: int) → bool
```
Checks if a token is an end-of-sequence token.
**Args:**
- <b>`token`</b>: The token to check.
**Returns:**
`True` if the token is an end-of-sequence token, else `False`.
---
#### <kbd>method</kbd> `LLM.prepare_inputs_for_generation`
```python
prepare_inputs_for_generation(
tokens: Sequence[int],
reset: Optional[bool] = None
) → Sequence[int]
```
Removes input tokens that were already evaluated and updates the LLM context.
**Args:**
- <b>`tokens`</b>: The list of input tokens.
- <b>`reset`</b>: Whether to reset the model state before generating text. Default: `True`
**Returns:**
The list of tokens to evaluate.
---
#### <kbd>method</kbd> `LLM.reset`
```python
reset() → None
```
Deprecated since 0.2.27.
---
#### <kbd>method</kbd> `LLM.sample`
```python
sample(
top_k: Optional[int] = None,
top_p: Optional[float] = None,
temperature: Optional[float] = None,
repetition_penalty: Optional[float] = None,
last_n_tokens: Optional[int] = None,
seed: Optional[int] = None
) → int
```
Samples a token from the model.
**Args:**
- <b>`top_k`</b>: The top-k value to use for sampling. Default: `40`
- <b>`top_p`</b>: The top-p value to use for sampling. Default: `0.95`
- <b>`temperature`</b>: The temperature to use for sampling. Default: `0.8`
- <b>`repetition_penalty`</b>: The repetition penalty to use for sampling. Default: `1.1`
- <b>`last_n_tokens`</b>: The number of last tokens to use for repetition penalty. Default: `64`
- <b>`seed`</b>: The seed value to use for sampling tokens. Default: `-1`
**Returns:**
The sampled token.
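A minimal sketch, assuming the prompt tokens have already been evaluated with `LLM.eval` so that logits are available:

```py
# Evaluate the prompt first; `sample` draws from the logits of the
# last evaluated token.
llm.eval(llm.tokenize("AI is going to"))
token = llm.sample(temperature=0.8, top_k=40)
print(llm.detokenize([token]))
```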
---
#### <kbd>method</kbd> `LLM.tokenize`
```python
tokenize(text: str, add_bos_token: Optional[bool] = None) → List[int]
```
Converts a text into a list of tokens.
**Args:**
- <b>`text`</b>: The text to tokenize.
- <b>`add_bos_token`</b>: Whether to add the beginning-of-sequence token.
**Returns:**
The list of tokens.
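A minimal round-trip sketch with `detokenize` (whether the round trip reproduces the input exactly can depend on the tokenizer):

```py
tokens = llm.tokenize("Hello world", add_bos_token=False)
text = llm.detokenize(tokens)  # approximately "Hello world"
```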
---
#### <kbd>method</kbd> `LLM.__call__`
```python
__call__(
prompt: str,
max_new_tokens: Optional[int] = None,
top_k: Optional[int] = None,
top_p: Optional[float] = None,
temperature: Optional[float] = None,
repetition_penalty: Optional[float] = None,
last_n_tokens: Optional[int] = None,
seed: Optional[int] = None,
batch_size: Optional[int] = None,
threads: Optional[int] = None,
stop: Optional[Sequence[str]] = None,
stream: Optional[bool] = None,
reset: Optional[bool] = None
) → Union[str, Generator[str, NoneType, NoneType]]
```
Generates text from a prompt.
**Args:**
- <b>`prompt`</b>: The prompt to generate text from.
- <b>`max_new_tokens`</b>: The maximum number of new tokens to generate. Default: `256`
- <b>`top_k`</b>: The top-k value to use for sampling. Default: `40`
- <b>`top_p`</b>: The top-p value to use for sampling. Default: `0.95`
- <b>`temperature`</b>: The temperature to use for sampling. Default: `0.8`
- <b>`repetition_penalty`</b>: The repetition penalty to use for sampling. Default: `1.1`
- <b>`last_n_tokens`</b>: The number of last tokens to use for repetition penalty. Default: `64`
- <b>`seed`</b>: The seed value to use for sampling tokens. Default: `-1`
- <b>`batch_size`</b>: The batch size to use for evaluating tokens in a single prompt. Default: `8`
- <b>`threads`</b>: The number of threads to use for evaluating tokens. Default: `-1`
- <b>`stop`</b>: A list of sequences to stop generation when encountered. Default: `None`
- <b>`stream`</b>: Whether to stream the generated text. Default: `False`
- <b>`reset`</b>: Whether to reset the model state before generating text. Default: `True`
**Returns:**
The generated text.
<!-- API_DOCS -->
## License
[MIT](https://github.com/marella/ctransformers/blob/main/LICENSE)