# simple-falcon

- **Name**: simple-falcon
- **Version**: 0.0.7
- **Home page**: https://github.com/kyegomez/Falcon
- **Summary**: Falcon - Pytorch
- **Upload time**: 2023-09-06 17:52:04
- **Author**: Kye Gomez
- **Requires Python**: >=3.6,<4.0
- **License**: MIT
- **Keywords**: artificial intelligence, deep learning, optimizers, prompt engineering

---

[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)

# Simple Falcon
A simple package for leveraging Falcon 180B and the HF ecosystem's tools, including training/inference scripts, safetensors, integrations with bitsandbytes, PEFT, GPTQ, assisted generation, RoPE scaling support, and rich generation parameters.


## Installation

You can install the package using pip:

```bash
pip3 install simple-falcon
```
---

# Usage

```python
from falcon.main import Falcon


falcon = Falcon(
    temperature=0.5, 
    top_p=0.9, 
    max_new_tokens=500,
    quantized=True,
    system_prompt=""
)

prompt = "What is the meaning of the collapse of the wave function?"

falcon.run(prompt=prompt)  # the response is printed to the console
```

# Documentation

The Falcon class provides a convenient interface for conversational agents built on Hugging Face Transformers. It supports both single-turn and multi-turn conversations with pre-trained models and lets users customize inference settings such as `temperature`, `top_p`, and the number of generated tokens. It can also load 8-bit quantized models to reduce memory usage.

### Purpose

The main purpose of the Falcon class is to:
- Make it easy to load and run generative language models.
- Provide an efficient conversation interface with customizable sampling settings.
- Support both full-precision and 8-bit quantized models.
- Manage conversation history in multi-turn scenarios.

### Class Definition:

```python
class Falcon:
    def __init__(
        self,
        *,
        model_id: str = "tiiuae/falcon-180B",
        temperature: float = None,
        top_p: float = None,
        max_new_tokens: int = None,
        quantized: bool = False,
        system_prompt: str = None
    ):
```

#### Parameters:

- **model_id (str)**: Model identifier from the Hugging Face Model Hub. Default is "tiiuae/falcon-180B".

- **temperature (float, optional)**: Controls the randomness of sampling by scaling the softmax distribution over next-token logits. Higher values produce more random output; lower values make it more deterministic.

- **top_p (float, optional)**: Nucleus sampling: restricts sampling to the smallest set of top tokens whose cumulative probability reaches this threshold.

- **max_new_tokens (int, optional)**: Maximum number of tokens generated in a single inference call.

- **quantized (bool)**: If set to `True`, the model loads in 8-bit quantized mode (see the loading sketch after this list). Default is `False`.

- **system_prompt (str, optional)**: Initial system prompt that sets the context for the conversation.
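
When `quantized=True`, the model is presumably loaded in 8-bit via bitsandbytes. The package's internals aren't shown here, so the following is only a minimal sketch of the standard Transformers approach to 8-bit loading, not the package's verified code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "tiiuae/falcon-180B"

# 8-bit weights roughly halve memory versus fp16, at some speed/quality cost.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # shard layers across the available GPUs
)
```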

### Method Descriptions:

#### 1. run:

```python
def run(self, prompt: str) -> None:
```

Generates a response based on the provided prompt.

**Parameters**:
- **prompt (str)**: Input string to which the model responds.

**Returns**: None. The response is printed to the console.
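
Under the hood, a call like this on a Transformers causal LM typically amounts to tokenize, generate, decode. The sketch below shows that assumed flow, reusing the `model` and `tokenizer` from the loading sketch above; the generation arguments are standard `generate()` parameters, not package-specific ones:

```python
prompt = "What is the capital of France?"

# Tokenize the prompt and move the tensors to the model's device.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(
    **inputs,
    do_sample=True,       # sampling must be enabled for temperature/top_p to apply
    temperature=0.5,
    top_p=0.9,
    max_new_tokens=500,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```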

#### 2. chat:

```python
def chat(self, message: str, history: list[tuple[str, str]], system_prompt: str = None) -> None:
```

Generates a response considering the conversation history.

**Parameters**:
- **message (str)**: User's current message to which the model will respond.
  
- **history (list[tuple[str, str]])**: Conversation history as a list of tuples. Each tuple consists of the user's prompt and the Falcon's response.
  
- **system_prompt (str, optional)**: Initial system prompt to set the context for the conversation.

**Returns**: None. The response is printed to the console.
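
How `chat` folds the history into the model's context is not documented. A common approach is to flatten the turns into a single prompt string; the sketch below illustrates that pattern, where the `User:`/`Falcon:` labels are hypothetical and not the package's actual template:

```python
def build_prompt(message, history, system_prompt=None):
    """Flatten a (user, assistant) turn history into one prompt string."""
    parts = [system_prompt] if system_prompt else []
    for user_turn, falcon_turn in history:
        parts.append(f"User: {user_turn}")
        parts.append(f"Falcon: {falcon_turn}")
    parts.append(f"User: {message}")
    parts.append("Falcon:")  # cue the model to produce the next reply
    return "\n".join(parts)
```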

### Usage Examples:

#### 1. Single-turn conversation:

```python
from falcon import Falcon

model = Falcon(temperature=0.8)
model.run("What is the capital of France?")
```

#### 2. Multi-turn conversation with history:

```python
from falcon import Falcon

model = Falcon(system_prompt="Conversational Assistant")
history = [
    ("Hi there!", "Hello! How can I assist you?"),
    ("What's the weather like?", "Sorry, I can't fetch real-time data, but I can provide general info.")
]
model.chat("Tell me a joke.", history)
```

#### 3. Using quantized models:

```python
from falcon import Falcon

model = Falcon(quantized=True)
model.run("Tell me about quantum computing.")
```

### Mathematical Representation:

The Falcon class leverages a transformer-based generative language model for text generation. The process can be summarized as follows.

Given an input sequence \( x = [x_1, x_2, \ldots, x_n] \), greedy decoding picks the next token \( x_{n+1} \) by:

\[ x_{n+1} = \arg\max_{x \in V} P(x \mid x_1, x_2, \ldots, x_n) \]

Where:
- \( V \) is the model's vocabulary and \( P \) is the probability distribution over \( V \) produced by the model.
- The argmax operation selects the single most probable next token.

In practice, with `temperature` and `top_p` set, the next token is *sampled* from the temperature-scaled distribution truncated to the top-p nucleus, rather than taken greedily.
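
To make the effect of `temperature` and `top_p` concrete, here is a self-contained sketch of temperature-scaled nucleus sampling over a vector of next-token logits (a generic illustration of the technique, not the package's code):

```python
import torch

def sample_next_token(logits, temperature=0.5, top_p=0.9):
    # Temperature scales the logits: <1 sharpens the distribution, >1 flattens it.
    probs = torch.softmax(logits / temperature, dim=-1)

    # Sort tokens by probability and keep the smallest prefix whose
    # cumulative mass reaches top_p (the "nucleus").
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    outside_nucleus = cumulative - sorted_probs > top_p  # always keeps >= 1 token
    sorted_probs[outside_nucleus] = 0.0
    sorted_probs /= sorted_probs.sum()  # renormalize the truncated distribution

    # Sample from the nucleus instead of taking the greedy argmax.
    choice = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx[choice].item()
```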

### Additional Information:

- For best performance, it's recommended to use the Falcon class with CUDA-enabled devices. Ensure that your PyTorch build supports CUDA (a quick check is sketched after this list).
  
- The Falcon class uses models from the HuggingFace model hub. Ensure you have an active internet connection during the first run as models will be downloaded.
  
- If memory issues arise, consider reducing the `max_new_tokens` parameter or using quantized models.
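
Before loading a model of this size, it is worth confirming that PyTorch actually sees a CUDA device, as mentioned in the first point above:

```python
import torch

if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")
else:
    print("No CUDA device found; inference will fall back to CPU and be very slow.")
```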

---

# License
MIT