[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)
# Simple Falcon
A simple package for leveraging Falcon 180B and the HF ecosystem's tools, including training/inference scripts, safetensors, integrations with bitsandbytes, PEFT, GPTQ, assisted generation, RoPE scaling support, and rich generation parameters.
## Installation
You can install the package using pip:
```bash
pip3 install simple-falcon
```
---
# Usage
```python
from falcon.main import Falcon
falcon = Falcon(
    temperature=0.5,
    top_p=0.9,
    max_new_tokens=500,
    quantized=True,
    system_prompt=""
)
prompt = "What is the meaning of the collapse of the wave function?"
falcon.run(prompt=prompt)  # run() returns None; the response is printed to the console
```
# Documentation
The Falcon class provides a convenient interface for conversational agents built on the Hugging Face `transformers` library. It supports both single-turn and multi-turn conversations with pre-trained models and lets users customize inference settings such as `temperature`, `top_p`, and the token-generation limit. It can also load 8-bit quantized models to reduce memory use.
### Purpose
The main purpose of the Falcon class is to:
- Make it easy to initiate and run generative language models.
- Provide efficient conversation interfaces with customization.
- Support both regular and quantized models for better performance.
- Manage conversational history in multi-turn scenarios.
### Class Definition:
```python
class Falcon:
    def __init__(
        self,
        *,
        model_id: str = "tiiuae/falcon-180B",
        temperature: float = None,
        top_p: float = None,
        max_new_tokens: int = None,
        quantized: bool = False,
        system_prompt: str = None
    ):
```
#### Parameters:
- **model_id (str)**: Model identifier from the HuggingFace Model Hub. Default is "tiiuae/falcon-180B".
- **temperature (float, optional)**: Controls sampling randomness: logits are divided by `temperature` before the softmax, so higher values produce more random output.
- **top_p (float, optional)**: Nucleus sampling; restricts sampling to the smallest set of most-probable tokens whose cumulative probability reaches this value.
- **max_new_tokens (int, optional)**: Maximum number of tokens that can be generated in a single inference call.
- **quantized (bool)**: If set to `True`, the model loads in 8-bit quantized mode. Default is `False`.
- **system_prompt (str, optional)**: Initial system prompt to set the context for the conversation.
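
The package does not show how these arguments are applied internally, but the sketch below illustrates how such options typically map onto the Hugging Face `transformers` API. Treat it as an assumption for orientation, not `simple-falcon`'s actual code; in particular, the `BitsAndBytesConfig(load_in_8bit=True)` path is a guess at what `quantized=True` does.

```python
# Illustrative sketch only -- an assumed mapping, not simple-falcon's internals.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "tiiuae/falcon-180B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # dtype for non-quantized modules
    device_map="auto",            # shard across available GPUs
    # Assumption: quantized=True corresponds to 8-bit loading via bitsandbytes.
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)

inputs = tokenizer("What is nucleus sampling?", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    do_sample=True,        # temperature/top_p only take effect when sampling
    temperature=0.5,
    top_p=0.9,
    max_new_tokens=500,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```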
### Method Descriptions:
#### 1. run:
```python
def run(self, prompt: str) -> None:
```
Generates a response based on the provided prompt.
**Parameters**:
- **prompt (str)**: Input string to which the model responds.
**Returns**: None. The response is printed to the console.
#### 2. chat:
```python
def chat(self, message: str, history: list[tuple[str, str]], system_prompt: str = None) -> None:
```
Generates a response considering the conversation history.
**Parameters**:
- **message (str)**: User's current message to which the model will respond.
- **history (list[tuple[str, str]])**: Conversation history as a list of tuples. Each tuple consists of the user's prompt and the Falcon's response.
- **system_prompt (str, optional)**: Initial system prompt to set the context for the conversation.
**Returns**: None. The response is printed to the console.
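
The package does not document the prompt template `chat` uses, but one plausible (hypothetical) way to flatten `system_prompt` plus the history tuples into a single prompt looks like this; the `User:`/`Falcon:` role labels are illustrative assumptions:

```python
# Hypothetical illustration of how chat() might assemble its prompt;
# the package's actual template is not documented.
def build_prompt(message: str, history: list[tuple[str, str]], system_prompt: str = None) -> str:
    parts = []
    if system_prompt:
        parts.append(f"System: {system_prompt}")
    for user_turn, falcon_turn in history:   # replay prior turns in order
        parts.append(f"User: {user_turn}")
        parts.append(f"Falcon: {falcon_turn}")
    parts.append(f"User: {message}")         # the new message
    parts.append("Falcon:")                  # cue the model to respond
    return "\n".join(parts)

print(build_prompt("Tell me a joke.", [("Hi there!", "Hello! How can I assist you?")]))
```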
### Usage Examples:
#### 1. Single-turn conversation:
```python
from falcon.main import Falcon
model = Falcon(temperature=0.8)
model.run("What is the capital of France?")
```
#### 2. Multi-turn conversation with history:
```python
from falcon.main import Falcon
model = Falcon(system_prompt="Conversational Assistant")
history = [
    ("Hi there!", "Hello! How can I assist you?"),
    ("What's the weather like?", "Sorry, I can't fetch real-time data, but I can provide general info.")
]
model.chat("Tell me a joke.", history)
```
#### 3. Using quantized models:
```python
from falcon.main import Falcon
model = Falcon(quantized=True)
model.run("Tell me about quantum computing.")
```
### Mathematical Representation:
The Falcon class ultimately relies on a transformer-based generative language model for text generation. The process can be summarized as follows:
Given an input sequence \( x = [x_1, x_2, \ldots, x_n] \), the model produces a probability distribution over the next token. Under greedy decoding, the prediction is:
\[ x_{n+1} = \arg \max_{w \in V} P(w \mid x_1, x_2, \ldots, x_n) \]
Where:
- \( V \) is the model's vocabulary and \( P \) is the next-token distribution the model produces over \( V \).
- The argmax selects the single most probable token; with `temperature` and `top_p` set, Falcon instead samples the next token from a temperature-scaled, nucleus-truncated version of this distribution, as illustrated below.
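
To make the difference concrete, here is a self-contained toy example (made-up logits over a four-token vocabulary, independent of any actual model) contrasting greedy argmax with the temperature/top-p sampling that Falcon's knobs control:

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.5, -1.0])  # toy scores for a 4-token vocabulary

def softmax(z):
    z = z - z.max()                # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

greedy = int(np.argmax(logits))    # greedy decoding: always the top token

def sample(logits, temperature=0.5, top_p=0.9):
    p = softmax(logits / temperature)            # temperature reshapes the distribution
    order = np.argsort(p)[::-1]                  # most probable first
    cum_before = np.cumsum(p[order]) - p[order]  # probability mass before each token
    keep = order[cum_before < top_p]             # smallest set reaching top_p
    p_kept = p[keep] / p[keep].sum()             # renormalize over the kept tokens
    return int(rng.choice(keep, p=p_kept))

print("greedy:", greedy, "sampled:", sample(logits))
```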
### Additional Information:
- For best performance, run the Falcon class on a CUDA-enabled device, and make sure your PyTorch installation supports CUDA (a quick check is shown after this list).
- The Falcon class uses models from the HuggingFace model hub. Ensure you have an active internet connection during the first run as models will be downloaded.
- If memory issues arise, consider reducing the `max_new_tokens` parameter or using quantized models.
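
Before downloading a large checkpoint, the CUDA point above can be verified with a couple of standard PyTorch calls (independent of this package):

```python
import torch

if torch.cuda.is_available():
    # Report each visible GPU and its total memory.
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"cuda:{i}: {props.name}, {props.total_memory / 1e9:.1f} GB")
else:
    print("No CUDA device found; generation will run on CPU and be very slow.")
```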
---
# License
MIT