LangTorch

Name	LangTorch
Version	1.0.8
home_page	https://github.com/AdamSobieszek/LangTorch
Summary	Framework for intuitive LLM application development with tensors.
upload_time	2024-06-27 13:51:49
maintainer	None
docs_url	None
author	Adam Sobieszek
requires_python	>=3.8
license	None
keywords	langtorch pytorch llm language models chat chains
requirements	numpy torch aiohttp nest_asyncio openai tiktoken pandas retry pyparsing pypandoc transformers omegaconf hydra-core markdown-it-py
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

<p align="center">
<img align="center" src="./langtorch_banner.png" alt="LangTorch Logo" style="max-width: 40rem; width: 100%; height: auto; vertical-align: middle;">
</p>

[![PyPI version](https://badge.fury.io/py/langtorch.svg)](https://pypi.org/project/LangTorch/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Twitter](https://img.shields.io/twitter/url/https/twitter.com/AdamSobieszek.svg?style=social&label=Follow%20%40AdamSobieszek)](https://twitter.com/AdamSobieszek)
[![GitHub star chart](https://img.shields.io/github/stars/AdamSobieszek/langtorch?style=social)](https://star-history.com/#AdamSobieszek/langtorch)


**Quickstart Tutorial:**  [<img align="center" src="https://colab.research.google.com/assets/colab-badge.svg">](https://colab.research.google.com/github/AdamSobieszek/langtorch/blob/main/quickstart.ipynb)

---

LangTorch is a Python package that accelerates development of complex language model applications by leveraging familiar PyTorch concepts.

While existing frameworks focus on connecting language models to other services, LangTorch aims to change the way you approach creating LLM applications by introducing a unified framework for working with texts, chats, templates, LLMs, API calls and more.
> Powered by **TextTensors** ("torch Tensors but with text data entries"), offering a flexible way to structure and transform text data and embeddings with seamless parallelization.

## Installation

```bash
pip install langtorch
```

## Overview

- **No useless classes**:

	Instead of providing wrapper classes for users to memorize, LangTorch introduces fewer, more flexible objects that enable all kinds of text formatting, templating and LLM operations.

- **Unified Approach**: 

	TextTensors let you arrange text entries geometrically and process them in parallel. Entries can represent:
	> strings, documents, prompt templates, completion dictionaries, chat histories, markup languages, chunks, retrieval queries, tokens, embeddings and so on

- **You probably already know LangTorch**: 

	LangTorch components subclass their numerical PyTorch counterparts, which lets users apply their existing coding skills to building novel LLM app architectures.

- **Other goodies like TextModules**:

	A TextModule is a subclass of `torch.nn.Module` that works on TextTensors and can perform:
	> template completions, prompt injections, local and API LLM inference, embedding creation, operations on embeddings in retrieval, and so on

- Honestly, just go to https://langtorch.org; there is much more information there!

## Code Examples

The examples are introduced on the main documentation page, but even without much introduction you can see how compactly some fairly complex operations can be implemented with LangTorch.

### TextTensors act both as texts and embeddings

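```python
import torch
from langtorch import TextTensor

tensor1 = TextTensor([["Yes"], ["No"]])
tensor2 = TextTensor(["Yeah", "Nope", "Yup", "Non"])

# One similarity score per pair of entries, computed on their embeddings
print(torch.cosine_similarity(tensor1, tensor2))
print("Content:\n", tensor1)
```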

```    title="Output:"
tensor([[0.6923, 0.6644, 0.6317, 0.5749],
        [0.5457, 0.7728, 0.5387, 0.7036]])
Content:
[[Yes], 
 [No ]]
```


LangTorch code can look unusual at first. Why? The utility of tensors in Torch relies on their ability to compute products of many weights simultaneously. The corresponding, and most used, feature in LangTorch lets several prompts be formatted with several inputs at once: the multiplication of text entries `text1*text2` is defined similarly to `text1.format(**text2)`.
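
To make the analogy concrete, here is a minimal plain-Python sketch of "multiplying" prompts by inputs. It does not use LangTorch and is not how LangTorch is implemented; the helper name `multiply` and the sample templates are made up for illustration. It only shows the idea of formatting every template with every input, the way a product pairs up entries:

```python
# Plain-Python analogy only; this does not use or reproduce LangTorch internals.

def multiply(templates, inputs):
    """Format every template with every input, like an outer product.

    templates: list of format strings with named fields
    inputs:    list of dicts supplying values for those fields
    Returns a len(templates) x len(inputs) nested list of strings.
    """
    return [[t.format(**x) for x in inputs] for t in templates]


templates = ["Translate to French: {text}", "Summarize: {text}"]
inputs = [{"text": "Hello"}, {"text": "Good night"}]

grid = multiply(templates, inputs)
for row in grid:
    print(row)
```

Each row of `grid` corresponds to one template and each column to one input, which is roughly the shape of result one would expect from combining a column of prompts with a row of inputs.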


### Chains

The multiplication operation lets us build chains of TextModules with a simple `torch.nn.Sequential`:

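```python
import torch

# TextTensor, TextModule, OpenAI and CoT are assumed to be imported from langtorch;
# see https://langtorch.org for the exact import paths.
chain = torch.nn.Sequential(
    TextModule("Translate this equation to natural language: {}"),
    CoT,  # not defined in this snippet; presumably a chain-of-thought module
    OpenAI("gpt-4"),
    TextModule("Calculate the described quantity: {}"),
    OpenAI("gpt-4", T=0),
)

input_tensor = TextTensor(["170*32 =", "4 times 20 =", "123*45/10 =", "2**10*5 ="])
output_tensor = chain(input_tensor)
```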
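
For readers new to `torch.nn.Sequential`, the chain above is just left-to-right composition: each step receives the previous step's output. The sketch below mimics that flow in plain Python. The helpers `sequential`, `template` and `fake_llm` are hypothetical stand-ins written for this illustration; `fake_llm` makes no API call and only tags its input.

```python
from functools import reduce

def sequential(*steps):
    """Compose steps left to right, like torch.nn.Sequential does for modules."""
    return lambda x: reduce(lambda value, step: step(value), steps, x)

def template(fmt):
    """Return a step that inserts each entry into a prompt template."""
    return lambda entries: [fmt.format(e) for e in entries]

def fake_llm(entries):
    """Hypothetical stand-in for an LLM call: it only tags the prompt."""
    return [f"<answer to: {e}>" for e in entries]

chain = sequential(
    template("Translate this equation to natural language: {}"),
    fake_llm,
    template("Calculate the described quantity: {}"),
    fake_llm,
)

outputs = chain(["170*32 =", "4 times 20 ="])
print(outputs[0])
```

Every entry of the input list passes through all four steps, so the output has the same length as the input.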



### Retrieval & RAG from scratch

The code below is a complete working implementation of a cosine similarity-based retriever:

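```python
import torch

# TextTensor and TextModule are assumed to be imported from langtorch.
class Retriever(TextModule):
    def __init__(self, documents: TextTensor):
        super().__init__()
        # Flatten the documents into a 1-D TextTensor
        self.documents = TextTensor(documents).view(-1)

    def forward(self, query: TextTensor, k: int = 5):
        # Similarity between every document and the single query entry
        cos_sim = torch.cosine_similarity(self.documents, query.reshape(1))
        # topk returns (values, indices); index the documents with the indices
        return self.documents[cos_sim.topk(k).indices]
```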
```python title="Usage:"
with open("doc.txt", "r") as f:
    retriever = Retriever(f.readlines())
query = TextTensor("How to build a retriever?")

print(retriever(query))
```
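
The ranking step the retriever relies on can be sketched without any library. The example below is a toy illustration, not LangTorch code: the helper names and the three-dimensional "embeddings" are made up, whereas real embeddings would come from an embedding model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(doc_vectors, query_vector, k):
    """Return indices of the k documents most similar to the query."""
    scores = [cosine_similarity(d, query_vector) for d in doc_vectors]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

# Made-up 3-dimensional "embeddings" for illustration only.
doc_vectors = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query_vector = [1.0, 0.2, 0.0]

best = top_k(doc_vectors, query_vector, k=2)
print(best)
```

Cosine similarity ignores vector length and compares direction only, which is why the second document, pointing almost the same way as the query, ranks above the first.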

We can now compose this module with a TextModule making LLM calls to get a custom Retrieval Augmented Generation pipeline:

```python
class RAG(TextModule):  
    def __init__(self, documents: TextTensor, *args, **kwargs):  
        super().__init__(*args, **kwargs)  
        self.retriever = Retriever(documents)  
  
    def forward(self, user_message: TextTensor, k: int = 5):  
        retrieved_context = self.retriever(user_message, k) +"\n"  
        user_message = user_message + "\nCONTEXT:\n" + retrieved_context.sum()  
        return super().forward(user_message)
```

```python title="Usage:"
# `paragraphs` (the documents) and `user_query` are assumed to be defined beforehand
rag_chat = RAG(paragraphs,
               prompt="Use the context to answer the following user query: ",
               activation="gpt-3.5-turbo")

assistant_response = rag_chat(user_query)
```
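
To see what text such a pipeline might hand to the model, here is a plain-string sketch of the assembly done in `forward`. The function `build_rag_input` is hypothetical, and placing the prompt in front of the user message is an assumption about how the prompt is applied, not confirmed LangTorch behavior.

```python
def build_rag_input(user_message, retrieved, prompt):
    """Assemble the text for the LLM from the query and retrieved passages."""
    # Mirror `retrieved_context + "\n"` followed by `.sum()`: one passage per line
    context = "".join(passage + "\n" for passage in retrieved)
    # Assumption: the prompt is simply prepended to the augmented user message
    return prompt + user_message + "\nCONTEXT:\n" + context

text = build_rag_input(
    user_message="How to build a retriever?",
    retrieved=["Embed the documents.", "Rank them by cosine similarity."],
    prompt="Use the context to answer the following user query: ",
)
print(text)
```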

Go to https://langtorch.org to understand these RAGs-to-riches code shenanigans.

## License

LangTorch is available under the [MIT license](https://opensource.org/licenses/MIT).

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/AdamSobieszek/LangTorch",
    "name": "LangTorch",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "LangTorch, PyTorch, LLM, Language Models, Chat, Chains",
    "author": "Adam Sobieszek",
    "author_email": "contact@langtorch.org",
    "download_url": "https://files.pythonhosted.org/packages/19/df/211cb0f8ee85e20cbc179e2e67e0f20b82eed815aa7db68ef61ff8cd494a/LangTorch-1.0.8.tar.gz",
    "platform": null,
    "description": "<p align=\"center\">\r\n<img align=\"center\" src=\"./langtorch_banner.png\" alt=\"LangTorch Logo\" style=\"max-width: 40rem; width: 100%; height: auto; vertical-align: middle;\">\r\n</p>\r\n\r\n[![PyPI version](https://badge.fury.io/py/langtorch.svg)](https://pypi.org/project/LangTorch/)\r\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\r\n[![Twitter](https://img.shields.io/twitter/url/https/twitter.com/AdamSobieszek.svg?style=social&label=Follow%20%40AdamSobieszek)](https://twitter.com/AdamSobieszek)\r\n[![GitHub star chart](https://img.shields.io/github/stars/AdamSobieszek/langtorch?style=social)](https://star-history.com/#AdamSobieszek/langtorch)\r\n\r\n[//]: # ([![]&#40;https://dcbadge.vercel.app/api/server/6adMQxSpJS?compact=true&style=flat&#41;]&#40;https://discord.gg/6adMQxSpJS&#41;)\r\n\r\n**Quickstart Tutorial:**  [<img align=\"center\" src=\"https://colab.research.google.com/assets/colab-badge.svg\">](https://colab.research.google.com/github/AdamSobieszek/langtorch/blob/main/quickstart.ipynb)\r\n\r\n---\r\n\r\nLangTorch is a Python package that accelerates development of complex language model applications by leveraging familiar PyTorch concepts.\r\n\r\nWhile existing frameworks focus on connecting language models to other services, LangTorch aims to change the way you approach creating LLM applications by introducing a unified framework for working with texts, chats, templates, LLMs, API calls and more.\r\n> Powered by **TextTensors** \u00e2\u20ac\u201d \"torch Tensors but with text data entries\" \u00e2\u20ac\u201d offering a flexible way to structure and transform text data and embeddings with seamless parallelization. 
\r\n\r\n## Installation\r\n\r\n```bash\r\npip install langtorch\r\n```\r\n\r\n## Overview\r\n\r\n- **No useless classes**:\r\n\r\n\tInstead of providing wrapper classes for users to memorize, LangTorch introduces fewer, more flexible objects that enable all kinds of text formatting, templating and LLM operations.\r\n\r\n- **Unified Approach**: \r\n\r\n\t TextTensors let you structure geometrically and handle in parallel text entries that can represent:\r\n\t> strings, documents, prompt templates, completion dictionaries, chat histories, markup languages, chunks, retrieval queries, tokens, embeddings and so on\r\n\r\n- **You probably already know LangTorch**: \r\n\r\n\tLangTorch components subclass their numerical PyTorch counterparts, which lets users apply their existing coding skills to building novel LLM app architectures.\r\n\r\n- **Other goodies like TextModules**\r\n\r\n\ta subclass of torch.nn.Module working on TextTensors and able to perform:\r\n\t> template completions, prompt injections, local and API LLM inference, create embedding, performing operations on embeddings in retrieval and so on, and so on\r\n\r\n- Honestly just go to https://langtorch.org there is much more information there!\r\n\r\n## Code Examples\r\n\r\nThe examples are introduced on the main documentation page, but even without much introduction you can see how compact some pretty complex operations can be implemented with LangTorch.\r\n\r\n### TextTensors act both as texts and embeddings\r\n\r\n```python  \r\nimport torch  \r\n  \r\ntensor1 = TextTensor([[\"Yes\"], [\"No\"]])  \r\ntensor2 = TextTensor([\"Yeah\", \"Nope\", \"Yup\", \"Non\"])  \r\n  \r\nprint(torch.cosine_similarity(tensor1, tensor2))\r\nprint(\"Content:\\n\", tensor1)\r\n```\r\n\r\n```    title=\"Output:\"\r\ntensor([[0.6923, 0.6644, 0.6317, 0.5749],\r\n\t    [0.5457, 0.7728, 0.5387, 0.7036]])\r\nContent:\r\n[[Yes], \r\n [No ]]\r\n```\r\n\r\n\r\nLangTorch code looks weird at first, why? 
Since the utility of Tensors, as used in Torch, relies on their ability to calculate simultaneously products of several weights. The corresponding, and most used, feature in LangTorch allows several prompts to be formatted on several inputs, by defining the multiplication of text entries `text1*text2` similarly to `text1.format(**text2)`\r\n\r\n\r\n### Chains\r\n\r\nThe multiplication operation lets us build chains of TextModules with a simple `torch.nn.Sequential`:\r\n\r\n```python\r\nchain = torch.nn.Sequential(\r\n    TextModule(\"Translate this equation to natural language: {}\"),\r\n    CoT,\r\n    OpenAI(\"gpt-4\")\r\n    TextModule(\"Calculate the described quantity: {}\"),\r\n    OpenAI(\"gpt-4\", T=0)\r\n)\r\n\t\r\n\tinput_tensor = TextTensor([\"170*32 =\", \"4 times 20 =\", \"123*45/10 =\", \"2**10*5 =\"])\r\n\toutput_tensor = chain(input_tensor)\r\n```\r\n\r\n\r\n\r\n### Retrieval & RAG from scratch\r\n\r\nThe code below is a complete working implementation of a cosine similarity-based retriever:\r\n\r\n```python\r\nclass Retriever(TextModule):  \r\n    def __init__(self, documents: TextTensor):  \r\n        super().__init__()  \r\n        self.documents = TextTensor(documents).view(-1)  \r\n  \r\n    def forward(self, query: TextTensor, k: int = 5):  \r\n        cos_sim = torch.cosine_similarity(self.documents, query.reshape(1))  \r\n        return self.documents[cos_sim.topk(k)]\r\n         \r\n```\r\n```python title=\"Usage:\"\r\nretriever = Retriever(open(\"doc.txt\", \"r\").readlines())\r\nquery = TextTensor(\"How to build a retriever?\")\r\n\r\nprint(retriever(query))\r\n```\r\n\r\nWe can now compose this module with a TextModule making LLM calls to get a custom Retrieval Augmented Generation pipeline:\r\n\r\n```python\r\nclass RAG(TextModule):  \r\n    def __init__(self, documents: TextTensor, *args, **kwargs):  \r\n        super().__init__(*args, **kwargs)  \r\n        self.retriever = Retriever(documents)  \r\n  \r\n    def forward(self, 
user_message: TextTensor, k: int = 5):  \r\n        retrieved_context = self.retriever(user_message, k) +\"\\n\"  \r\n        user_message = user_message + \"\\nCONTEXT:\\n\" + retrieved_context.sum()  \r\n        return super().forward(user_message)\r\n```\r\n\r\n```python title=\"Usage:\"\r\nrag_chat = RAG(paragraphs,  \r\n\t\t\t   prompt=\"Use the context to answer the following user query: \",\r\n\t\t\t   activation=\"gpt-3.5-turbo\")\r\n\r\nassistant_response = rag_chat(user_query)\r\n```\r\n\r\nGo to https://langtorch.org to understand these RAGs-to-riches code shenanigans.\r\n\r\n## License\r\n\r\nLangTorch is available under the [MIT license](https://opensource.org/licenses/MIT).\r\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "Framework for intuitive LLM application development with tensors.",
    "version": "1.0.8",
    "project_urls": {
        "Homepage": "https://github.com/AdamSobieszek/LangTorch"
    },
    "split_keywords": [
        "langtorch",
        " pytorch",
        " llm",
        " language models",
        " chat",
        " chains"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b5efe4852a38b204e7ac54a0f4c364974a65cfeb85ad12d87f155d06d23b7f1b",
                "md5": "f65ecad7a7196a8f825da506bdc117f6",
                "sha256": "10461b4598bd5d93e88e237e1df68842fd26e37176d2a38549cc1963223b849d"
            },
            "downloads": -1,
            "filename": "LangTorch-1.0.8-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "f65ecad7a7196a8f825da506bdc117f6",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 120497,
            "upload_time": "2024-06-27T13:51:47",
            "upload_time_iso_8601": "2024-06-27T13:51:47.000620Z",
            "url": "https://files.pythonhosted.org/packages/b5/ef/e4852a38b204e7ac54a0f4c364974a65cfeb85ad12d87f155d06d23b7f1b/LangTorch-1.0.8-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "19df211cb0f8ee85e20cbc179e2e67e0f20b82eed815aa7db68ef61ff8cd494a",
                "md5": "d3cb81cda821929d202058a2ef06cc99",
                "sha256": "e2dc35b3d926bcc0363db011800d767753f780188251f5b2eef428f2a9c8e4cd"
            },
            "downloads": -1,
            "filename": "LangTorch-1.0.8.tar.gz",
            "has_sig": false,
            "md5_digest": "d3cb81cda821929d202058a2ef06cc99",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 103606,
            "upload_time": "2024-06-27T13:51:49",
            "upload_time_iso_8601": "2024-06-27T13:51:49.078617Z",
            "url": "https://files.pythonhosted.org/packages/19/df/211cb0f8ee85e20cbc179e2e67e0f20b82eed815aa7db68ef61ff8cd494a/LangTorch-1.0.8.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-06-27 13:51:49",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "AdamSobieszek",
    "github_project": "LangTorch",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "numpy",
            "specs": [
                [
                    "<",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "torch",
            "specs": [
                [
                    ">=",
                    "2.1.0"
                ]
            ]
        },
        {
            "name": "aiohttp",
            "specs": []
        },
        {
            "name": "nest_asyncio",
            "specs": []
        },
        {
            "name": "openai",
            "specs": [
                [
                    ">=",
                    "1.2.4"
                ]
            ]
        },
        {
            "name": "tiktoken",
            "specs": []
        },
        {
            "name": "pandas",
            "specs": []
        },
        {
            "name": "retry",
            "specs": []
        },
        {
            "name": "pyparsing",
            "specs": []
        },
        {
            "name": "pypandoc",
            "specs": []
        },
        {
            "name": "transformers",
            "specs": []
        },
        {
            "name": "omegaconf",
            "specs": []
        },
        {
            "name": "hydra-core",
            "specs": []
        },
        {
            "name": "markdown-it-py",
            "specs": []
        }
    ],
    "lcname": "langtorch"
}
