| Field | Value |
| --- | --- |
| Name | dataformer |
| Version | 0.0.2 |
| Summary | Dataformer is a library to create data for LLMs. |
| Upload time | 2024-09-11 06:39:19 |
| Home page | None |
| Author | None |
| Maintainer | None |
| Docs URL | None |
| Requires Python | >=3.8 |
| License | Apache-2.0 |
| Keywords | llm, synthetic, data |
| Requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| Coveralls test coverage | No coveralls. |
<div align="center">
<img src="https://github.com/DataformerAI/dataformer/assets/39311993/b2515523-19a9-4a54-8f12-1f8de24b7a9f"/>
</div>
<h3 align="center">Solving data for LLMs - Create quality synthetic datasets!</h3>
<p align="center">
<a href="https://x.com/dataformer_ai">
<img src="https://img.shields.io/badge/twitter-black?logo=x"/>
</a>
<a href="https://www.linkedin.com/company/dataformer">
<img src="https://img.shields.io/badge/linkedin-blue?logo=linkedin"/>
</a>
<a href="https://dataformer.ai/discord">
<img src="https://img.shields.io/badge/Discord-7289DA?&logo=discord&logoColor=white"/>
</a>
<a href="https://dataformer.ai/call">
<img src="https://img.shields.io/badge/book_a_call-00897B?&logo=googlemeet&logoColor=white"/>
</a>
</p>
## Why Dataformer?
Dataformer empowers engineers with a robust framework for creating high-quality synthetic datasets for AI, offering speed, reliability, and scalability. Our mission is to supercharge your AI development process by enabling rapid generation of diverse, premium datasets grounded in proven research methodologies.
In AI development, compute costs are high and output quality is paramount. Dataformer addresses both challenges by letting you **prioritize data excellence**: by crafting top-tier synthetic data, you can spend your time achieving and sustaining **superior standards** for your AI models.
### One API, Multiple Providers
We integrate with **multiple LLM providers** behind one unified API and let you make parallel async API calls while respecting provider rate limits. We also offer optional caching of provider responses, minimizing redundant API calls and directly reducing operational expenses.
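The two mechanisms described above can be sketched with plain `asyncio`. This is an illustrative pattern, not dataformer's actual implementation: the names `cached_call`, `CACHE`, and `MAX_CONCURRENT` are hypothetical, the `sleep(0)` stands in for a real API call, and a production version would also deduplicate in-flight requests.

```python
import asyncio
import hashlib
import json

CACHE: dict[str, str] = {}   # response cache keyed by request payload
MAX_CONCURRENT = 5           # assumed rate-limit budget

async def cached_call(prompt: str, sem: asyncio.Semaphore) -> str:
    # Key the cache on the serialized request so identical requests hit it.
    key = hashlib.sha256(json.dumps({"prompt": prompt}).encode()).hexdigest()
    if key in CACHE:
        return CACHE[key]            # cache hit: no API call, no cost
    async with sem:                  # bound concurrency to the rate limit
        await asyncio.sleep(0)       # stand-in for the real provider call
        response = f"response to: {prompt}"
    CACHE[key] = response
    return response

async def main(prompts: list[str]) -> list[str]:
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    # gather preserves input order, so responses align with prompts
    return await asyncio.gather(*(cached_call(p, sem) for p in prompts))

results = asyncio.run(main(["a", "b", "a"]))
```

The semaphore caps how many calls are in flight at once, while the cache means a repeated prompt costs nothing after its first (completed) call.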
### Research-Backed Iteration at Scale
Leverage state-of-the-art research papers to generate synthetic data while ensuring **adaptability, scalability, and resilience**. Shift your focus from infrastructure concerns to refining your data and enhancing your models.
## Installation
Install directly from GitHub:
```
pip install dataformer@git+https://github.com/DataformerAI/dataformer.git
```
Or clone the repository and install locally:
```
git clone https://github.com/DataformerAI/dataformer.git
cd dataformer
pip install .
```
## Quick Start
AsyncLLM supports various API providers, including:
- OpenAI
- Groq
- Together
- DeepInfra
- OpenRouter
Choose the provider that best suits your needs!
Here's a quick example of how to use Dataformer's AsyncLLM for efficient asynchronous generation:
```python
from dataformer.llms import AsyncLLM
from dataformer.utils import get_request_list, get_messages
from datasets import load_dataset
# Load a sample dataset
dataset = load_dataset("dataformer/self-knowledge", split="train").select(range(3))
instructions = [example["question"] for example in dataset]
# Prepare the request list
sampling_params = {"temperature": 0.7}
request_list = get_request_list(instructions, sampling_params)
# Initialize AsyncLLM with your preferred API provider
llm = AsyncLLM(api_provider="groq", model="llama-3.1-8b-instant")
# Generate responses asynchronously
response_list = get_messages(llm.generate(request_list))
```
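Assuming `generate` returns one response per request, in request order (an assumption based on typical batch APIs; verify against the library docs), pairing prompts with outputs is a simple `zip`. Sketched here with placeholder strings standing in for `instructions` and `response_list` from the example above:

```python
# Placeholder data; in practice these come from load_dataset and
# get_messages(llm.generate(request_list)) as shown above.
instructions = ["What is synthetic data?", "Why cache LLM responses?"]
response_list = ["Synthetic data is ...", "Caching avoids ..."]

# Build dataset rows by pairing each prompt with its response.
dataset_rows = [
    {"question": q, "answer": a}
    for q, a in zip(instructions, response_list)
]
```

The resulting list of dicts can be fed straight into `datasets.Dataset.from_list` for downstream fine-tuning or evaluation.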
## Contribute
We welcome contributions! Check our issues or open a new one to get started.
## Join Community
[Join Dataformer on Discord](https://dataformer.ai/discord)
## Raw data
{
"_id": null,
"home_page": null,
"name": "dataformer",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "llm, synthetic, data",
"author": null,
"author_email": "Dataformer AI <contact@dataformer.ai>",
"download_url": "https://files.pythonhosted.org/packages/93/28/d4426f030bedb5414923fdcfead2039edf703448f620b030e483494d6ed4/dataformer-0.0.2.tar.gz",
"platform": null,
"description": "(omitted here: verbatim duplicate of the README shown above)",
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "Dataformer is a library to create data for LLMs.",
"version": "0.0.2",
"project_urls": {
"Documentation": "https://dataformer.ai/",
"Issues": "https://github.com/DataformerAI/dataformer/issues",
"Source": "https://github.com/DataformerAI/dataformer"
},
"split_keywords": [
"llm",
" synthetic",
" data"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "15c6f119396acf6be7c8540bc3d393307ab2910dac8ecd7d74f002df89666d32",
"md5": "ab16300b4253974e7f3fc59377d38f30",
"sha256": "76480dedd02e5c3a3eb7a3932662512d177fe5d0baef5e599832e401a8ad4312"
},
"downloads": -1,
"filename": "dataformer-0.0.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ab16300b4253974e7f3fc59377d38f30",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 35734,
"upload_time": "2024-09-11T06:39:18",
"upload_time_iso_8601": "2024-09-11T06:39:18.236770Z",
"url": "https://files.pythonhosted.org/packages/15/c6/f119396acf6be7c8540bc3d393307ab2910dac8ecd7d74f002df89666d32/dataformer-0.0.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "9328d4426f030bedb5414923fdcfead2039edf703448f620b030e483494d6ed4",
"md5": "10106d5396698b575d04875c7a039314",
"sha256": "5eb50f625664ad6ce9159a3e4c734cf3d058cd1868ed364b20802731a8b8b7c9"
},
"downloads": -1,
"filename": "dataformer-0.0.2.tar.gz",
"has_sig": false,
"md5_digest": "10106d5396698b575d04875c7a039314",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 48497,
"upload_time": "2024-09-11T06:39:19",
"upload_time_iso_8601": "2024-09-11T06:39:19.902583Z",
"url": "https://files.pythonhosted.org/packages/93/28/d4426f030bedb5414923fdcfead2039edf703448f620b030e483494d6ed4/dataformer-0.0.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-11 06:39:19",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "DataformerAI",
"github_project": "dataformer",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "dataformer"
}