# fastdata
<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->
`fastdata` is a minimal library for generating synthetic data for
training deep learning models. For example, below is how you can
generate a dataset to train a language model to translate from English
to Spanish.

First, define the structure of the data you want to generate.
`claudette`, the library fastdata uses to generate data, requires a
schema describing that structure.
``` python
from fastcore.utils import *
```
``` python
class Translation():
    "Translation from an English phrase to a Spanish phrase"
    def __init__(self, english: str, spanish: str): store_attr()
    def __repr__(self): return f"{self.english} ➡ *{self.spanish}*"

Translation("Hello, how are you today?", "Hola, ¿cómo estás hoy?")
```
Hello, how are you today? ➡ *Hola, ¿cómo estás hoy?*
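
Anthropic's tool-use API describes a tool as a `name`, a `description` and a JSON-schema `input_schema`, and claudette derives something similar from your class, which is why a plain class with a docstring and annotated `__init__` works as a schema. The sketch below illustrates the idea with the standard library only. `schema_from_class` and `JSON_TYPES` are hypothetical names for illustration; this is not claudette's actual implementation.

```python
import inspect


class Translation:
    "Translation from an English phrase to a Spanish phrase"

    def __init__(self, english: str, spanish: str):
        self.english, self.spanish = english, spanish


# Hypothetical mapping from Python annotations to JSON-schema type names.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def schema_from_class(cls):
    """Hypothetical helper: describe `cls` as a tool from its docstring and `__init__` signature."""
    params = [p for name, p in inspect.signature(cls.__init__).parameters.items() if name != "self"]
    # Unknown annotations fall back to "string" in this sketch.
    properties = {p.name: {"type": JSON_TYPES.get(p.annotation, "string")} for p in params}
    # Parameters without a default value must be supplied by the model.
    required = [p.name for p in params if p.default is inspect.Parameter.empty]
    return {
        "name": cls.__name__,
        "description": inspect.getdoc(cls),
        "input_schema": {"type": "object", "properties": properties, "required": required},
    }


print(schema_from_class(Translation))
```

The docstring becomes the tool description, so a clear one-line docstring on your schema class is worth writing.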
Next, you need to define the prompt that will be used to generate the
data and any inputs you want to pass to the prompt.
``` python
prompt_template = """\
Generate English and Spanish translations on the following topic:
<topic>{topic}</topic>
"""
inputs = [{"topic": "Otters are cute"}, {"topic": "I love programming"}]
```
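The template uses Python's `{placeholder}` syntax, and each dict in `inputs` supplies the values for one prompt. Assuming the template is filled with `str.format` (an assumption based on the placeholder syntax, not a documented guarantee), the two prompts sent to the model would look like this:

```python
prompt_template = """\
Generate English and Spanish translations on the following topic:
<topic>{topic}</topic>
"""
inputs = [{"topic": "Otters are cute"}, {"topic": "I love programming"}]

# Assumption: each input dict fills the template's {placeholders} via str.format.
prompts = [prompt_template.format(**inp) for inp in inputs]

for p in prompts:
    print(p)
```

Under this assumption, every dict in `inputs` needs a key for every placeholder in the template; a missing key makes `str.format` raise `KeyError`.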
Finally, we can generate some data with fastdata.
> [!NOTE]
>
> We only support Anthropic models at the moment, so make sure you have
> an Anthropic API key. Either set it in the environment (the Anthropic
> SDK reads `ANTHROPIC_API_KEY`) or pass it directly to the
> [`FastData`](https://AnswerDotAI.github.io/fastdata/core.html#fastdata)
> class: `FastData(api_key="sk-ant-api03-...")`.
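
The two options in the note can be checked before any request is made. A minimal sketch, assuming the key comes either from an explicit argument or from the `ANTHROPIC_API_KEY` environment variable; `resolve_api_key` is a hypothetical helper, and giving the explicit key precedence is our own choice, not documented fastdata behavior.

```python
import os


def resolve_api_key(api_key=None):
    """Hypothetical helper: prefer an explicit key, else fall back to ANTHROPIC_API_KEY."""
    key = api_key or os.environ.get("ANTHROPIC_API_KEY")
    if not key:
        # Fail early with a clear message instead of on the first API call.
        raise RuntimeError("No Anthropic API key: set ANTHROPIC_API_KEY or pass api_key=...")
    return key
```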
``` python
from fastdata.core import FastData
```
``` python
fast_data = FastData(model="claude-3-haiku-20240307")
translations = fast_data.generate(
    prompt_template=prompt_template,
    inputs=inputs,
    schema=Translation,
)
```
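In the output below, the translations come back in a different order from `inputs`, which suggests (but does not confirm) that requests run concurrently. A minimal sketch of such a loop: render each prompt, call the model in a thread pool, and build one schema instance per response in completion order. `generate` and `fake_model` are hypothetical illustrations; `fake_model` is a stub standing in for the real Claude call, and none of this is fastdata's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


class Translation:
    "Translation from an English phrase to a Spanish phrase"

    def __init__(self, english: str, spanish: str):
        self.english, self.spanish = english, spanish


def fake_model(prompt):
    """Hypothetical stub for an LLM call: returns the schema's fields as a dict."""
    topic = prompt.split("<topic>")[1].split("</topic>")[0]
    return {"english": topic, "spanish": f"[es] {topic}"}


def generate(prompt_template, inputs, schema, call_model=fake_model, max_workers=4):
    """Sketch: render prompts, call the model concurrently, collect results as they finish."""
    prompts = [prompt_template.format(**inp) for inp in inputs]
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(call_model, p) for p in prompts]
        for fut in as_completed(futures):
            # Each response dict is turned into an instance of the schema class.
            results.append(schema(**fut.result()))
    return results
```

Collecting with `as_completed` trades input order for not waiting on the slowest request; if order matters, iterate over `futures` directly instead.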
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00, 1.57it/s]
``` python
from IPython.display import Markdown
```
``` python
Markdown("\n".join(f'- {t}' for t in translations))
```
- I love programming ➡ *Me encanta la programación*
- Otters are cute ➡ *Las nutrias son lindas*
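
To train on the generated data you will usually want it on disk. A minimal sketch that writes results to JSON Lines and reads them back, assuming each result exposes `english` and `spanish` attributes as the `Translation` class above does. JSON Lines is our choice here, not a format fastdata prescribes, and `to_jsonl`/`from_jsonl` are hypothetical helpers.

```python
import json


class Translation:
    "Stand-in for the Translation class above."

    def __init__(self, english: str, spanish: str):
        self.english, self.spanish = english, spanish


def to_jsonl(records, path, fields=("english", "spanish")):
    """Write one JSON object per record; ensure_ascii=False keeps 'ó' and '¿' readable."""
    with open(path, "w", encoding="utf-8") as f:
        for r in records:
            row = {k: getattr(r, k) for k in fields}
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def from_jsonl(path, cls=Translation):
    """Read the file back into schema instances, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [cls(**json.loads(line)) for line in f if line.strip()]
```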
### Installation
Install the latest version from the GitHub
[repository](https://github.com/AnswerDotAI/fastdata):
``` sh
$ pip install git+https://github.com/AnswerDotAI/fastdata.git
```
or from [PyPI](https://pypi.org/project/python-fastdata/):
``` sh
$ pip install python-fastdata
```
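Note that the PyPI distribution is named `python-fastdata`, while the import name is `fastdata` (as in `from fastdata.core import FastData`). A small standard-library sketch to check which version is installed; `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="python-fastdata"):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version())
```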
For guidance on getting the best results from fastdata, see our
[blog post](https://www.answer.ai/blog/introducing-fastdata) and the
[examples](https://github.com/AnswerDotAI/fastdata/tree/main/examples)
directory.
## Developer Guide
If you are new to `nbdev`, here are some useful pointers to get you
started.
### Install fastdata in Development mode
``` sh
# make sure fastdata package is installed in development mode
$ pip install -e .
# make changes under nbs/ directory
# ...
# compile to have changes apply to fastdata
$ nbdev_prepare
```