python-fastdata


- Name: python-fastdata
- Version: 0.0.3
- Home page: https://github.com/AnswerDotAI/fastdata
- Summary: Easiest and fastest way to 1B synthetic tokens
- Upload time: 2024-10-15 22:26:57
- Maintainer: None
- Docs URL: None
- Author: ncoop57
- Requires Python: >=3.9
- License: Apache Software License 2.0
- Keywords: nbdev jupyter notebook python
- Requirements: No requirements were recorded.
- Travis-CI: No Travis.
- Coveralls test coverage: No coveralls.

# fastdata


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

`fastdata` is a minimal library for generating synthetic data for
training deep learning models. For example, below is how you can
generate a dataset to train a language model to translate from English
to Spanish.

First, define the structure of the data you want to generate.
`claudette`, the library that fastdata uses to generate data, requires
this structure as a schema, written here as a Python class.

``` python
from fastcore.utils import *
```

``` python
class Translation():
    "Translation from an English phrase to a Spanish phrase"
    def __init__(self, english: str, spanish: str): store_attr()
    def __repr__(self): return f"{self.english} ➡ *{self.spanish}*"

Translation("Hello, how are you today?", "Hola, ¿cómo estás hoy?")
```

    Hello, how are you today? ➡ *Hola, ¿cómo estás hoy?*
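
For readers unfamiliar with fastcore, `store_attr()` saves each
`__init__` argument as an attribute of the same name. The sketch below
is a plain-Python equivalent of the schema above, written without
fastcore. It is only an illustration of what the one-liner does; the
`example` variable is introduced here and is not part of fastdata.

``` python
class Translation:
    "Translation from an English phrase to a Spanish phrase"

    def __init__(self, english: str, spanish: str):
        # Explicit assignments; `store_attr()` does this for every argument.
        self.english = english
        self.spanish = spanish

    def __repr__(self):
        return f"{self.english} ➡ *{self.spanish}*"


example = Translation("Good morning", "Buenos días")
print(example)
```

Either form should work as a schema, since both keep the docstring and
the type annotations that describe the fields.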

Next, define the prompt template used to generate the data, along with
the inputs that fill in the template.

``` python
prompt_template = """\
Generate English and Spanish translations on the following topic:
<topic>{topic}</topic>
"""

inputs = [{"topic": "Otters are cute"}, {"topic": "I love programming"}]
```
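
To preview what the model will be asked, you can fill the template
yourself. This sketch assumes each input dict is expanded into the
template's named placeholders with `str.format`; that is an assumption
about how fastdata combines the two, not something stated here. The
`prompts` list is introduced only for illustration.

``` python
prompt_template = """\
Generate English and Spanish translations on the following topic:
<topic>{topic}</topic>
"""

inputs = [{"topic": "Otters are cute"}, {"topic": "I love programming"}]

# Assumption: one prompt per input dict, with keys matching the placeholders.
prompts = [prompt_template.format(**inp) for inp in inputs]

for p in prompts:
    print(p)
```

Each key in an input dict must match a placeholder name in the template,
otherwise `str.format` raises a `KeyError`.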

Finally, we can generate some data with fastdata.

> [!NOTE]
>
> Only Anthropic models are supported at the moment. Make sure you have
> an API key for the model you want to use, and either set the proper
> environment variables or pass the key to the
> [`FastData`](https://AnswerDotAI.github.io/fastdata/core.html#fastdata)
> class: `FastData(api_key="sk-ant-api03-...")`.
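
One way to handle both options from the note is a small helper that
prefers an explicit key and otherwise reads the environment. The helper
name `resolve_api_key` is hypothetical. `ANTHROPIC_API_KEY` is the
variable the Anthropic Python SDK reads by default; that fastdata relies
on the same variable is an assumption.

``` python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper: prefer an explicit key, else the environment."""
    key = api_key or os.environ.get("ANTHROPIC_API_KEY")
    if not key:
        raise RuntimeError("Set ANTHROPIC_API_KEY or pass api_key explicitly.")
    return key


# Possible usage, following the note above:
# fast_data = FastData(api_key=resolve_api_key())
```

Failing early with a clear message is usually easier to debug than an
authentication error raised partway through a generation run.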

``` python
from fastdata.core import FastData
```

``` python
fast_data = FastData(model="claude-3-haiku-20240307")
translations = fast_data.generate(
    prompt_template=prompt_template,
    inputs=inputs,
    schema=Translation,
)
```

    100%|██████████| 2/2 [00:01<00:00,  1.57it/s]

``` python
from IPython.display import Markdown
```

``` python
Markdown("\n".join(f'- {t}' for t in translations))
```

- I love programming ➡ *Me encanta la programación*
- Otters are cute ➡ *Las nutrias son lindas*
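
To use the results for training, you will usually want them on disk. The
sketch below serializes objects with `english` and `spanish` attributes
as JSON Lines. The `to_jsonl` helper is hypothetical, and the
`SimpleNamespace` objects stand in for generated `Translation`
instances so that the example runs without an API call. Skipping `None`
entries is a defensive assumption, not documented behavior.

``` python
import json
from types import SimpleNamespace


def to_jsonl(records) -> str:
    """Serialize objects with `english`/`spanish` attributes as JSON Lines."""
    lines = []
    for r in records:
        if r is None:  # defensive: skip entries that carry no data
            continue
        row = {"english": r.english, "spanish": r.spanish}
        # ensure_ascii=False keeps accented characters readable in the file.
        lines.append(json.dumps(row, ensure_ascii=False))
    return "\n".join(lines)


# Stand-ins for the generated objects shown above.
sample = [
    SimpleNamespace(english="I love programming", spanish="Me encanta la programación"),
    SimpleNamespace(english="Otters are cute", spanish="Las nutrias son lindas"),
]
print(to_jsonl(sample))
```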

### Installation

Install the latest version from the GitHub
[repository](https://github.com/AnswerDotAI/fastdata):

``` sh
$ pip install git+https://github.com/AnswerDotAI/fastdata.git
```

or from [PyPI](https://pypi.org/project/python-fastdata/):

``` sh
$ pip install python-fastdata
```

To see how best to generate data with fastdata, check out our
[blog post](https://www.answer.ai/blog/introducing-fastdata) and the
examples in the
[examples](https://github.com/AnswerDotAI/fastdata/tree/main/examples)
directory.

## Developer Guide

If you are new to `nbdev`, here are some useful pointers to get you
started.

### Install fastdata in Development mode

``` sh
# make sure fastdata package is installed in development mode
$ pip install -e .

# make changes under nbs/ directory
# ...

# compile to have changes apply to fastdata
$ nbdev_prepare
```

            
