<p align="center">
<a href="https://bespokelabs.ai/" target="_blank">
<picture>
<source media="(prefers-color-scheme: light)" width="80" srcset="https://raw.githubusercontent.com/bespokelabsai/curator/main/docs/Bespoke-Labs-Logomark-Red.png">
<img alt="Bespoke Labs Logo" width="80" src="https://raw.githubusercontent.com/bespokelabsai/curator/main/docs/Bespoke-Labs-Logomark-Red-on-Black.png">
</picture>
</a>
</p>
<h1 align="center">Bespoke Labs Curator</h1>
<h3 align="center" style="font-size: 20px; margin-bottom: 4px">Data Curation for Post-Training & Structured Data Extraction</h3>
<br/>
<p align="center">
<a href="https://docs.bespokelabs.ai/">
<img alt="Static Badge" src="https://img.shields.io/badge/Docs-docs.bespokelabs.ai-blue?style=flat&link=https%3A%2F%2Fdocs.bespokelabs.ai">
</a>
<a href="https://bespokelabs.ai/">
<img alt="Site" src="https://img.shields.io/badge/Site-bespokelabs.ai-blue?link=https%3A%2F%2Fbespokelabs.ai"/>
</a>
<img alt="PyPI - Version" src="https://img.shields.io/pypi/v/bespokelabs-curator">
<a href="https://twitter.com/bespokelabsai">
<img src="https://img.shields.io/twitter/follow/bespokelabsai" alt="Follow on X" />
</a>
<a href="https://discord.gg/KqpXvpzVBS">
<img alt="Discord" src="https://img.shields.io/discord/1230990265867698186">
</a>
</p>
### Installation
```bash
pip install bespokelabs-curator
```
### Usage
```python
from bespokelabs import curator
from datasets import Dataset
from pydantic import BaseModel, Field
from typing import List
# Create a dataset of topics to write poems about.
topics = Dataset.from_dict({"topic": [
    "Urban loneliness in a bustling city",
    "Beauty of Bespoke Labs's Curator library"
]})

# Define classes that encapsulate a list of poems.
class Poem(BaseModel):
    poem: str = Field(description="A poem.")

class Poems(BaseModel):
    poems_list: List[Poem] = Field(description="A list of poems.")


# Define a Prompter that generates poems; it is applied to each row of the topics dataset.
poet = curator.Prompter(
    # `prompt_func` takes a row of the dataset as input.
    # `row` is a dictionary with a single key 'topic' in this case.
    prompt_func=lambda row: f"Write two poems about {row['topic']}.",
    model_name="gpt-4o-mini",
    response_format=Poems,
    # `row` is the input row, and `poems` is an instance of the `Poems` class
    # parsed from the LLM's structured output.
    parse_func=lambda row, poems: [
        {"topic": row["topic"], "poem": p.poem} for p in poems.poems_list
    ],
)
poem = poet(topics)
print(poem.to_pandas())
# Example output:
# topic poem
# 0 Urban loneliness in a bustling city In the city's heart, where the sirens wail,\nA...
# 1 Urban loneliness in a bustling city City streets hum with a bittersweet song,\nHor...
# 2 Beauty of Bespoke Labs's Curator library In whispers of design and crafted grace,\nBesp...
# 3 Beauty of Bespoke Labs's Curator library In the hushed breath of parchment and ink,\nBe...
```
Note that `topics` can itself be generated with `curator.Prompter`,
so the same approach scales to tens of thousands of diverse poems.
A more detailed example is in the [examples/poem.py](https://github.com/bespokelabsai/curator/blob/mahesh/update_doc/examples/poem.py) file,
and further examples are in the [examples](https://github.com/bespokelabsai/curator/blob/mahesh/update_doc/examples) directory.
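
The way `parse_func` turns one input row into several output rows can be tried without calling a model. The sketch below is only an illustration: it uses standard-library dataclasses as stand-ins for the pydantic models, and `fake_response` is a hypothetical hand-written value, not real model output.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Poem:
    poem: str


@dataclass
class Poems:
    poems_list: List[Poem]


def parse_func(row: Dict[str, str], poems: Poems) -> List[Dict[str, str]]:
    # Same logic as the lambda in the usage example: one output row per poem.
    return [{"topic": row["topic"], "poem": p.poem} for p in poems.poems_list]


# Hypothetical stand-in for a parsed model response (not real model output).
row = {"topic": "Urban loneliness in a bustling city"}
fake_response = Poems(poems_list=[Poem(poem="first poem"), Poem(poem="second poem")])

rows = parse_func(row, fake_response)
print(len(rows))  # 2
```

Each returned dictionary becomes one row of the output dataset, which is why the example output lists every topic twice.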
To run the examples, make sure to set your OpenAI API key in
the environment variable `OPENAI_API_KEY` by running `export OPENAI_API_KEY=sk-...` in your terminal.
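
If you prefer a script to fail early with a clear message when the key is missing, a small check along these lines may help. This is a sketch, not part of Curator; the helper name `require_openai_key` is hypothetical.

```python
import os
from typing import Mapping


def require_openai_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the OpenAI API key from `env`, or raise a clear error if unset."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; run `export OPENAI_API_KEY=sk-...` first."
        )
    return key
```

Calling `require_openai_key()` at the top of a script surfaces a missing key before any requests are made.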
See the [docs](https://docs.bespokelabs.ai/) for more details as well as
for troubleshooting information.
## Bespoke Curator Viewer
To run the bespoke dataset viewer:
```bash
curator-viewer
```
This opens a browser window with the viewer running on `127.0.0.1:3000`, unless you specify a different host and port.
Optional parameters to run the viewer on a different host and port:
```bash
$ curator-viewer -h
usage: curator-viewer [-h] [--host HOST] [--port PORT] [--verbose]

Curator Viewer

options:
  -h, --help     show this help message and exit
  --host HOST    Host to run the server on (default: localhost)
  --port PORT    Port to run the server on (default: 3000)
  --verbose, -v  Enables debug logging for more verbose output
```
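
To see how these options behave when parsed, the following sketch rebuilds an equivalent `argparse` parser from the help text above. It is a hypothetical reconstruction for illustration only, not the viewer's actual source; the function name `build_parser` is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction from the help text; not the viewer's actual source.
    parser = argparse.ArgumentParser(prog="curator-viewer", description="Curator Viewer")
    parser.add_argument("--host", default="localhost",
                        help="Host to run the server on (default: localhost)")
    parser.add_argument("--port", type=int, default=3000,
                        help="Port to run the server on (default: 3000)")
    parser.add_argument("--verbose", "-v", action="store_true",
                        help="Enables debug logging for more verbose output")
    return parser


args = build_parser().parse_args(["--host", "0.0.0.0", "--port", "8080"])
print(args.host, args.port, args.verbose)  # 0.0.0.0 8080 False
```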
The only requirement for running `curator-viewer` is Node.js. You can install it by following the instructions [here](https://nodejs.org/en/download/package-manager).
To check whether Node.js is already installed, run:
```bash
node -v
```
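
The same check can be done from Python, for example before launching the viewer from a script. This is a minimal sketch using only the standard library; the helper name `node_version` is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def node_version() -> Optional[str]:
    """Return the output of `node -v`, or None if node is not on PATH."""
    node_path = shutil.which("node")
    if node_path is None:
        return None
    result = subprocess.run(
        [node_path, "-v"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or None


version = node_version()
print(version if version else "Node.js not found on PATH")
```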
If it is not installed, you can install the latest Node.js on macOS by running:
```bash
# installs nvm (Node Version Manager)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.0/install.sh | bash
# download and install Node.js (you may need to restart the terminal)
nvm install 22
# verifies the right Node.js version is in the environment
node -v # should print `v22.11.0`
# verifies the right npm version is in the environment
npm -v # should print `10.9.0`
```
Raw data
{
"_id": null,
"home_page": "https://github.com/bespokelabsai/curator",
"name": "bespokelabs-curator",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.10",
"maintainer_email": null,
"keywords": "ai, curator, bespoke",
"author": "Bespoke Labs",
"author_email": "company@bespokelabs.ai",
"download_url": "https://files.pythonhosted.org/packages/7e/72/278d426cfd256de759b3a5bb7dc625924eb3fd02050eb3b9ca833d2ee50d/bespokelabs_curator-0.1.9.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "Bespoke Labs Curator",
"version": "0.1.9",
"project_urls": {
"Homepage": "https://github.com/bespokelabsai/curator",
"Repository": "https://github.com/bespokelabsai/curator"
},
"split_keywords": [
"ai",
" curator",
" bespoke"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "6f5e729ca606192306d1b42bd5dcf9da09047c66f27be1a867806f15904e1c24",
"md5": "c85c7382d3b3cc771a43cc9a6b5bb0b2",
"sha256": "22703e9aec388a79e32aff00437b44a84be868eb103bfd7c9144f593bd2f57a1"
},
"downloads": -1,
"filename": "bespokelabs_curator-0.1.9-py3-none-any.whl",
"has_sig": false,
"md5_digest": "c85c7382d3b3cc771a43cc9a6b5bb0b2",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.10",
"size": 1155974,
"upload_time": "2024-11-16T00:14:46",
"upload_time_iso_8601": "2024-11-16T00:14:46.076508Z",
"url": "https://files.pythonhosted.org/packages/6f/5e/729ca606192306d1b42bd5dcf9da09047c66f27be1a867806f15904e1c24/bespokelabs_curator-0.1.9-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "7e72278d426cfd256de759b3a5bb7dc625924eb3fd02050eb3b9ca833d2ee50d",
"md5": "be1ff0f2838a1840fb3fc21a3c9334f4",
"sha256": "775a6c5ec22066ed4547e1d837c6c37b1fa2cbcd46f5d2b164383212e990cfc6"
},
"downloads": -1,
"filename": "bespokelabs_curator-0.1.9.tar.gz",
"has_sig": false,
"md5_digest": "be1ff0f2838a1840fb3fc21a3c9334f4",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.10",
"size": 1092645,
"upload_time": "2024-11-16T00:14:48",
"upload_time_iso_8601": "2024-11-16T00:14:48.176293Z",
"url": "https://files.pythonhosted.org/packages/7e/72/278d426cfd256de759b3a5bb7dc625924eb3fd02050eb3b9ca833d2ee50d/bespokelabs_curator-0.1.9.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-16 00:14:48",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "bespokelabsai",
"github_project": "curator",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "bespokelabs-curator"
}