bespokelabs-curator


- Name: bespokelabs-curator
- Version: 0.1.9
- Home page: https://github.com/bespokelabsai/curator
- Summary: Bespoke Labs Curator
- Upload time: 2024-11-16 00:14:48
- Author: Bespoke Labs (company@bespokelabs.ai)
- Requires Python: <4.0,>=3.10
- License: Apache-2.0
- Keywords: ai, curator, bespoke
<p align="center">
  <a href="https://bespokelabs.ai/" target="_blank">
    <picture>
      <source media="(prefers-color-scheme: light)" width="80" srcset="https://raw.githubusercontent.com/bespokelabsai/curator/main/docs/Bespoke-Labs-Logomark-Red.png">
      <img alt="Bespoke Labs Logo" width="80" src="https://raw.githubusercontent.com/bespokelabsai/curator/main/docs/Bespoke-Labs-Logomark-Red-on-Black.png">
    </picture>
  </a>
</p>

<h1 align="center">Bespoke Labs Curator</h1>
<h3 align="center" style="font-size: 20px; margin-bottom: 4px">Data Curation for Post-Training & Structured Data Extraction</h3>
<br/>
<p align="center">
  <a href="https://docs.bespokelabs.ai/">
    <img alt="Static Badge" src="https://img.shields.io/badge/Docs-docs.bespokelabs.ai-blue?style=flat&link=https%3A%2F%2Fdocs.bespokelabs.ai">
  </a>
  <a href="https://bespokelabs.ai/">
    <img alt="Site" src="https://img.shields.io/badge/Site-bespokelabs.ai-blue?link=https%3A%2F%2Fbespokelabs.ai"/>
  </a>
  <img alt="PyPI - Version" src="https://img.shields.io/pypi/v/bespokelabs-curator">
  <a href="https://twitter.com/bespokelabsai">
    <img src="https://img.shields.io/twitter/follow/bespokelabsai" alt="Follow on X" />
  </a>
  <a href="https://discord.gg/KqpXvpzVBS">
    <img alt="Discord" src="https://img.shields.io/discord/1230990265867698186">
  </a>
</p>


### Installation

```bash
pip install bespokelabs-curator
```
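You can sanity-check the installation from Python. The package metadata declares `requires_python` as `<4.0,>=3.10`. The minimal sketch below uses only the standard library (`importlib.metadata`) to check that range and report the installed version. The helper names `python_supported` and `installed_curator_version` are hypothetical and not part of curator:

```python
import sys
from importlib import metadata


def python_supported(version_info=sys.version_info) -> bool:
    """Return True if the interpreter satisfies >=3.10,<4.0 (taken from the package metadata)."""
    return (3, 10) <= tuple(version_info[:2]) < (4, 0)


def installed_curator_version():
    """Return the installed bespokelabs-curator version string, or None if it is not installed."""
    try:
        return metadata.version("bespokelabs-curator")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python version supported:", python_supported())
    print("bespokelabs-curator:", installed_curator_version() or "not installed")
```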

### Usage

```python
from bespokelabs import curator
from datasets import Dataset
from pydantic import BaseModel, Field
from typing import List

# Create a dataset of the topics to write poems about.
topics = Dataset.from_dict({"topic": [
    "Urban loneliness in a bustling city",
    "Beauty of Bespoke Labs's Curator library"
]})

# Define Pydantic models for a single poem and for a list of poems (the structured output format).
class Poem(BaseModel):
    poem: str = Field(description="A poem.")

class Poems(BaseModel):
    poems_list: List[Poem] = Field(description="A list of poems.")


# Define a Prompter that generates poems; it is applied to each row of the topics dataset.
poet = curator.Prompter(
    # `prompt_func` takes a row of the dataset as input.
    # `row` is a dictionary with a single key 'topic' in this case.
    prompt_func=lambda row: f"Write two poems about {row['topic']}.",
    model_name="gpt-4o-mini",
    response_format=Poems,
    # `row` is the input row, and `poems` is an instance of the `Poems` class
    # parsed from the LLM's structured output.
    parse_func=lambda row, poems: [
        {"topic": row["topic"], "poem": p.poem} for p in poems.poems_list
    ],
)

poems = poet(topics)
print(poems.to_pandas())
# Example output:
#                                       topic                                               poem
# 0       Urban loneliness in a bustling city  In the city's heart, where the sirens wail,\nA...
# 1       Urban loneliness in a bustling city  City streets hum with a bittersweet song,\nHor...
# 2  Beauty of Bespoke Labs's Curator library  In whispers of design and crafted grace,\nBesp...
# 3  Beauty of Bespoke Labs's Curator library  In the hushed breath of parchment and ink,\nBe...
```
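To make the per-row data flow explicit, here is a plain-Python sketch of what the example does: `prompt_func` builds a prompt from a row, the model returns structured JSON that is parsed into the response format, and `parse_func` turns that into one or more output rows. `fake_llm` and `run_prompter` are hypothetical stand-ins, and dataclasses replace the Pydantic models so the sketch runs without dependencies. It illustrates the flow only and is not curator's actual implementation:

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List


# Plain dataclasses standing in for the Pydantic `Poem` / `Poems` models.
@dataclass
class Poem:
    poem: str


@dataclass
class Poems:
    poems_list: List[Poem]


def fake_llm(prompt: str) -> str:
    """Hypothetical model stub: returns a JSON structured output containing two poems."""
    return json.dumps({"poems_list": [
        {"poem": f"First poem for: {prompt}"},
        {"poem": f"Second poem for: {prompt}"},
    ]})


def run_prompter(rows: List[Dict[str, str]],
                 prompt_func: Callable[[Dict[str, str]], str],
                 parse_func: Callable[[Dict[str, str], Poems], List[Dict[str, str]]],
                 llm: Callable[[str], str] = fake_llm) -> List[Dict[str, str]]:
    """Per row: build the prompt, call the model, parse JSON into Poems, flatten parse_func's rows."""
    out: List[Dict[str, str]] = []
    for row in rows:
        raw = json.loads(llm(prompt_func(row)))
        parsed = Poems(poems_list=[Poem(**p) for p in raw["poems_list"]])
        # Assumes parse_func returns a list of dicts, as in the example above;
        # each dict becomes its own output row.
        out.extend(parse_func(row, parsed))
    return out


topics = [
    {"topic": "Urban loneliness in a bustling city"},
    {"topic": "Beauty of Bespoke Labs's Curator library"},
]
rows = run_prompter(
    topics,
    prompt_func=lambda row: f"Write two poems about {row['topic']}.",
    parse_func=lambda row, poems: [
        {"topic": row["topic"], "poem": p.poem} for p in poems.poems_list
    ],
)
print(len(rows))  # 4: two topics x two poems each
```

This is why two input topics produce four rows in the example output above.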
Note that `topics` can itself be generated with `curator.Prompter`,
and the same pipeline scales up to tens of thousands of diverse poems.
A more detailed example is in the [examples/poem.py](https://github.com/bespokelabsai/curator/blob/mahesh/update_doc/examples/poem.py) file,
and other examples are in the [examples](https://github.com/bespokelabsai/curator/blob/mahesh/update_doc/examples) directory.
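Chaining works because one stage's output rows become the next stage's input rows. The dependency-free sketch below shows that idea: a topic stage feeds a poem stage. `run_stage`, `fake_topic_llm`, and `fake_poem_llm` are hypothetical stand-ins for `curator.Prompter` and a real model; no curator API is used:

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def run_stage(rows: List[Row],
              prompt_func: Callable[[Row], str],
              llm: Callable[[str], List[str]],
              parse_func: Callable[[Row, List[str]], List[Row]]) -> List[Row]:
    """Hypothetical stage runner: prompt each row, call the model, flatten parse_func's rows."""
    out: List[Row] = []
    for row in rows:
        out.extend(parse_func(row, llm(prompt_func(row))))
    return out


def fake_topic_llm(prompt: str) -> List[str]:
    # Stand-in for a model asked to list topics.
    return ["Urban loneliness in a bustling city", "Beauty of Bespoke Labs's Curator library"]


def fake_poem_llm(prompt: str) -> List[str]:
    # Stand-in for a model asked to write two poems.
    return [f"Poem A ({prompt})", f"Poem B ({prompt})"]


# Stage 1: generate topic rows from a single seed row.
topics = run_stage(
    [{"subject": "poetry"}],
    prompt_func=lambda row: f"List topics for {row['subject']}.",
    llm=fake_topic_llm,
    parse_func=lambda row, items: [{"topic": t} for t in items],
)

# Stage 2: feed the generated topic rows into the poem stage.
poems = run_stage(
    topics,
    prompt_func=lambda row: f"Write two poems about {row['topic']}.",
    llm=fake_poem_llm,
    parse_func=lambda row, items: [{"topic": row["topic"], "poem": p} for p in items],
)
print(len(topics), len(poems))  # 2 4
```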

To run the examples, set your OpenAI API key in the `OPENAI_API_KEY` environment variable,
for example by running `export OPENAI_API_KEY=sk-...` in your terminal.
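A fail-fast check at the start of a script gives a clearer error than a failed API call partway through a run. A minimal sketch; `require_openai_key` is a hypothetical helper, not part of curator:

```python
import os


def require_openai_key(env=os.environ) -> str:
    """Return OPENAI_API_KEY from the environment, or raise with a hint if it is unset or blank."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; run `export OPENAI_API_KEY=sk-...` first."
        )
    return key
```

Passing a mapping other than `os.environ` makes the check easy to test without touching the real environment.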

See the [docs](https://docs.bespokelabs.ai/) for more details and for
troubleshooting information.

### Bespoke Curator Viewer

To run the Bespoke Curator dataset viewer:

```bash
curator-viewer
```

This opens a browser window with the viewer running at `127.0.0.1:3000` by default, unless you specify a different host and port.


To run the viewer on a different host or port, use the optional `--host` and `--port` flags:
```bash
$ curator-viewer -h
usage: curator-viewer [-h] [--host HOST] [--port PORT] [--verbose]

Curator Viewer

options:
  -h, --help     show this help message and exit
  --host HOST    Host to run the server on (default: localhost)
  --port PORT    Port to run the server on (default: 3000)
  --verbose, -v  Enables debug logging for more verbose output
```
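To confirm from a script that the viewer is up, a TCP connection attempt is enough. A minimal sketch using only the standard library; `viewer_is_up` is a hypothetical helper, and it only shows that *something* accepts connections on the port, not that it is the viewer. Pass the same host and port you gave to `--host` and `--port`:

```python
import socket


def viewer_is_up(host: str = "127.0.0.1", port: int = 3000, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```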

The only requirement for running `curator-viewer` is Node.js. You can install it by following the instructions [here](https://nodejs.org/en/download/package-manager).

To check whether Node.js is installed, run:

```bash
node -v
```
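The same check can be scripted from Python, for example in a setup script. A sketch using only the standard library; `parse_node_version` and `installed_node_version` are hypothetical helpers:

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_node_version(text: str) -> Optional[Tuple[int, ...]]:
    """Parse `node -v` output such as 'v22.11.0' into (22, 11, 0); return None if it does not match."""
    m = re.match(r"\s*v(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(g) for g in m.groups()) if m else None


def installed_node_version() -> Optional[Tuple[int, ...]]:
    """Run `node -v` if node is on PATH; return the parsed version, or None if node is missing."""
    node = shutil.which("node")
    if node is None:
        return None
    result = subprocess.run([node, "-v"], capture_output=True, text=True, check=False)
    return parse_node_version(result.stdout)
```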

If it is not installed, you can install Node.js 22 on macOS with nvm (Node Version Manager):

```bash
# installs nvm (Node Version Manager)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.0/install.sh | bash
# download and install Node.js (you may need to restart the terminal)
nvm install 22
# verifies the right Node.js version is in the environment
node -v # should print a v22.x version, e.g. `v22.11.0`
# verifies the right npm version is in the environment
npm -v # should print the bundled npm version, e.g. `10.9.0`
```

            
