parallel-parrot

Name	parallel-parrot JSON
Version	0.9.0 JSON
	download
home_page	https://github.com/novex-ai/parallel-parrot
Summary	A library for easily and quickly using LLMs on tabular data
upload_time	2023-11-07 19:10:22
maintainer
docs_url	None
author	Brad Ito
requires_python	>=3.9,<3.12
license	MIT
keywords	generative ai pandas llm parallel openai
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

            # parallel-parrot

A Python library for easily and quickly using LLMs on tabular data.  Because synchronous for-loops are too slow, and parallelism can be a pain.

[![PyPI version](https://badge.fury.io/py/parallel-parrot.svg)](https://badge.fury.io/py/parallel-parrot)
[![Release Notes](https://img.shields.io/github/release/novex-ai/parallel-parrot)](https://github.com/novex-ai/parallel-parrot/releases)
[![pytest](https://github.com/novex-ai/parallel-parrot/actions/workflows/pytest.yml/badge.svg?branch=main)](https://github.com/novex-ai/parallel-parrot/actions/workflows/pytest.yml)
[![Poetry](https://img.shields.io/endpoint?url=https://python-poetry.org/badge/v0.json)](https://python-poetry.org/)

![flock of parrots](https://res.cloudinary.com/dn7sohze7/image/upload/v1695421900/parallel-parrot/0002-gareth-davies-EGcfyDiUv58-unsplash.jpg)

*Photo by [Gareth Davies](https://unsplash.com/@gdfoto?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)
 on [Unsplash](https://unsplash.com/photos/EGcfyDiUv58?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)* 

See our [blog post](https://bradito.me/blog/parallel-parrot/) on why we built this.

Use cases:
- Generate questions from documents for better Retrieval Augmented Generation (match questions to questions, not documents)
- Sentiment analysis or summarization on a large number of documents
- Data extraction and summarization
- Removal of personal identifiers
- Generate instructions from documents for fine tuning of LLMs

Main Features:
- Supports both pandas dataframes and native python lists of dictionaries
- Supports OpenAI Chat Completion API, with structured output "functions" (more LLMs planned in the future) - including supporting OpenAI JSON mode
- Output formatted data for fine-tuning

Other Features:
- Fast asynchronous (parallel) requests using aiohttp and uvloop, with support for notebook environments that have an existing event loop
- Python logging support (e.g. `logging.basicConfig(level=logging.DEBUG)`)
- Automatic retries, with exponential backoff, jitter, and dynamic header-based delays
- Flexible prompt templates using standard Python [string.Template](https://docs.python.org/3/library/string.html#string.Template).  e.g. `"summarize: ${input}"`
- "Batteries included" with pre-engineered prompt templates
- Programmatic tracking of token usage to support cost control measures.
- Supports `pandas` 1.x and 2.x APIs


## Getting Started

```python
pip install parallel-parrot
```

Define an API configuration object:
```python
import parallel_parrot as pp

config = pp.OpenAIChatCompletionConfig(
    openai_api_key="*your API key*",
    model="gpt-3.5-turbo"
)
```

see the [declaration](https://github.com/novex-ai/parallel-parrot/blob/v0.3.2/parallel_parrot/types.py#L27) of `OpenAIChatCompletionConfig` for more available parameters, including the `system_message`.
All [Open API parameters](https://platform.openai.com/docs/api-reference/chat/create) can be passed.  Note that only models supported by the [OpenAI Chat Completions API](https://platform.openai.com/docs/guides/gpt/gpt-models) can be used with this configuration.

## Generate Text - pp.parallel_text_generation()

This function executes parallel text generation/completion using a LLM.

It does so by:
- Taking in a dataframe or list of dictionaries.
- Applying the python prompt template to each row.  Column names are used as the variable names in the template.
- Calling the LLM API with the prompt for each row.  Runs a single request first, for two reasons:
  - Test access to the API, including credentials, without retries or complicated calling mechanics.
  - Uses that request to automatically obtain [rate limit information](https://platform.openai.com/docs/guides/rate-limits) from the OpenAI API to configure the parallel requests to run with maximum concurrency.
- Appending the output to the input dataframe or list of dictionaries using the output_key.
- Input values are passed through to the outputs to permit custom logic.

Example of `pp.parallel_text_generation()`:
```python
import json
import parallel_parrot as pp


config = pp.OpenAIChatCompletionConfig(
    openai_api_key="*your API key*",
    model="gpt-3.5-turbo",
    system_message="you are a very precise assistant",
)


input_data = [
    {
        "input": "this is a super duper product that will change the world",
        "source": "shopify",
    },
    {
        "input": "this is a horrible product that does not work",
        "source": "amazon"
    }
]

(output, usage_stats) = pp.run_async(
    pp.parallel_text_generation(
        config=config,
        input_data=input_data,
        prompt_template="""
What is the sentiment of this product review?
POSITIVE, NEUTRAL or NEGATIVE?
product review: ${input}
sentiment:""",
        output_key="sentiment",
    )
)

print(json.dumps(output, indent=2))
print(json.dumps(usage_stats, indent=2))
```

example output:
```json
[
    {
        "input": "this is a super duper product that will change the world",
        "source": "shopify",
        "sentiment": "POSITIVE",
    },
    {
        "input": "this is a horrible product that does not work",
        "source": "amazon",
        "sentiment": "NEGATIVE",
    }
]
```

- If the LLM generates multiple outputs (n > 1 for OpenAI), outputs are deduped, then exploded.  Outputs may then contain more rows than the input.
- If no output is generated, then `None` (for lists of dictionaries) or `math.nan` (for pandas) is returned.
- See the [prompt_templates](https://github.com/novex-ai/parallel-parrot/blob/v0.3.2/parallel_parrot/prompt_templates.py) for some pre-engineered templates.

## Generate Data - pp.parallel_data_generation()

Some use-cases are more demanding than the above, and require more complicated outputs.
This function supports prompts which expect to generate lists of dictionaries.

Some examples of these use cases include:
- Generating multiple question/answer pairs from each input document
- Generating multiple title/summary pairs from each input document

It does so by:
- Taking in a pandas dataframe or list of dictionaries
- Applying the python prompt template to each row.  Column names are used as the variable names in the template.
- Generating an API call specifying that we want a list of objects,
  with each object containing values for each of the output_key_names. (OpenAI "functions")
- Calling the LLM API with the prompt for each row
- Parsing the returned JSON data into a list of dictionaries, and retrying on invalid JSON
- Mapping each returned dictionary to a row in the output dataframe or list of dictionaries.  This will result in "exploded" output, where
  the output will contain more than one row for a given input.

Example of `pp.parallel_data_generation()`:
```python
import json
import parallel_parrot as pp

config = pp.OpenAIChatCompletionConfig(
    openai_api_key="*your API key*",
    model="gpt-3.5-turbo-1106",
    n=3,  # tip: to generate many creative outputs, it often makes sense to use n > 1
)

input_data = [
    {
        "text": """
George Washington (February 22, 1732 - December 14, 1799) was an American military officer, statesman, and Founding Father who served as the first president of the United States from 1789 to 1797. Appointed by the Second Continental Congress as commander of the Continental Army in June 1775, Washington led Patriot forces to victory in the American Revolutionary War and then served as president of the Constitutional Convention in 1787, which drafted and ratified the Constitution of the United States and established the American federal government. Washington has thus been called the "Father of his Country".
        """,
        "source_url": "https://en.wikipedia.org/wiki/George_Washington",
    },
    {
        "text": """
John Adams (October 30, 1735 - July 4, 1826) was an American statesman, attorney, diplomat, writer, and Founding Father who served as the second president of the United States from 1797 to 1801. Before his presidency, he was a leader of the American Revolution that achieved independence from Great Britain. During the latter part of the Revolutionary War and in the early years of the new nation, he served the U.S. government as a senior diplomat in Europe. Adams was the first person to hold the office of vice president of the United States, serving from 1789 to 1797. He was a dedicated diarist and regularly corresponded with important contemporaries, including his wife and adviser Abigail Adams and his friend and political rival Thomas Jefferson.
        """,
        "source_url": "https://en.wikipedia.org/wiki/John_Adams",
    },
]

(output, usage_stats) = pp.run_async(
    pp.parallel_data_generation(
        config=config,
        input_data=input_data,
        prompt_template="""
Generate question and answer pairs from the following document.
Output a list of JSON objects with keys "question" and "answer".
Only output questions and answers clearly described in the document.
If there are no questions and answers, output an empty list.
document: ${text}
        """,
        output_key_names=["question", "answer"]
    )
)

print(json.dumps(output, indent=2))
print(json.dumps(usage_stats, indent=2))
```

example output:
```json
[
  {
    "text": "...",
    "source_url": "https://en.wikipedia.org/wiki/George_Washington",
    "question": "Who was the first president of the United States?",
    "answer": "George Washington"
  },
  {
    "text": "...",
    "source_url": "https://en.wikipedia.org/wiki/George_Washington",
    "question": "What position did George Washington hold during the American Revolutionary War?",
    "answer": "Commander of the Continental Army"
  },
  {
    "text": "...",
    "source_url": "https://en.wikipedia.org/wiki/George_Washington",
    "question": "What document did George Washington help draft and ratify?",
    "answer": "The Constitution of the United States"
  },
  // ...
  {
    "text": "...",
    "source_url": "https://en.wikipedia.org/wiki/John_Adams",
    "question": "Who were some important contemporaries that John Adams corresponded with?",
    "answer": "Adams regularly corresponded with important contemporaries, including his wife and adviser Abigail Adams and his friend and political rival Thomas Jefferson."
  },
  {
    "text": "...",
    "source_url": "https://en.wikipedia.org/wiki/John_Adams",
    "question": "Who was John Adams?",
    "answer": "John Adams was an American statesman, attorney, diplomat, writer, and Founding Father."
  },
]
```

Notice that multiple output rows are created for each input, based on what the LLM returns.  All input columns/keys are retained, to permit integration (joining) with other code.

If more than one LLM continuation/response is generated per prompt (e.g. `n` > 1 for OpenAI), then these
outputs are put into additional rows.

If no output is generated (an empty list, or an empty string, or malformed JSON), then `None` (for lists of dictionaries) or `math.nan` (for pandas dataframes) is returned for each key in `output_key_names`.

## Prepare Fine-Tuning Data for OpenAI - pp.write_openai_fine_tuning_jsonl()

If you need to do [OpenAI Fine Tuning](https://platform.openai.com/docs/guides/fine-tuning) - but find it a pain to
split your data at appropriate token counts in jsonl format, the `parrallel-parrot` can help with this as well.

```python
import json
import parallel_parrot as pp

input_data = [
  {
    "question": "Who was the first president of the United States?",
    "answer": "George Washington"
  },
  {
    "question": "What position did George Washington hold during the American Revolutionary War?",
    "answer": "Commander of the Continental Army"
  },
  {
    "question": "What document did George Washington help draft and ratify?",
    "answer": "The Constitution of the United States"
  },
]

paths = pp.write_openai_fine_tuning_jsonl(
    input_data=input_data,
    prompt_key="question",
    completion_key="answer",
    system_message="",
    model="gpt-3.5-turbo-0613",  # used to calculate token counts
    output_file_prefix="/tmp/parallel_parrot/test_fine_tuning",
)
print(json.dumps(paths, indent=2, default=str))
```

This will create files that can be sent directly to the [OpenAI Fine Tuning API](https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset).  Doing so with this example will result in an LLM which knows more than the average parrot about presidents of the USA.

example output paths:
```
/tmp/parallel_parrot/test_fine_tuning.00001.jsonl
/tmp/parallel_parrot/test_fine_tuning.00002.jsonl
```

## Advanced Configuration

The OpenAI `config` object has number of optional parameters:

```python
import parallel_parrot as pp

config = pp.OpenAIChatCompletionConfig(
    openai_api_key="*your API key*",  # required
    model="gpt-3.5-turbo",  # required
    openai_org_id=None,
    system_message=None,
    temperature=None,
    top_p=None,
    n=None,
    max_tokens=None,
    presence_penalty=None,
    frequency_penalty=None,
    logit_bias=None,
    user=None,
    token_limit_mode=pp.TokenLimitMode.RAISE_ERROR
)

```

See [https://platform.openai.com/docs/guides/gpt](https://platform.openai.com/docs/guides/gpt) for the `model` values supported by the chat completions API.  Note that [fine tuned](https://platform.openai.com/docs/guides/fine-tuning) models can be used as well.

See [https://platform.openai.com/docs/api-reference/organization-optional](https://platform.openai.com/docs/api-reference/organization-optional) for more about the `openai_org_id` parameter - which is suitable for separating costs.

See [https://platform.openai.com/docs/api-reference/chat/create](https://platform.openai.com/docs/api-reference/chat/create) for definitions of many of the other parameters.  They can be used to adjust the behavior of the LLM.

The `token_limit_mode` can accept one of two values:

- `pp.TokenLimitMode.RAISE_ERROR` (default) raises an error when the token limit of the context window is exceeded
- `pp.TokenLimitMode.TRUNCATE` - automatically truncates the prompt in response to token limit errors.  These are logged at the `logging.WARNING` log level.
- `pp.TokenLimitMode.IGNORE` - ignore the error, returning `None` and logging a warning.

---

_Note on the name of the package: It's an alliterative animal name that combines the main functionality: parallelism, with the animal that can sort-of talk: parrots (like LLMs)_

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/novex-ai/parallel-parrot",
    "name": "parallel-parrot",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.9,<3.12",
    "maintainer_email": "",
    "keywords": "generative ai,pandas,llm,parallel,openai",
    "author": "Brad Ito",
    "author_email": "phlogisticfugu@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/00/55/27d93208142c29d8a2f195caeba6a1b2c3a61c1260df0522a6efa9aaefaf/parallel_parrot-0.9.0.tar.gz",
    "platform": null,
    "description": "# parallel-parrot\n\nA Python library for easily and quickly using LLMs on tabular data.  Because synchronous for-loops are too slow, and parallelism can be a pain.\n\n[![PyPI version](https://badge.fury.io/py/parallel-parrot.svg)](https://badge.fury.io/py/parallel-parrot)\n[![Release Notes](https://img.shields.io/github/release/novex-ai/parallel-parrot)](https://github.com/novex-ai/parallel-parrot/releases)\n[![pytest](https://github.com/novex-ai/parallel-parrot/actions/workflows/pytest.yml/badge.svg?branch=main)](https://github.com/novex-ai/parallel-parrot/actions/workflows/pytest.yml)\n[![Poetry](https://img.shields.io/endpoint?url=https://python-poetry.org/badge/v0.json)](https://python-poetry.org/)\n\n![flock of parrots](https://res.cloudinary.com/dn7sohze7/image/upload/v1695421900/parallel-parrot/0002-gareth-davies-EGcfyDiUv58-unsplash.jpg)\n\n*Photo by [Gareth Davies](https://unsplash.com/@gdfoto?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)\n on [Unsplash](https://unsplash.com/photos/EGcfyDiUv58?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)* \n\nSee our [blog post](https://bradito.me/blog/parallel-parrot/) on why we built this.\n\nUse cases:\n- Generate questions from documents for better Retrieval Augmented Generation (match questions to questions, not documents)\n- Sentiment analysis or summarization on a large number of documents\n- Data extraction and summarization\n- Removal of personal identifiers\n- Generate instructions from documents for fine tuning of LLMs\n\nMain Features:\n- Supports both pandas dataframes and native python lists of dictionaries\n- Supports OpenAI Chat Completion API, with structured output \"functions\" (more LLMs planned in the future) - including supporting OpenAI JSON mode\n- Output formatted data for fine-tuning\n\nOther Features:\n- Fast asynchronous (parallel) requests using aiohttp and uvloop, with support for notebook environments that have an existing event loop\n- Python logging support (e.g. `logging.basicConfig(level=logging.DEBUG)`)\n- Automatic retries, with exponential backoff, jitter, and dynamic header-based delays\n- Flexible prompt templates using standard Python [string.Template](https://docs.python.org/3/library/string.html#string.Template).  e.g. `\"summarize: ${input}\"`\n- \"Batteries included\" with pre-engineered prompt templates\n- Programmatic tracking of token usage to support cost control measures.\n- Supports `pandas` 1.x and 2.x APIs\n\n\n## Getting Started\n\n```python\npip install parallel-parrot\n```\n\nDefine an API configuration object:\n```python\nimport parallel_parrot as pp\n\nconfig = pp.OpenAIChatCompletionConfig(\n    openai_api_key=\"*your API key*\",\n    model=\"gpt-3.5-turbo\"\n)\n```\n\nsee the [declaration](https://github.com/novex-ai/parallel-parrot/blob/v0.3.2/parallel_parrot/types.py#L27) of `OpenAIChatCompletionConfig` for more available parameters, including the `system_message`.\nAll [Open API parameters](https://platform.openai.com/docs/api-reference/chat/create) can be passed.  Note that only models supported by the [OpenAI Chat Completions API](https://platform.openai.com/docs/guides/gpt/gpt-models) can be used with this configuration.\n\n## Generate Text - pp.parallel_text_generation()\n\nThis function executes parallel text generation/completion using a LLM.\n\nIt does so by:\n- Taking in a dataframe or list of dictionaries.\n- Applying the python prompt template to each row.  Column names are used as the variable names in the template.\n- Calling the LLM API with the prompt for each row.  Runs a single request first, for two reasons:\n  - Test access to the API, including credentials, without retries or complicated calling mechanics.\n  - Uses that request to automatically obtain [rate limit information](https://platform.openai.com/docs/guides/rate-limits) from the OpenAI API to configure the parallel requests to run with maximum concurrency.\n- Appending the output to the input dataframe or list of dictionaries using the output_key.\n- Input values are passed through to the outputs to permit custom logic.\n\nExample of `pp.parallel_text_generation()`:\n```python\nimport json\nimport parallel_parrot as pp\n\n\nconfig = pp.OpenAIChatCompletionConfig(\n    openai_api_key=\"*your API key*\",\n    model=\"gpt-3.5-turbo\",\n    system_message=\"you are a very precise assistant\",\n)\n\n\ninput_data = [\n    {\n        \"input\": \"this is a super duper product that will change the world\",\n        \"source\": \"shopify\",\n    },\n    {\n        \"input\": \"this is a horrible product that does not work\",\n        \"source\": \"amazon\"\n    }\n]\n\n(output, usage_stats) = pp.run_async(\n    pp.parallel_text_generation(\n        config=config,\n        input_data=input_data,\n        prompt_template=\"\"\"\nWhat is the sentiment of this product review?\nPOSITIVE, NEUTRAL or NEGATIVE?\nproduct review: ${input}\nsentiment:\"\"\",\n        output_key=\"sentiment\",\n    )\n)\n\nprint(json.dumps(output, indent=2))\nprint(json.dumps(usage_stats, indent=2))\n```\n\nexample output:\n```json\n[\n    {\n        \"input\": \"this is a super duper product that will change the world\",\n        \"source\": \"shopify\",\n        \"sentiment\": \"POSITIVE\",\n    },\n    {\n        \"input\": \"this is a horrible product that does not work\",\n        \"source\": \"amazon\",\n        \"sentiment\": \"NEGATIVE\",\n    }\n]\n```\n\n- If the LLM generates multiple outputs (n > 1 for OpenAI), outputs are deduped, then exploded.  Outputs may then contain more rows than the input.\n- If no output is generated, then `None` (for lists of dictionaries) or `math.nan` (for pandas) is returned.\n- See the [prompt_templates](https://github.com/novex-ai/parallel-parrot/blob/v0.3.2/parallel_parrot/prompt_templates.py) for some pre-engineered templates.\n\n## Generate Data - pp.parallel_data_generation()\n\nSome use-cases are more demanding than the above, and require more complicated outputs.\nThis function supports prompts which expect to generate lists of dictionaries.\n\nSome examples of these use cases include:\n- Generating multiple question/answer pairs from each input document\n- Generating multiple title/summary pairs from each input document\n\nIt does so by:\n- Taking in a pandas dataframe or list of dictionaries\n- Applying the python prompt template to each row.  Column names are used as the variable names in the template.\n- Generating an API call specifying that we want a list of objects,\n  with each object containing values for each of the output_key_names. (OpenAI \"functions\")\n- Calling the LLM API with the prompt for each row\n- Parsing the returned JSON data into a list of dictionaries, and retrying on invalid JSON\n- Mapping each returned dictionary to a row in the output dataframe or list of dictionaries.  This will result in \"exploded\" output, where\n  the output will contain more than one row for a given input.\n\nExample of `pp.parallel_data_generation()`:\n```python\nimport json\nimport parallel_parrot as pp\n\nconfig = pp.OpenAIChatCompletionConfig(\n    openai_api_key=\"*your API key*\",\n    model=\"gpt-3.5-turbo-1106\",\n    n=3,  # tip: to generate many creative outputs, it often makes sense to use n > 1\n)\n\ninput_data = [\n    {\n        \"text\": \"\"\"\nGeorge Washington (February 22, 1732 - December 14, 1799) was an American military officer, statesman, and Founding Father who served as the first president of the United States from 1789 to 1797. Appointed by the Second Continental Congress as commander of the Continental Army in June 1775, Washington led Patriot forces to victory in the American Revolutionary War and then served as president of the Constitutional Convention in 1787, which drafted and ratified the Constitution of the United States and established the American federal government. Washington has thus been called the \"Father of his Country\".\n        \"\"\",\n        \"source_url\": \"https://en.wikipedia.org/wiki/George_Washington\",\n    },\n    {\n        \"text\": \"\"\"\nJohn Adams (October 30, 1735 - July 4, 1826) was an American statesman, attorney, diplomat, writer, and Founding Father who served as the second president of the United States from 1797 to 1801. Before his presidency, he was a leader of the American Revolution that achieved independence from Great Britain. During the latter part of the Revolutionary War and in the early years of the new nation, he served the U.S. government as a senior diplomat in Europe. Adams was the first person to hold the office of vice president of the United States, serving from 1789 to 1797. He was a dedicated diarist and regularly corresponded with important contemporaries, including his wife and adviser Abigail Adams and his friend and political rival Thomas Jefferson.\n        \"\"\",\n        \"source_url\": \"https://en.wikipedia.org/wiki/John_Adams\",\n    },\n]\n\n(output, usage_stats) = pp.run_async(\n    pp.parallel_data_generation(\n        config=config,\n        input_data=input_data,\n        prompt_template=\"\"\"\nGenerate question and answer pairs from the following document.\nOutput a list of JSON objects with keys \"question\" and \"answer\".\nOnly output questions and answers clearly described in the document.\nIf there are no questions and answers, output an empty list.\ndocument: ${text}\n        \"\"\",\n        output_key_names=[\"question\", \"answer\"]\n    )\n)\n\nprint(json.dumps(output, indent=2))\nprint(json.dumps(usage_stats, indent=2))\n```\n\nexample output:\n```json\n[\n  {\n    \"text\": \"...\",\n    \"source_url\": \"https://en.wikipedia.org/wiki/George_Washington\",\n    \"question\": \"Who was the first president of the United States?\",\n    \"answer\": \"George Washington\"\n  },\n  {\n    \"text\": \"...\",\n    \"source_url\": \"https://en.wikipedia.org/wiki/George_Washington\",\n    \"question\": \"What position did George Washington hold during the American Revolutionary War?\",\n    \"answer\": \"Commander of the Continental Army\"\n  },\n  {\n    \"text\": \"...\",\n    \"source_url\": \"https://en.wikipedia.org/wiki/George_Washington\",\n    \"question\": \"What document did George Washington help draft and ratify?\",\n    \"answer\": \"The Constitution of the United States\"\n  },\n  // ...\n  {\n    \"text\": \"...\",\n    \"source_url\": \"https://en.wikipedia.org/wiki/John_Adams\",\n    \"question\": \"Who were some important contemporaries that John Adams corresponded with?\",\n    \"answer\": \"Adams regularly corresponded with important contemporaries, including his wife and adviser Abigail Adams and his friend and political rival Thomas Jefferson.\"\n  },\n  {\n    \"text\": \"...\",\n    \"source_url\": \"https://en.wikipedia.org/wiki/John_Adams\",\n    \"question\": \"Who was John Adams?\",\n    \"answer\": \"John Adams was an American statesman, attorney, diplomat, writer, and Founding Father.\"\n  },\n]\n```\n\nNotice that multiple output rows are created for each input, based on what the LLM returns.  All input columns/keys are retained, to permit integration (joining) with other code.\n\nIf more than one LLM continuation/response is generated per prompt (e.g. `n` > 1 for OpenAI), then these\noutputs are put into additional rows.\n\nIf no output is generated (an empty list, or an empty string, or malformed JSON), then `None` (for lists of dictionaries) or `math.nan` (for pandas dataframes) is returned for each key in `output_key_names`.\n\n## Prepare Fine-Tuning Data for OpenAI - pp.write_openai_fine_tuning_jsonl()\n\nIf you need to do [OpenAI Fine Tuning](https://platform.openai.com/docs/guides/fine-tuning) - but find it a pain to\nsplit your data at appropriate token counts in jsonl format, the `parrallel-parrot` can help with this as well.\n\n```python\nimport json\nimport parallel_parrot as pp\n\ninput_data = [\n  {\n    \"question\": \"Who was the first president of the United States?\",\n    \"answer\": \"George Washington\"\n  },\n  {\n    \"question\": \"What position did George Washington hold during the American Revolutionary War?\",\n    \"answer\": \"Commander of the Continental Army\"\n  },\n  {\n    \"question\": \"What document did George Washington help draft and ratify?\",\n    \"answer\": \"The Constitution of the United States\"\n  },\n]\n\npaths = pp.write_openai_fine_tuning_jsonl(\n    input_data=input_data,\n    prompt_key=\"question\",\n    completion_key=\"answer\",\n    system_message=\"\",\n    model=\"gpt-3.5-turbo-0613\",  # used to calculate token counts\n    output_file_prefix=\"/tmp/parallel_parrot/test_fine_tuning\",\n)\nprint(json.dumps(paths, indent=2, default=str))\n```\n\nThis will create files that can be sent directly to the [OpenAI Fine Tuning API](https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset).  Doing so with this example will result in an LLM which knows more than the average parrot about presidents of the USA.\n\nexample output paths:\n```\n/tmp/parallel_parrot/test_fine_tuning.00001.jsonl\n/tmp/parallel_parrot/test_fine_tuning.00002.jsonl\n```\n\n## Advanced Configuration\n\nThe OpenAI `config` object has number of optional parameters:\n\n```python\nimport parallel_parrot as pp\n\nconfig = pp.OpenAIChatCompletionConfig(\n    openai_api_key=\"*your API key*\",  # required\n    model=\"gpt-3.5-turbo\",  # required\n    openai_org_id=None,\n    system_message=None,\n    temperature=None,\n    top_p=None,\n    n=None,\n    max_tokens=None,\n    presence_penalty=None,\n    frequency_penalty=None,\n    logit_bias=None,\n    user=None,\n    token_limit_mode=pp.TokenLimitMode.RAISE_ERROR\n)\n\n```\n\nSee [https://platform.openai.com/docs/guides/gpt](https://platform.openai.com/docs/guides/gpt) for the `model` values supported by the chat completions API.  Note that [fine tuned](https://platform.openai.com/docs/guides/fine-tuning) models can be used as well.\n\nSee [https://platform.openai.com/docs/api-reference/organization-optional](https://platform.openai.com/docs/api-reference/organization-optional) for more about the `openai_org_id` parameter - which is suitable for separating costs.\n\nSee [https://platform.openai.com/docs/api-reference/chat/create](https://platform.openai.com/docs/api-reference/chat/create) for definitions of many of the other parameters.  They can be used to adjust the behavior of the LLM.\n\nThe `token_limit_mode` can accept one of two values:\n\n- `pp.TokenLimitMode.RAISE_ERROR` (default) raises an error when the token limit of the context window is exceeded\n- `pp.TokenLimitMode.TRUNCATE` - automatically truncates the prompt in response to token limit errors.  These are logged at the `logging.WARNING` log level.\n- `pp.TokenLimitMode.IGNORE` - ignore the error, returning `None` and logging a warning.\n\n---\n\n_Note on the name of the package: It's an alliterative animal name that combines the main functionality: parallelism, with the animal that can sort-of talk: parrots (like LLMs)_\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A library for easily and quickly using LLMs on tabular data",
    "version": "0.9.0",
    "project_urls": {
        "Changelog": "https://github.com/novex-ai/parallel-parrot/releases",
        "Homepage": "https://github.com/novex-ai/parallel-parrot",
        "Issues": "https://github.com/novex-ai/parallel-parrot/issues",
        "Repository": "https://github.com/novex-ai/parallel-parrot"
    },
    "split_keywords": [
        "generative ai",
        "pandas",
        "llm",
        "parallel",
        "openai"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b59315107bff47be7c6d9a77043e771df2c5c63cc0abacb985aad1dda30b3f29",
                "md5": "6953f1cbf82db833cc54d30fde6e97c8",
                "sha256": "82b029a8663f7f70c21e5c166323b03d3fc546afec0fa3939e2755c637b07db5"
            },
            "downloads": -1,
            "filename": "parallel_parrot-0.9.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6953f1cbf82db833cc54d30fde6e97c8",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9,<3.12",
            "size": 23153,
            "upload_time": "2023-11-07T19:10:20",
            "upload_time_iso_8601": "2023-11-07T19:10:20.738041Z",
            "url": "https://files.pythonhosted.org/packages/b5/93/15107bff47be7c6d9a77043e771df2c5c63cc0abacb985aad1dda30b3f29/parallel_parrot-0.9.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "005527d93208142c29d8a2f195caeba6a1b2c3a61c1260df0522a6efa9aaefaf",
                "md5": "8228280824d441971a18196338effa81",
                "sha256": "1159dba08b63fb23a278278612a1ed4019f7e930ee173d917319cd009bafaba6"
            },
            "downloads": -1,
            "filename": "parallel_parrot-0.9.0.tar.gz",
            "has_sig": false,
            "md5_digest": "8228280824d441971a18196338effa81",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9,<3.12",
            "size": 22676,
            "upload_time": "2023-11-07T19:10:22",
            "upload_time_iso_8601": "2023-11-07T19:10:22.458067Z",
            "url": "https://files.pythonhosted.org/packages/00/55/27d93208142c29d8a2f195caeba6a1b2c3a61c1260df0522a6efa9aaefaf/parallel_parrot-0.9.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-11-07 19:10:22",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "novex-ai",
    "github_project": "parallel-parrot",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "parallel-parrot"
}

Brad Ito