# wopo

- **Version:** 0.0.2
- **Home page:** https://github.com/wordlabs-io/wopo
- **Summary:** wordlabs open prompt optimiser: Automatic prompt optimisation
- **Upload time:** 2023-08-07 10:30:48
- **Author:** wordlabs.io
- **License:** LICENSE.txt
- **Keywords:** python, nlp, prompt, optimisation
# wordlabs.io Open Prompt Optimiser (or WOPO)

## Need I introduce prompt optimisation?
Large Language Models (LLMs) are AI agents capable of performing specific tasks on text-based information via natural-language instructions.

This makes them incredibly powerful. However, these machines lack idempotency.

> Idempotency: given the same question, the answer is always the same. In maths, 1 + 1 is always 2; the result of adding any two real numbers never changes between evaluations.
> This is not the case with LLMs, which may produce different results for the same question if asked twice.

Additionally, finding the best possible prompt, one that incorporates many different aspects of thinking, is a labour of effort, not of intellect. This task is better delegated to prompt optimisation libraries.

## WOPO: Usage 
### Prerequisites
WOPO uses Prefect for orchestration; however, PyPI may not resolve the right Prefect version automatically during installation.
If Prefect is not already installed, use the command below:
```
pip install prefect==2.11.2
```
Then install the WOPO library 
```
pip install wopo==0.0.2
```
> Please note that this library is still in alpha release; the WOPO codebase will be changing rapidly in the coming months.
> If you face any issues, make sure to drop a message here!

### Usage
```python
from wopo import WOPO

"""
For prompt optimisation, we basically need three things:
1. An initial prompt
2. A set of context/output pairs (i.e. if the prompt were applied to the context, what would the correct output be?)
3. A selection strategy (to decide how to choose the right prompt)

You will also need to pass a function that takes in a string, sends it to the LLM, and returns the response string.
Pass this function in the keyword argument text_gen_func.
This keeps things simple and allows you to write any kind of function you'd like to interact with your LLM.
"""
import openai 

openai.organization = 'ORG-ID'
openai.api_key = 'API_KEY'

def generate_func(prompt, return_explanation = False):
    # Send the prompt to the LLM and return its reply as a plain string
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": prompt}
        ]
    )
    resp_text = response['choices'][0]['message']['content']
    return resp_text

prompt = "Remove vowels"

#List of ip/op pairs with labels context and output
data = [
    {
        "context": "Sentence",
        "output" : "Sntnc"
    },
    {
        "context": "This is a sentence",
        "output" : "Ths s sntnc"
    }
]

#Initialise optimiser
y = WOPO(prompt, data, generate_func)

"""
num_iters: number of times we perform optimisation
num_steps_per_iter: number of times the prompt is updated in each step
top_k: at the end of each step, how many best prompts are selected to be merged into one
(this is being done so that the prompt generalises over multiple cases instead of specialising for one)

Returns:
optimal_prompt
results: scores of each step 
agent_states: complete logs of how each step changed the prompt and related feedback
"""
optimal_prompt, results, agent_states = y.run_optimisation(num_iters = 5, num_step_per_iter = 1, top_k = 2)

"""
You can also run simple tests to analyse how well the new prompt is working
The below function will return a Pandas DataFrame containing all the relevant information,
and also save the file to specified save location 
"""
test = y.run_test(some_other_data, save_location = "test_result.csv")
```
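
The example above uses the legacy `openai` 0.x interface. If you are on the newer `openai>=1.0` SDK, an equivalent generation function might look like the sketch below. Treat this as an illustrative assumption: WOPO itself only needs a callable that takes a prompt string and returns the model's reply as a string.

```python
from openai import OpenAI

# Sketch of a generation function for the openai>=1.0 client.
# WOPO only requires a callable: prompt string in, reply string out.
client = OpenAI(api_key="API_KEY", organization="ORG-ID")

def generate_func_v1(prompt, return_explanation = False):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "system", "content": prompt}],
    )
    return response.choices[0].message.content
```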

If you only have one ip/op pair, use the ```WOPO.run_single_chain_optimisation()``` function. You may additionally specify the ```stop_at_score``` criterion (between 0 and 100) at which the chain can stop early, as sketched below.
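
A minimal sketch of single-chain optimisation follows. The `stop_at_score` keyword is documented above; the assumption that the return values mirror `run_optimisation` is mine, so treat the unpacking as illustrative rather than a definitive signature.

```python
# Sketch: build the optimiser with a single context/output pair.
# Assumption: return values mirror run_optimisation.
single_pair = [{"context": "Sentence", "output": "Sntnc"}]
y = WOPO(prompt, single_pair, generate_func)

# stop_at_score (0-100): stop the chain early once the prompt scores at least this well
optimal_prompt, results, agent_states = y.run_single_chain_optimisation(stop_at_score = 90)
```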

### Available selection strategies
1. Max: Choose the prompt that gets closest to the answer ```WOPO(strategy = 'max')```
2. Top K: Combine the prompts from the top k highest scoring prompts
   ```python
   WOPO(strategy = 'top_k')
   WOPO.run_optimisation(top_k = top_k)
   ```
3. Random Selection from Top K: Given the top k scoring prompts, select n random prompts from among them (a combined sketch follows this list)
   ```python
   WOPO(strategy = 'random_from_top_k')
   WOPO.run_optimisation(top_k = top_k, random_sample_size = random_sample_size)
   ```
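
Putting the pieces together, a full run with the `random_from_top_k` strategy might look like the sketch below. The concrete values for `top_k` and `random_sample_size` are illustrative assumptions, not recommended defaults.

```python
# Illustrative sketch: initialise with the random_from_top_k strategy,
# then pass the matching sampling parameters to run_optimisation.
y = WOPO(prompt, data, generate_func, strategy = 'random_from_top_k')

optimal_prompt, results, agent_states = y.run_optimisation(
    num_iters = 5,
    num_step_per_iter = 1,
    top_k = 3,                 # keep the 3 best-scoring prompts
    random_sample_size = 2     # merge 2 prompts sampled from those 3
)
```
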
### Prompt Minification
Optimal prompts offer better accuracy in terms of output, but they tend to be verbose. Most LLMs are not cheap to operate, so it is best to use the fewest tokens possible in prompting. To reduce the number of tokens used, we can simply find which words are most likely to be implicitly understood by the LLM even if they are removed.
> For example, if I were to say 'The quick brown fox', you immediately think of 'jumps over the lazy dog', even though 'ate all my peanut butter' would also have been a valid continuation.

This process is called entropy minification.

```python
"""
Specify the model name (default: bert-base-uncased) from HuggingFace Transformers library and provide a percentile score (default: 0.1).
The tokens falling in the top percentile score of likelihood will be removed 
"""
optimal_prompt, _, _ = WOPO.run_optimisation()
minified_prompt = WOPO.minify(model_name = model_name, percentile = percentile)
```
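
The mechanics are not spelled out here, but the underlying idea can be illustrated with a masked language model: mask each token in turn, ask the model how predictable the original token was, and drop the most predictable ones. The sketch below is only a conceptual illustration of that idea, not WOPO's actual implementation.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Conceptual illustration only: score how predictable each token is
# by masking it and reading the masked-LM probability of the original token.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def token_predictability(text):
    input_ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
    scores = []
    for i in range(1, len(input_ids) - 1):          # skip [CLS] and [SEP]
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        prob = torch.softmax(logits, dim=-1)[input_ids[i]].item()
        scores.append((tokenizer.convert_ids_to_tokens(int(input_ids[i])), prob))
    return scores

# Tokens with the highest probability are the most "guessable",
# and are therefore the best candidates for removal.
for token, prob in token_predictability("The quick brown fox jumps over the lazy dog"):
    print(f"{token:>12s}  {prob:.3f}")
```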
