# Python Bugs OpenAI
![version](https://img.shields.io/pypi/v/py_bugs_open_ai)
![python versions](https://img.shields.io/pypi/pyversions/py_bugs_open_ai)
![build](https://img.shields.io/github/actions/workflow/status/valmikirao/py_bugs_open_ai/push-workflow.yml?branch=master)
* Free software: GNU General Public License v3
* Note for Python 3.8 and macOS: I can't get this to work on my local machine with that combination, but it does seem to
  work in Ubuntu, so I'm keeping Python 3.8 listed as supported

A utility that uses OpenAI to find bugs in large Python projects or git diffs. Makes heavy use of caching to save time and money.
# Table of Contents
1. [Installation](#Installation)
2. [Usage](#Usage)
3. [System Text](#SystemText)
4. [Skipping False Positives](#Skipping)
5. [Providing Examples](#Examples)
6. [TODO](#TODO)
7. [Credits](#Credits)
## Installation <a id="Installation"/>
```shell
# in local virtual env
$ pip install py-bugs-open-ai
# globally
$ pipx install py-bugs-open-ai
```
## Usage <a id="Usage"/>
```shell
# check for bugs in file
$ pybugsai foo.py
# in a repo
$ git ls-files '*.py' | pybugsai --in
# in the diff from master
$ git diff master -- '*.py' | pybugsai --diff-in
```
`pybugsai` makes heavy use of caching, so you should make sure to persist the cache somehow if you run it in your CI/CD.
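For example, in GitHub Actions you could persist the cache directory between runs with `actions/cache` (an illustrative fragment; the key and workflow layout are assumptions, not part of this project):

```yaml
# fragment of a hypothetical .github/workflows/pybugsai.yml
- uses: actions/cache@v3
  with:
    path: ~/.pybugsai/cache
    key: pybugsai-cache-${{ github.sha }}
    restore-keys: |
      pybugsai-cache-
```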
From the help:
```text
Usage: pybugsai [OPTIONS] [FILE]...
Chunks up python files and sends the pieces to open-ai to see if it thinks
there are any bugs in it
Options:
-c, --config TEXT The config file. Overrides the [pybugsai]
section in pybugsai.cfg and setup.cfg
--files-from-stdin, --in Take the list of files from standard in,
such that you could run this script like
`git ls-files -- '*.py' | pybugsai --in`
--api-key-env-variable TEXT The environment variable which the openai
api key is stored in [default:
OPEN_AI_API_KEY]
--model TEXT The openai model used [default:
gpt-3.5-turbo]
--embeddings-model TEXT
--max-chunk-size, --chunk INTEGER
The script tries to break the python down
into chunk sizes smaller than this
[default: 500]
--abs-max-chunk-size, --abs-chunk INTEGER
Sometimes the script can't break up the code
into chunks smaller than --max-chunk-size.
This is the absolute maximum size of chunk
it will send. If a chunk is bigger than
this, it will be reported as a warning or as
an error if --strict-chunk-size is set.
Defaults to --max-chunk-size
--cache-dir, --cache TEXT The cache directory [~/.pybugsai/cache]
--refresh-cache
--die-after INTEGER After this many errors are found, the
scripts stops running [default: 3]
--strict-chunk-size, --strict If true and there is a chunk that is bigger
than --abs-max-chunk-size, it will be marked
as an error
  --skip-chunks TEXT              The hashes of the chunks to skip. Can be
                                  added multiple times or be a comma-
                                  delimited list
--diff-from-stdin, --diff-in Be able to take `git diff` from the std-in
and then only check the chunks for lines
that are different
  --is-bug-re, --re TEXT          If the response from OpenAI matches this
                                  regular-expression, then it is marked as an
                                  error. Might be necessary to change this
                                  from the default if you use a custom
                                  --system-content  [default: ^ERROR\b]
-i, --is-bug-re-ignore-case Ignore the case when applying the `--is-bug-
re`
-s, --system-content TEXT The system content sent to OpenAI
--examples-file TEXT File containing example code and responses
to guide openai in finding bugs or non-bugs.
See README for format and more information
[default: ~/.pybugsai/examples.yml]
--max-tokens-to-send INTEGER Maximum number of tokens to send to the
OpenAI api, include the examples in the
--examples-file. pybugsai uses embeddings
to only send the most relevant examples if
it can't send them all without exceeding
this count [default: 1000]
--help Show this message and exit.
```
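To illustrate the chunking idea behind `--max-chunk-size`, here is a simplified sketch. It is not `pybugsai`'s actual algorithm, and it counts whitespace-separated words rather than real model tokens:

```python
def chunk_lines(lines, max_chunk_size):
    """Greedily group lines into chunks whose naive 'token' count
    (words split on whitespace) stays at or under max_chunk_size."""
    chunks, current, current_tokens = [], [], 0
    for line in lines:
        n_tokens = len(line.split())
        if current and current_tokens + n_tokens > max_chunk_size:
            # current chunk would overflow, so flush it and start a new one
            chunks.append("\n".join(current))
            current, current_tokens = [], 0
        current.append(line)
        current_tokens += n_tokens
    if current:
        chunks.append("\n".join(current))
    return chunks

source = ["def foo():", "    return 1", "", "def bar():", "    return 2"]
print(chunk_lines(source, max_chunk_size=4))
```

The real tool works on Python syntax and model token counts, which is why some chunks can't be split below `--max-chunk-size` (hence `--abs-max-chunk-size`).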
The default for any option can be set in the `[pybugsai]` section of the config files (`pybugsai.cfg`, `setup.cfg`, or the
file specified by the `--config` option):
```text
file: file
config: --config, -c
files_from_stdin (true or false): --files-from-stdin, --in
api_key_env_variable: --api-key-env-variable
model: --model
embeddings_model: --embeddings-model
max_chunk_size: --max-chunk-size, --chunk
abs_max_chunk_size: --abs-max-chunk-size, --abs-chunk
cache_dir: --cache-dir, --cache
refresh_cache (true or false): --refresh-cache
die_after: --die-after
strict_chunk_size (true or false): --strict-chunk-size, --strict
skip_chunks: --skip-chunks
diff_from_stdin (true or false): --diff-from-stdin, --diff-in
is_bug_re: --is-bug-re, --re
is_bug_re_ignore_case (true or false): --is-bug-re-ignore-case, -i
system_content: --system-content, -s
examples_file: --examples-file
max_tokens_to_send: --max-tokens-to-send
```
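For instance, a `setup.cfg` with a `[pybugsai]` section might look like this (the values here are illustrative, not recommendations):

```text
[pybugsai]
model = gpt-3.5-turbo
max_chunk_size = 400
die_after = 5
strict_chunk_size = true
cache_dir = .pybugsai-cache
```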
## System Text <a id="SystemText"/>
The `--system-content` argument (the `system_content` config variable) tells OpenAI what role it should be fulfilling. Since
the default value was too long to include in the `--help` message, here it is:
```text
```
## Skipping False Positives <a id="Skipping"/>
Sometimes, OpenAI is smart enough to interpret comments added to the code:

```python
sys.path.join(foo, bar)  # sys is imported earlier (pybugsai)
```
More reliably, you can have it skip certain chunks of code by using their hashes with the `--skip-chunks` option or
the `skip_chunks` setting in the `.cfg` file. The hashes are reported in the output:
```text
foo.py:1-51; 8a49edc09f token count: 390 - ok
foo.py:68-101; 907cf1dc2c token count: 380 - ok
foo.py:103-148; 3156754fe4 token count: 451 - error
foo.py:150-168; 91b78bdac4 token count: 183 - error
foo.py:171-172; 71daa97727 token count: 13 - ok
```
So if you wanted to skip the two errors above, you could do the following:
```text
[pybugsai]
skip_chunks = 3156754fe4,91b78bdac4
```
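Equivalently, the hashes can be passed on the command line, either comma-delimited or by repeating the option:

```shell
$ pybugsai foo.py --skip-chunks 3156754fe4,91b78bdac4
$ pybugsai foo.py --skip-chunks 3156754fe4 --skip-chunks 91b78bdac4
```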
## Providing Examples <a id="Examples"/>
You can provide examples of potential bugs in a file. By default, the CLI looks for this file at
`~/.pybugsai/examples.yml`, but it can also be specified with the `--examples-file` argument. The file is a YAML
file with the following format:
```yaml
examples:
- code: <some code>
    response: <what you would want OpenAI to respond with for this type of code>
- <more examples>
```
So, for example:
```yaml
examples:
- code: os.path.join('dir', 'file')
response: "OK: Assume that the \"os\" module was imported above"
- code: my_companys_module.my_companys_function(-1)
response: "ERROR: my_companys_module.my_companys_function() errors with negative values"
```
If the token count of the query plus the `--system-content` plus the chunk is greater than `--max-tokens-to-send`,
then `pybugsai` will use embeddings to figure out which of the examples are relevant to that particular chunk
and send just those. Please note that standard billing applies to the embeddings calls. The embedding results are
cached.
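Conceptually, this relevance selection is a nearest-neighbor lookup over embedding vectors. The sketch below uses hand-made toy vectors and plain-Python cosine similarity; the real tool obtains its vectors from the OpenAI embeddings API, and `most_relevant` is a hypothetical helper, not part of `pybugsai`:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def most_relevant(chunk_vec, example_vecs, n):
    """Return the indices of the n examples whose embeddings are
    closest (by cosine similarity) to the chunk's embedding."""
    ranked = sorted(
        range(len(example_vecs)),
        key=lambda i: cosine_similarity(chunk_vec, example_vecs[i]),
        reverse=True,
    )
    return ranked[:n]

# toy embeddings: the chunk points mostly along the first axis,
# so example 1 is the best match and example 2 the second best
chunk = [1.0, 0.0]
examples = [[0.0, 1.0], [0.9, 0.1], [0.5, 0.5]]
print(most_relevant(chunk, examples, n=2))  # → [1, 2]
```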
If you don't know what embeddings are, this might help explain them:
https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb
## TODO <a id="TODO"/>
* Allow this to use LLMs besides OpenAI
* Add tooling to have some sort of remote cache, so if you run it locally then another contributor or the CI/CD can
take advantage of the same cache
## Credits <a id="Credits"/>
Created by Valmiki Rao <valmikirao@gmail.com>
This package was created with Cookiecutter and the `audreyr/cookiecutter-pypackage` project template.
* Cookiecutter: https://github.com/audreyr/cookiecutter
* `audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage