# Python Bugs OpenAI
![version](https://img.shields.io/pypi/v/py_bugs_open_ai)
![python versions](https://img.shields.io/pypi/pyversions/py_bugs_open_ai)
![build](https://img.shields.io/github/actions/workflow/status/valmikirao/py_bugs_open_ai/push-workflow.yml?branch=master)
* Free software: GNU General Public License v3
* Note for Python 3.8 and macOS: I can't get this to work on my local machine with that combination, but it does seem to
  work in Ubuntu, so I'm keeping Python 3.8 listed as supported

A utility that uses OpenAI to find bugs in large Python projects or git diffs. Makes heavy use of caching to save time and money.
# Table of Contents
1. [Installation](#Installation)
2. [Usage](#Usage)
3. [System Text](#SystemText)
4. [Skipping False Positives](#Skipping)
5. [Providing Examples](#Examples)
6. [TODO](#TODO)
7. [Credits](#Credits)
## Installation <a id="Installation"/>
```shell
# in local virtual env
$ pip install py-bugs-open-ai
# globally
$ pipx install py-bugs-open-ai
```
## Usage <a id="Usage"/>
```shell
# check for bugs in file
$ pybugsai foo.py
# in a repo
$ git ls-files '*.py' | pybugsai --in
# in the diff from master
$ git diff master -- '*.py' | pybugsai --diff-in
```
`pybugsai` makes heavy use of caching, so you should make sure to persist the cache somehow if you run it in your CI/CD.
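For example, in GitHub Actions you could persist the cache directory between runs with `actions/cache` (an illustrative fragment; the key and workflow layout are assumptions, not part of this project):

```yaml
# fragment of a hypothetical .github/workflows/pybugsai.yml
- uses: actions/cache@v3
  with:
    path: ~/.pybugsai/cache
    key: pybugsai-cache-${{ github.sha }}
    restore-keys: |
      pybugsai-cache-
```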
From the help:
```text
Usage: pybugsai [OPTIONS] [FILE]...
Chunks up python files and sends the pieces to open-ai to see if it thinks
there are any bugs in it
Options:
-c, --config TEXT The config file. Overrides the [pybugsai]
section in pybugsai.cfg and setup.cfg
--files-from-stdin, --in Take the list of files from standard in,
such that you could run this script like
`git ls-files -- '*.py' | pybugsai --in`
--api-key-env-variable TEXT The environment variable which the openai
api key is stored in [default:
OPEN_AI_API_KEY]
--model TEXT The openai model used [default:
gpt-3.5-turbo]
--embeddings-model TEXT
--max-chunk-size, --chunk INTEGER
The script tries to break the python down
into chunk sizes smaller than this
[default: 500]
--abs-max-chunk-size, --abs-chunk INTEGER
Sometimes the script can't break up the code
into chunks smaller than --max-chunk-size.
This is the absolute maximum size of chunk
it will send. If a chunk is bigger than
this, it will be reported as a warning or as
an error if --strict-chunk-size is set.
Defaults to --max-chunk-size
--cache-dir, --cache TEXT The cache directory [~/.pybugsai/cache]
--refresh-cache
--die-after INTEGER After this many errors are found, the
scripts stops running [default: 3]
--strict-chunk-size, --strict If true and there is a chunk that is bigger
than --abs-max-chunk-size, it will be marked
as an error
  --skip-chunks TEXT              The hashes of the chunks to skip. Can be
                                  added multiple times or be a comma-
                                  delimited list
--diff-from-stdin, --diff-in Be able to take `git diff` from the std-in
and then only check the chunks for lines
that are different
  --is-bug-re, --re TEXT          If the response from OpenAI matches this
                                  regular-expression, then it is marked as an
                                  error. Might be necessary to change this
                                  from the default if you use a custom
                                  --system-content  [default: ^ERROR\b]
-i, --is-bug-re-ignore-case Ignore the case when applying the `--is-bug-
re`
-s, --system-content TEXT The system content sent to OpenAI
--examples-file TEXT File containing example code and responses
to guide openai in finding bugs or non-bugs.
See README for format and more information
[default: ~/.pybugsai/examples.yml]
--max-tokens-to-send INTEGER Maximum number of tokens to send to the
OpenAI api, include the examples in the
--examples-file. pybugsai uses embeddings
to only send the most relevant examples if
it can't send them all without exceeding
this count [default: 1000]
--help Show this message and exit.
```
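To illustrate the chunking idea behind `--max-chunk-size`, here is a simplified sketch. It is not `pybugsai`'s actual algorithm, and it counts whitespace-separated words rather than real model tokens:

```python
def chunk_lines(lines, max_chunk_size):
    """Greedily group lines into chunks whose naive 'token' count
    (words split on whitespace) stays at or under max_chunk_size."""
    chunks, current, current_tokens = [], [], 0
    for line in lines:
        n_tokens = len(line.split())
        if current and current_tokens + n_tokens > max_chunk_size:
            # current chunk would overflow, so flush it and start a new one
            chunks.append("\n".join(current))
            current, current_tokens = [], 0
        current.append(line)
        current_tokens += n_tokens
    if current:
        chunks.append("\n".join(current))
    return chunks

source = ["def foo():", "    return 1", "", "def bar():", "    return 2"]
print(chunk_lines(source, max_chunk_size=4))
```

The real tool works on Python syntax and model token counts, which is why some chunks can't be split below `--max-chunk-size` (hence `--abs-max-chunk-size`).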
The default for any option can be set in the `[pybugsai]` section of the config files (`pybugsai.cfg`, `setup.cfg`, or the
file specified by the `--config` option):
```text
file: file
config: --config, -c
files_from_stdin (true or false): --files-from-stdin, --in
api_key_env_variable: --api-key-env-variable
model: --model
embeddings_model: --embeddings-model
max_chunk_size: --max-chunk-size, --chunk
abs_max_chunk_size: --abs-max-chunk-size, --abs-chunk
cache_dir: --cache-dir, --cache
refresh_cache (true or false): --refresh-cache
die_after: --die-after
strict_chunk_size (true or false): --strict-chunk-size, --strict
skip_chunks: --skip-chunks
diff_from_stdin (true or false): --diff-from-stdin, --diff-in
is_bug_re: --is-bug-re, --re
is_bug_re_ignore_case (true or false): --is-bug-re-ignore-case, -i
system_content: --system-content, -s
examples_file: --examples-file
max_tokens_to_send: --max-tokens-to-send
```
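For instance, a `setup.cfg` with a `[pybugsai]` section might look like this (the values here are illustrative, not recommendations):

```text
[pybugsai]
model = gpt-3.5-turbo
max_chunk_size = 400
die_after = 5
strict_chunk_size = true
cache_dir = .pybugsai-cache
```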
## System Text <a id="SystemText"/>
The `--system-content` argument (the `system_content` config variable) tells OpenAI what role it should be fulfilling. Since
the default value was too long to include in the `--help` message, here it is:
```text
```
## Skipping False Positives <a id="Skipping"/>
Sometimes, OpenAI is smart enough to interpret comments added to the code:

```python
sys.path.join(foo, bar)  # sys is imported earlier (pybugsai)
```
More reliably, you can have it skip certain chunks of code by using their hashes with the `--skip-chunks` option or
the `skip_chunks` setting in the `.cfg` file. The hashes are reported in the output:
```text
foo.py:1-51; 8a49edc09f token count: 390 - ok
foo.py:68-101; 907cf1dc2c token count: 380 - ok
foo.py:103-148; 3156754fe4 token count: 451 - error
foo.py:150-168; 91b78bdac4 token count: 183 - error
foo.py:171-172; 71daa97727 token count: 13 - ok
```
So if you wanted to skip the two errors above, you could do the following:
```text
[pybugsai]
skip_chunks = 3156754fe4,91b78bdac4
```
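Equivalently, the hashes can be passed on the command line, either comma-delimited or by repeating the option:

```shell
$ pybugsai foo.py --skip-chunks 3156754fe4,91b78bdac4
$ pybugsai foo.py --skip-chunks 3156754fe4 --skip-chunks 91b78bdac4
```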
## Providing Examples <a id="Examples"/>
You can provide examples of potential bugs in a file. By default, the CLI looks for this file at
`~/.pybugsai/examples.yml`, but it can also be specified with the `--examples-file` argument. The file is a YAML
file with the following format:
```yaml
examples:
- code: <some code>
    response: <what you would want OpenAI to respond with for this type of code>
- <more examples>
```
So, for example:
```yaml
examples:
- code: os.path.join('dir', 'file')
response: "OK: Assume that the \"os\" module was imported above"
- code: my_companys_module.my_companys_function(-1)
response: "ERROR: my_companys_module.my_companys_function() errors with negative values"
```
If the token count of the query plus the `--system-content` plus the chunk is greater than `--max-tokens-to-send`,
then `pybugsai` will use embeddings to figure out which of the examples are relevant to that particular chunk
and send just those. Please note that standard billing applies to the embeddings calls. The embedding results are
cached.
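Conceptually, this relevance selection is a nearest-neighbor lookup over embedding vectors. The sketch below uses hand-made toy vectors and plain-Python cosine similarity; the real tool obtains its vectors from the OpenAI embeddings API, and `most_relevant` is a hypothetical helper, not part of `pybugsai`:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def most_relevant(chunk_vec, example_vecs, n):
    """Return the indices of the n examples whose embeddings are
    closest (by cosine similarity) to the chunk's embedding."""
    ranked = sorted(
        range(len(example_vecs)),
        key=lambda i: cosine_similarity(chunk_vec, example_vecs[i]),
        reverse=True,
    )
    return ranked[:n]

# toy embeddings: the chunk points mostly along the first axis,
# so example 1 is the best match and example 2 the second best
chunk = [1.0, 0.0]
examples = [[0.0, 1.0], [0.9, 0.1], [0.5, 0.5]]
print(most_relevant(chunk, examples, n=2))  # → [1, 2]
```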
If you don't know what embeddings are, this might help explain them:
https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb
## TODO <a id="TODO"/>
* Allow this to use LLMs besides OpenAI
* Add tooling to have some sort of remote cache, so if you run it locally then another contributor or the CI/CD can
take advantage of the same cache
## Credits <a id="Credits"/>
Created by Valmiki Rao <valmikirao@gmail.com>
This package was created with Cookiecutter and the `audreyr/cookiecutter-pypackage` project template.
* Cookiecutter: https://github.com/audreyr/cookiecutter
* `audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage