gpt-code-search


Namegpt-code-search JSON
Version 0.0.9 PyPI version JSON
download
home_pagehttps://github.com/narenmanoharan/gpt-code-search
Summarygpt-code-search enables you to search your codebase with natural language.
upload_time2023-07-11 05:13:42
maintainer
docs_urlNone
authornarenmanoharan
requires_python>=3.8.17,<4.0.0
licenseApache-2.0
keywords gpt code search gpt4 llm
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            <div align="center">
  <h1>gpt-code-search</h1>
  <img
    height="240"
    width="240"
    alt="logo"
    src="https://raw.githubusercontent.com/narenmanoharan/gpt-code-search/main/public/logo.png"
  />
  <p>
    <b>gpt-code-search</b> is a tool enabling you to search your codebase with natural language. It utilizes OpenAI's function calling to retrieve, search and answer queries about your code, boosting productivity and code understanding.
  </p>
</div>

## Features

- 🧠 **GPT-4**: Code search, retrieval, and answering all done with OpenAI's [function calling](https://openai.com/blog/function-calling-and-other-api-updates).
- 🔐 **Privacy-first**: Code snippets only leave your machine when you ask a question and the LLM requests the relevant code.
- 🔥 **Works instantly**: No pre-processing, chunking, or indexing, get started right away.
- 📦 **File-system backed**: Works with any code on your machine.

## Getting Started

### Installation

```bash
pip install gpt-code-search
```

### Usage

#### Ask a question about your codebase

To query about the purpose of your codebase, you can use the `query` command:

```bash
gpt-code-search query "What does this codebase do?"
# or use the shorthand alias
gcs query "What does this codebase do?"
```

<img src="public/demo.gif" width="750"  alt="gpt-code-search demo"/>

If you want to generate a test for a specific file, for example analytics.py, you can mention the file name to improve accuracy:
```bash
gcs query "Can you generate a test for analytics.py?"
```

For a general usage question about a certain module, like analytics, you can use keywords to search across the codebase:
```bash
gcs query "How do I use the analytics module?"
```

**Remember, mentioning the file name or specific keywords improves the accuracy of the search.**

#### Select a model to use

```bash
gcs select-model
```

Defaults to `gpt-3.5-turbo-16k`. The selected model is stored in `$HOME/.gpt-code-search/config.toml`.


### Configuration

The tool will prompt you to configure the `OPENAI_API_KEY`, if you haven't already.

## Problem

You want to leverage the power of GPT-4 to search your codebase, but you don't want to manually copy and paste code snippets into a prompt nor send your code to another third-party service (other than OpenAI).

This tool solves these problems by letting GPT-4 determine the most relevant code snippets within your codebase. Also, it meets you where you already live, in your terminal, not a new UI or window.

Examples of the types of questions you might want to ask:

- 🐛 Help debugging errors and finding the relevant code and files
- 📝 Document large files or functionalities formatted as markdown
- 🛠️ Generate new code based on existing files and conventions
- 📨 Ask general questions about any part of the codebase

## How it works

We utilize OpenAI's function calling to let GPT-4 call certain predefined functions in our library. You do not need to implement any of these functions yourself. These functions are designed to interact with your codebase and return enough context for the LLM to perform code searches without pre-indexing it or uploading your repo to a third party other than OpenAI. So, you only need to run the tool from the directory you want to search.

<img src="public/architecture.png" width="650" />

The functions currently available for the LLM to call are:

- `search_codebase` - searches the codebase using a TF-IDF vectorizer
- `get_file_tree` - provides the file tree of the codebase
- `get_file_contents` - provides the contents of a file

These functions are implemented in `gpt-code-search` and are triggered by chat completions. The LLM is prompted to utilize the search_codebase and get_file_tree function as needed to find the necessary context to answer your query and then loops as needed to collect more context with the get_file_contents until the LLM responds.

### Privacy

This tool prioritizes privacy. Outside of the LLM, no code is sent to us and is only used as context for the LLM. We do collect anonymous usage data to improve the tool, but you can opt out of this.

## Limitations

This does have some limitations, namely:

- The LLM is unable to load context across multiple files at once. This means that if you ask a question that requires context from multiple files, you will need to ask multiple questions.
- Specify the file name and keywords in your question to improve accuracy. For example, if you want to ask a question about `analytics.py`, mention the file name in your question.
- The level of search and retrieval is limited by the context window, which refers to the scope of the search conducted by the tool, meaning that we can only search 5 levels deep in the file system. So you need to run the tool from the folder/package closest to the code you want to search.

These limitations lead to suboptimal results in a few cases, but we're working on improving this. **We wanted to get this tool out there as soon as possible to get feedback and iterate on it!**

## Roadmap

- [ ] Use vector embeddings to improve search and retrieval
- [ ] Add support for generating code and saving it to a file
- [ ] Support for searching across multiple codebases
- [ ] Allow the model to create new functions that it can then execute
- [ ] Use [guidance](https://github.com/microsoft/guidance) to improve prompts
- [ ] Add support for additional models (Claude, Bedrock, etc)

## Analytics

We collect anonymous crash and usage data to help us improve the tool. This data aids in understanding usage patterns and improving the tool. You can opt out of analytics by running:

```bash
gcs opt-out-of-analytics
```

You can check the data that by looking at the [analytics](core/analytics.py) and [config](core/config.py) files.

Here's an exhaustive list of the data we collect:

```
- exception - stacktraces of crashes
- uuid - a unique identifier for the user
- model - the model used for the query
- usage - the type of usage (query_count, query_at, query_execution_time)
```

**Note: We do not collect any PII (ip-address), queries or code snippets.**

## Contributing

We love contributions from the community! ❤️ If you'd like to contribute, feel free to fork the repository and submit a pull request.

Please read our [Code of Conduct](CODE_OF_CONDUCT.md) and [Contributing Guide](CONTRIBUTING.md) for more detailed steps and information.

## Code of Conduct

We are committed to fostering a welcoming community. To ensure that everyone feels safe and welcome, we have a [Code of Conduct](CODE_OF_CONDUCT.md) that all contributors, maintainers, and users of this project are expected to adhere to.

## Support

If you're having trouble using `gpt-code-search`, feel free to [open an issue](https://github.com/narenmanoharan/gpt-code-search/issues) on our GitHub. You can also reach out to us directly at [narenkmanoharan@gmail.com](mailto:narenkmanoharan@gmail.com). We're always happy to help!

## Feedback

Your feedback is very important to us! If you have ideas for how we can improve `gpt-code-search`, we'd love to hear from you. Please [open an issue](https://github.com/narenmanoharan/gpt-code-search/issues) or reach out to us directly at [narenkmanoharan@gmail](mailto:narenkmanoharan@gmail) with your feedback or thoughts.

## License

This project is licensed under the terms of the [Apache 2.0](LICENSE).

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/narenmanoharan/gpt-code-search",
    "name": "gpt-code-search",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8.17,<4.0.0",
    "maintainer_email": "",
    "keywords": "gpt,code,search,gpt4,llm",
    "author": "narenmanoharan",
    "author_email": "narenkmanoharan@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/cc/c7/8200b9385434ce5586e346dc4b52b5b606c8b6ef2f797ac4894a158b4d43/gpt_code_search-0.0.9.tar.gz",
    "platform": null,
    "description": "<div align=\"center\">\n  <h1>gpt-code-search</h1>\n  <img\n    height=\"240\"\n    width=\"240\"\n    alt=\"logo\"\n    src=\"https://raw.githubusercontent.com/narenmanoharan/gpt-code-search/main/public/logo.png\"\n  />\n  <p>\n    <b>gpt-code-search</b> is a tool enabling you to search your codebase with natural language. It utilizes OpenAI's function calling to retrieve, search and answer queries about your code, boosting productivity and code understanding.\n  </p>\n</div>\n\n## Features\n\n- \ud83e\udde0 **GPT-4**: Code search, retrieval, and answering all done with OpenAI's [function calling](https://openai.com/blog/function-calling-and-other-api-updates).\n- \ud83d\udd10 **Privacy-first**: Code snippets only leave your machine when you ask a question and the LLM requests the relevant code.\n- \ud83d\udd25 **Works instantly**: No pre-processing, chunking, or indexing, get started right away.\n- \ud83d\udce6 **File-system backed**: Works with any code on your machine.\n\n## Getting Started\n\n### Installation\n\n```bash\npip install gpt-code-search\n```\n\n### Usage\n\n#### Ask a question about your codebase\n\nTo query about the purpose of your codebase, you can use the `query` command:\n\n```bash\ngpt-code-search query \"What does this codebase do?\"\n# or use the shorthand alias\ngcs query \"What does this codebase do?\"\n```\n\n<img src=\"public/demo.gif\" width=\"750\"  alt=\"gpt-code-search demo\"/>\n\nIf you want to generate a test for a specific file, for example analytics.py, you can mention the file name to improve accuracy:\n```bash\ngcs query \"Can you generate a test for analytics.py?\"\n```\n\nFor a general usage question about a certain module, like analytics, you can use keywords to search across the codebase:\n```bash\ngcs query \"How do I use the analytics module?\"\n```\n\n**Remember, mentioning the file name or specific keywords improves the accuracy of the search.**\n\n#### Select a model to use\n\n```bash\ngcs select-model\n```\n\nDefaults to `gpt-3.5-turbo-16k`. The selected model is stored in `$HOME/.gpt-code-search/config.toml`.\n\n\n### Configuration\n\nThe tool will prompt you to configure the `OPENAI_API_KEY`, if you haven't already.\n\n## Problem\n\nYou want to leverage the power of GPT-4 to search your codebase, but you don't want to manually copy and paste code snippets into a prompt nor send your code to another third-party service (other than OpenAI).\n\nThis tool solves these problems by letting GPT-4 determine the most relevant code snippets within your codebase. Also, it meets you where you already live, in your terminal, not a new UI or window.\n\nExamples of the types of questions you might want to ask:\n\n- \ud83d\udc1b Help debugging errors and finding the relevant code and files\n- \ud83d\udcdd Document large files or functionalities formatted as markdown\n- \ud83d\udee0\ufe0f Generate new code based on existing files and conventions\n- \ud83d\udce8 Ask general questions about any part of the codebase\n\n## How it works\n\nWe utilize OpenAI's function calling to let GPT-4 call certain predefined functions in our library. You do not need to implement any of these functions yourself. These functions are designed to interact with your codebase and return enough context for the LLM to perform code searches without pre-indexing it or uploading your repo to a third party other than OpenAI. So, you only need to run the tool from the directory you want to search.\n\n<img src=\"public/architecture.png\" width=\"650\" />\n\nThe functions currently available for the LLM to call are:\n\n- `search_codebase` - searches the codebase using a TF-IDF vectorizer\n- `get_file_tree` - provides the file tree of the codebase\n- `get_file_contents` - provides the contents of a file\n\nThese functions are implemented in `gpt-code-search` and are triggered by chat completions. The LLM is prompted to utilize the search_codebase and get_file_tree function as needed to find the necessary context to answer your query and then loops as needed to collect more context with the get_file_contents until the LLM responds.\n\n### Privacy\n\nThis tool prioritizes privacy. Outside of the LLM, no code is sent to us and is only used as context for the LLM. We do collect anonymous usage data to improve the tool, but you can opt out of this.\n\n## Limitations\n\nThis does have some limitations, namely:\n\n- The LLM is unable to load context across multiple files at once. This means that if you ask a question that requires context from multiple files, you will need to ask multiple questions.\n- Specify the file name and keywords in your question to improve accuracy. For example, if you want to ask a question about `analytics.py`, mention the file name in your question.\n- The level of search and retrieval is limited by the context window, which refers to the scope of the search conducted by the tool, meaning that we can only search 5 levels deep in the file system. So you need to run the tool from the folder/package closest to the code you want to search.\n\nThese limitations lead to suboptimal results in a few cases, but we're working on improving this. **We wanted to get this tool out there as soon as possible to get feedback and iterate on it!**\n\n## Roadmap\n\n- [ ] Use vector embeddings to improve search and retrieval\n- [ ] Add support for generating code and saving it to a file\n- [ ] Support for searching across multiple codebases\n- [ ] Allow the model to create new functions that it can then execute\n- [ ] Use [guidance](https://github.com/microsoft/guidance) to improve prompts\n- [ ] Add support for additional models (Claude, Bedrock, etc)\n\n## Analytics\n\nWe collect anonymous crash and usage data to help us improve the tool. This data aids in understanding usage patterns and improving the tool. You can opt out of analytics by running:\n\n```bash\ngcs opt-out-of-analytics\n```\n\nYou can check the data that by looking at the [analytics](core/analytics.py) and [config](core/config.py) files.\n\nHere's an exhaustive list of the data we collect:\n\n```\n- exception - stacktraces of crashes\n- uuid - a unique identifier for the user\n- model - the model used for the query\n- usage - the type of usage (query_count, query_at, query_execution_time)\n```\n\n**Note: We do not collect any PII (ip-address), queries or code snippets.**\n\n## Contributing\n\nWe love contributions from the community! \u2764\ufe0f If you'd like to contribute, feel free to fork the repository and submit a pull request.\n\nPlease read our [Code of Conduct](CODE_OF_CONDUCT.md) and [Contributing Guide](CONTRIBUTING.md) for more detailed steps and information.\n\n## Code of Conduct\n\nWe are committed to fostering a welcoming community. To ensure that everyone feels safe and welcome, we have a [Code of Conduct](CODE_OF_CONDUCT.md) that all contributors, maintainers, and users of this project are expected to adhere to.\n\n## Support\n\nIf you're having trouble using `gpt-code-search`, feel free to [open an issue](https://github.com/narenmanoharan/gpt-code-search/issues) on our GitHub. You can also reach out to us directly at [narenkmanoharan@gmail.com](mailto:narenkmanoharan@gmail.com). We're always happy to help!\n\n## Feedback\n\nYour feedback is very important to us! If you have ideas for how we can improve `gpt-code-search`, we'd love to hear from you. Please [open an issue](https://github.com/narenmanoharan/gpt-code-search/issues) or reach out to us directly at [narenkmanoharan@gmail](mailto:narenkmanoharan@gmail) with your feedback or thoughts.\n\n## License\n\nThis project is licensed under the terms of the [Apache 2.0](LICENSE).\n",
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "gpt-code-search enables you to search your codebase with natural language.",
    "version": "0.0.9",
    "project_urls": {
        "Homepage": "https://github.com/narenmanoharan/gpt-code-search",
        "Repository": "https://github.com/narenmanoharan/gpt-code-search",
        "discussions": "https://github.com/narenmanoharan/gpt-code-search/discussions",
        "issues": "https://github.com/narenmanoharan/gpt-code-search/issues",
        "wiki": "https://github.com/narenmanoharan/gpt-code-search/wiki"
    },
    "split_keywords": [
        "gpt",
        "code",
        "search",
        "gpt4",
        "llm"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f10fa2e1d76b1a7700c26a02f77ef915463f5e9d86b80d013b105df355ef4067",
                "md5": "1e7408b250e916e4daab05f516cfb4a0",
                "sha256": "b8267b59b5085cf576c76ae898362b1fd7cb2167df5858448d50232b09c83199"
            },
            "downloads": -1,
            "filename": "gpt_code_search-0.0.9-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1e7408b250e916e4daab05f516cfb4a0",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8.17,<4.0.0",
            "size": 17709,
            "upload_time": "2023-07-11T05:13:41",
            "upload_time_iso_8601": "2023-07-11T05:13:41.068603Z",
            "url": "https://files.pythonhosted.org/packages/f1/0f/a2e1d76b1a7700c26a02f77ef915463f5e9d86b80d013b105df355ef4067/gpt_code_search-0.0.9-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ccc78200b9385434ce5586e346dc4b52b5b606c8b6ef2f797ac4894a158b4d43",
                "md5": "ec9ac2e25f2d483d0988d0bfe67f1dfa",
                "sha256": "4856e2b6ead616d2c264cc9b0282e66a31a81e4e833ecd7a74888ba7b37d99e3"
            },
            "downloads": -1,
            "filename": "gpt_code_search-0.0.9.tar.gz",
            "has_sig": false,
            "md5_digest": "ec9ac2e25f2d483d0988d0bfe67f1dfa",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8.17,<4.0.0",
            "size": 17975,
            "upload_time": "2023-07-11T05:13:42",
            "upload_time_iso_8601": "2023-07-11T05:13:42.346132Z",
            "url": "https://files.pythonhosted.org/packages/cc/c7/8200b9385434ce5586e346dc4b52b5b606c8b6ef2f797ac4894a158b4d43/gpt_code_search-0.0.9.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-07-11 05:13:42",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "narenmanoharan",
    "github_project": "gpt-code-search",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "gpt-code-search"
}
        
Elapsed time: 0.09012s