[**🇨🇳中文**](https://github.com/shibing624/codeassist/blob/main/README.md) | [**🌐English**](https://github.com/shibing624/codeassist/blob/main/README_EN.md) | [**📖文档/Docs**](https://github.com/shibing624/codeassist/wiki) | [**🤖模型/Models**](https://huggingface.co/shibing624)
<div align="center">
<a href="https://github.com/shibing624/codeassist">
<img src="https://github.com/shibing624/codeassist/blob/main/docs/codeassist.png" height="130" alt="Logo">
</a>
</div>
-----------------
# CodeAssist: Advanced Code Completion Tool
[![PyPI version](https://badge.fury.io/py/codeassist.svg)](https://badge.fury.io/py/codeassist)
[![Contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg)](CONTRIBUTING.md)
[![GitHub contributors](https://img.shields.io/github/contributors/shibing624/codeassist.svg)](https://github.com/shibing624/codeassist/graphs/contributors)
[![License Apache 2.0](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)
[![python_version](https://img.shields.io/badge/Python-3.5%2B-green.svg)](requirements.txt)
[![GitHub issues](https://img.shields.io/github/issues/shibing624/codeassist.svg)](https://github.com/shibing624/codeassist/issues)
[![Wechat Group](http://vlog.sfyc.ltd/wechat_everyday/wxgroup_logo.png?imageView2/0/w/60/h/20)](#Contact)
## Introduction
**CodeAssist** is an advanced code completion tool that intelligently provides high-quality code completions for Python, Java, C++, and other languages.
## Features
- GPT-based code completion
- Code completion for `Python`, `Java`, `C++`, `JavaScript`, and more
- Line and block code completion
- Train (fine-tune) and predict with models on your own data
### Release Models
| Arch | BaseModel | Model | Model Size |
|:-------|:------------------|:------------------------------------------------------------------------------------------------------------------------|:----------:|
| GPT | gpt2 | [shibing624/code-autocomplete-gpt2-base](https://huggingface.co/shibing624/code-autocomplete-gpt2-base) | 487MB |
| GPT | distilgpt2 | [shibing624/code-autocomplete-distilgpt2-python](https://huggingface.co/shibing624/code-autocomplete-distilgpt2-python) | 319MB |
| GPT | bigcode/starcoder | [WizardLM/WizardCoder-15B-V1.0](https://huggingface.co/WizardLM/WizardCoder-15B-V1.0) | 29GB |
### Demo
HuggingFace Demo: https://huggingface.co/spaces/shibing624/code-autocomplete
backend model: `shibing624/code-autocomplete-gpt2-base`
## Install
```shell
pip install torch # conda install pytorch
pip install -U codeassist
```
or
```shell
git clone https://github.com/shibing624/codeassist.git
cd CodeAssist
python setup.py install
```
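A quick sanity check after installation (a minimal sketch; it simply imports one of the classes used in the examples below):
```shell
python -c "from codeassist import GPT2Coder; print('codeassist is installed')"
```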
## Usage
### WizardCoder model
WizardCoder-15B is `bigcode/starcoder` fine-tuned on Alpaca-format code data. You can use the following code to generate code:
example: [examples/wizardcoder_demo.py](https://github.com/shibing624/CodeAssist/blob/main/examples/wizardcoder_demo.py)
```python
import sys

sys.path.append('..')
from codeassist import WizardCoder

# Load the released WizardCoder model and complete a code prefix
m = WizardCoder("WizardLM/WizardCoder-15B-V1.0")
print(m.generate('def load_csv_file(file_path):')[0])
```
output:
```python
import csv
def load_csv_file(file_path):
"""
Load data from a CSV file and return a list of dictionaries.
"""
# Open the file in read mode
with open(file_path, 'r') as file:
# Create a CSV reader object
csv_reader = csv.DictReader(file)
# Initialize an empty list to store the data
data = []
# Iterate over each row of data
for row in csv_reader:
# Append the row of data to the list
data.append(row)
# Return the list of data
return data
```
The model output is impressively effective. It currently supports English and Chinese input; you can enter instructions or code prefixes as needed.
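For example, an instruction-style prompt can be passed to the same `generate` call (an illustrative sketch reusing the `m` object from the snippet above; the prompt text is arbitrary):
```python
# Instruction-style prompt (Chinese works too), reusing the WizardCoder instance `m` above
print(m.generate('Write a Python function that reads a JSON file and returns a dict')[0])
```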
### distilgpt2 model
The distilgpt2 model fine-tuned for code autocompletion can be used as follows:
example: [examples/distilgpt2_demo.py](https://github.com/shibing624/CodeAssist/blob/main/examples/distilgpt2_demo.py)
```python
import sys

sys.path.append('..')
from codeassist import GPT2Coder

# Load the distilgpt2-based autocomplete model and complete a code prefix
m = GPT2Coder("shibing624/code-autocomplete-distilgpt2-python")
print(m.generate('import torch.nn as')[0])
```
output:
```shell
import torch.nn as nn
import torch.nn.functional as F
```
### Use with huggingface/transformers
example: [examples/use_transformers_gpt2.py](https://github.com/shibing624/CodeAssist/blob/main/examples/use_transformers_gpt2.py)
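If you prefer to call the model directly through huggingface/transformers instead of the `GPT2Coder` wrapper, a minimal sketch looks like this (standard `AutoTokenizer`/`AutoModelForCausalLM` usage; the generation parameters are illustrative, see the linked example script for the project's own code):
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the released GPT2 code-completion model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("shibing624/code-autocomplete-gpt2-base")
model = AutoModelForCausalLM.from_pretrained("shibing624/code-autocomplete-gpt2-base")

prompt = "import torch.nn as"
inputs = tokenizer(prompt, return_tensors="pt")
# Sampled generation; tune max_length and the sampling parameters as needed
outputs = model.generate(
    **inputs,
    max_length=64,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```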
### Train Model
#### Train WizardCoder model
example: [examples/training_wizardcoder_mydata.py](https://github.com/shibing624/CodeAssist/blob/main/examples/training_wizardcoder_mydata.py)
```shell
cd examples
CUDA_VISIBLE_DEVICES=0,1 python training_wizardcoder_mydata.py --do_train --do_predict --num_epochs 1 --output_dir outputs-wizard --model_name WizardLM/WizardCoder-15B-V1.0
```
- GPU memory: 31GB
- Fine-tuning requires 2 × V100 (32GB)
- Inference requires 1 × V100 (32GB)
#### Train distilgpt2 model
example: [examples/training_gpt2_mydata.py](https://github.com/shibing624/CodeAssist/blob/main/examples/training_gpt2_mydata.py)
```shell
cd examples
python training_gpt2_mydata.py --do_train --do_predict --num_epochs 15 --output_dir outputs-gpt2 --model_name gpt2
```
PS: the resulting fine-tuned model is GPT2-python: [shibing624/code-autocomplete-gpt2-base](https://huggingface.co/shibing624/code-autocomplete-gpt2-base);
fine-tuning took about 24 hours on a V100.
### Server
Start the FastAPI server:
example: [examples/server.py](https://github.com/shibing624/CodeAssist/blob/main/examples/server.py)
```shell
cd examples
python server.py
```
Open the URL: http://0.0.0.0:8001/docs
![api](https://github.com/shibing624/CodeAssist/blob/main/docs/api.png)
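For a quick programmatic test, the sketch below posts a code prefix to the server with `requests`. The route and payload fields here are assumptions for illustration only; check the interactive docs at `/docs` (or `examples/server.py`) for the actual endpoint and parameters:
```python
import requests

# Hypothetical endpoint and payload; confirm the real route on the /docs page
resp = requests.post(
    "http://0.0.0.0:8001/code_generate",  # assumed route, see examples/server.py
    json={"prompt": "import torch.nn as"},
)
print(resp.json())
```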
## Dataset
You can also build a custom dataset. Below is an example of the building process.
Let's use Python code from [Awesome-pytorch-list](https://github.com/bharathgs/Awesome-pytorch-list):
1. We want the model to help auto-complete code at a general level, and the code collected in this list suits that need.
2. The code from these projects is well written (high quality).
dataset tree:
```shell
examples/download/python
├── train.txt
├── valid.txt
└── test.txt
```
There are three ways to build the dataset:
1. Load the dataset with the huggingface/datasets library
Hugging Face dataset: [https://huggingface.co/datasets/shibing624/source_code](https://huggingface.co/datasets/shibing624/source_code)
```python
from datasets import load_dataset
dataset = load_dataset("shibing624/source_code", "python") # python or java or cpp
print(dataset)
print(dataset['test'][0:10])
```
output:
```shell
DatasetDict({
train: Dataset({
features: ['text'],
num_rows: 5215412
})
validation: Dataset({
features: ['text'],
num_rows: 10000
})
test: Dataset({
features: ['text'],
num_rows: 10000
})
})
{'text': [
" {'max_epochs': [1, 2]},\n",
' refit=False,\n', ' cv=3,\n',
" scoring='roc_auc',\n", ' )\n',
' search.fit(*data)\n',
'',
' def test_module_output_not_1d(self, net_cls, data):\n',
' from skorch.toy import make_classifier\n',
' module = make_classifier(\n'
]}
```
2. Download the dataset from the cloud
| Name | Source | Download | Size |
| :------- | :--------- | :---------: | :---------: |
| Python+Java+CPP source code | Awesome-pytorch-list(5.22 Million lines) | [github_source_code.zip](https://github.com/shibing624/codeassist/releases/download/0.0.4/source_code.zip) | 105M |
Download the dataset, unzip it, and put it under `examples/`.
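For example, using the release link from the table above (assuming the archive unpacks into the `examples/download/...` layout shown earlier):
```shell
cd examples
wget https://github.com/shibing624/codeassist/releases/download/0.0.4/source_code.zip
unzip source_code.zip
```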
3. Fetch source code from scratch and build the dataset yourself
[prepare_code_data.py](https://github.com/shibing624/CodeAssist/blob/main/examples/prepare_code_data.py)
```shell
cd examples
python prepare_code_data.py --num_repos 260
```
## Contact
- Issues (suggestions): [![GitHub issues](https://img.shields.io/github/issues/shibing624/codeassist.svg)](https://github.com/shibing624/codeassist/issues)
- Email me: xuming: xuming624@qq.com
- WeChat me: add my *WeChat ID: xuming624, note: name-company-NLP* to join the NLP discussion group.
<img src="docs/wechat.jpeg" width="200" />
## Citation
If you use codeassist in your research, please cite it in the following format:
APA:
```latex
Xu, M. codeassist: Code AutoComplete with GPT model (Version 1.0.0) [Computer software]. https://github.com/shibing624/codeassist
```
BibTeX:
```latex
@software{Xu_codeassist,
  author = {Ming Xu},
  title = {CodeAssist: Code AutoComplete with Generation model},
  url = {https://github.com/shibing624/codeassist},
  version = {1.0.0}
}
```
## License
This repository is licensed under [The Apache License 2.0](LICENSE).
Please follow the [Attribution-NonCommercial 4.0 International](https://github.com/nlpxucan/WizardLM/blob/main/WizardCoder/MODEL_WEIGHTS_LICENSE) license to use the WizardCoder model.
## Contribute
The project code is still rough. If you have any improvements, you are welcome to submit them back to this project. Before submitting, please note the following two points:
- Add corresponding unit tests under `tests`
- Run all unit tests with `python setup.py test` and make sure they all pass
After that, you can submit a PR.
## Reference
- [gpt-2-simple](https://github.com/minimaxir/gpt-2-simple)
- [galois-autocompleter](https://github.com/galois-autocompleter/galois-autocompleter)
- [WizardLM/WizardCoder-15B-V1.0](https://huggingface.co/WizardLM/WizardCoder-15B-V1.0)