Name | coderepoindex |
Version | 0.1.0 |
home_page | None |
Summary | Improve the discoverability and searchability of code repositories through semantic understanding |
upload_time | 2025-07-10 04:02:17 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.8 |
license | MIT License
Copyright (c) 2024 CodeRepoIndex
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE. |
keywords |
code search
vector search
semantic search
code indexing
|
VCS |
 |
bugtrack_url |
|
requirements |
GitPython
pathlib2
|
# CodeRepoIndex
<!-- <p align="center">
<img src="https://raw.githubusercontent.com/XingYu-Zhong/CodeRepoIndex/main/assets/logo.png" alt="CodeRepoIndex Logo" width="150">
</p> -->
<p align="center">
<strong>Unlock the full potential of your code repositories through semantic understanding</strong>
</p>
<p align="center">
<a href="https://codecov.io/gh/XingYu-Zhong/CodeRepoIndex">
<img src="https://codecov.io/gh/XingYu-Zhong/CodeRepoIndex/branch/main/graph/badge.svg" alt="Code Coverage">
</a>
<a href="https://pypi.org/project/coderepoindex/">
<img src="https://img.shields.io/pypi/v/coderepoindex.svg" alt="PyPI Version">
</a>
<a href="https://github.com/XingYu-Zhong/CodeRepoIndex/blob/main/LICENSE">
<img src="https://img.shields.io/pypi/l/coderepoindex.svg" alt="License">
</a>
</p>
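**CodeRepoIndex** is an open-source, local tool for semantic code indexing and search. It converts an entire code repository into a local index that can be queried semantically, helping developers quickly locate relevant functionality, implementations, and examples in large codebases.
## Core Features
- **🤖 Intelligent code parsing**: Automatically breaks code files into meaningful logical units such as functions, classes, and methods.
- **🧠 Semantic embedding**: Uses embedding models (for example from OpenAI or Alibaba Cloud Tongyi) to convert code blocks into high-dimensional vectors.
- **💾 Unified storage**: A unified storage architecture built on the embedding module manages code metadata and vector data together.
- **🔍 Pure vector search**: Focuses on semantic vector search, supporting natural-language queries in Chinese and English as well as code-snippet queries.
- **⚙️ Flexible configuration**: Supports environment variables, a JSON configuration file, and parameters passed directly in code.
- **📦 Ready to use**: Provides a concise Python API and a command-line tool.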
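To make "pure vector search" concrete, here is a minimal sketch of ranking code blocks by cosine similarity between a query vector and stored block vectors. It is an illustration only: `cosine_similarity`, `rank_blocks`, and the toy three-dimensional vectors are hypothetical, and the similarity metric CodeRepoIndex actually uses is not stated in this README.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_blocks(
    query_vec: List[float],
    block_vecs: Dict[str, List[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every block against the query and keep the top_k best matches."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in block_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional "embeddings" (hypothetical); real models produce far more dimensions.
toy_index = {
    "upload_file": [0.9, 0.1, 0.0],
    "connect_db": [0.0, 0.2, 0.9],
    "parse_args": [0.1, 0.9, 0.1],
}
top = rank_blocks([1.0, 0.0, 0.0], toy_index, top_k=2)
print(top)  # upload_file ranks first, parse_args second
```

Because the query and the code blocks live in the same vector space, a natural-language question and a code snippet can both be used as queries in the same way.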
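## Architecture and Module Design
The project uses a modular design in which each component has a clear responsibility and is easy to extend. The diagram below shows the core modules and how they relate: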
```mermaid
graph TD
    subgraph User Interface
        A[CLI / Python API]
    end

    subgraph Core Logic
        B(Core Module) -- Manages --> C(Indexer & Searcher)
        B -- Uses --> H[Storage Adapter]
    end

    subgraph Building Blocks
        D[Repository] -- Fetches Code --> E[Parsers]
        E -- Creates Snippets --> B
        H -- Adapts --> F[Embeddings]
        F -- Unified Storage --> F
        C -- Uses --> H
    end

    A --> B

    style B fill:#cde4ff,stroke:#444,stroke-width:2px
    style F fill:#e1f5fe,stroke:#444,stroke-width:2px
```
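### Module Details
Want a deeper look at the design and implementation of each module? Follow the links below to the detailed documentation:
- **[📄 `core` module](./docs/core_module.md)**: The project's central coordinator; it integrates the other modules to provide indexing and search services.
- **[📄 `embeddings` module](./docs/embeddings_module.md)**: The unified storage module; it converts code blocks into vectors and manages their storage.
- **[📄 `parsers` module](./docs/parsers_module.md)**: The code-parsing core; it uses `tree-sitter` to parse source files into structured data.
- **[📄 `repository` module](./docs/repository_module.md)**: The data-source layer; it fetches code from Git or a local directory.
- **[📄 `models` module](./docs/models_module.md)**: An abstraction layer over external AI models (LLM and embedding).
- **[📄 `cli` module](./docs/cli_module.md)**: Provides the command-line interface.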
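The idea behind the `parsers` module, splitting a source file into functions and classes, can be sketched with Python's standard `ast` module. This is a stand-in for illustration, not the project's `tree-sitter` implementation; the `Block` dataclass and `split_into_blocks` function are hypothetical names, and the sketch handles Python source only.

```python
import ast
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """A hypothetical record for one logical unit of code."""
    name: str
    kind: str        # "function" or "class"
    line_start: int
    line_end: int


def split_into_blocks(source: str) -> List[Block]:
    """Collect every function and class definition in a Python source string."""
    blocks = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue
        blocks.append(Block(node.name, kind, node.lineno, node.end_lineno))
    # ast.walk does not guarantee source order, so sort by position.
    return sorted(blocks, key=lambda b: b.line_start)


sample = '''class Repo:
    def clone(self):
        pass

def main():
    pass
'''
blocks = split_into_blocks(sample)
for block in blocks:
    print(block)
```

Each block carries a name and a line range, which is the kind of metadata the search results in the usage examples expose through `result.block`.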
## Quick Start
### 1. Installation
```bash
# Install from PyPI
pip install coderepoindex
# Or install the latest version from source
git clone https://github.com/XingYu-Zhong/CodeRepoIndex.git
cd CodeRepoIndex
pip install -e .
```
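### 2. Configuration
CodeRepoIndex has a flexible configuration system that lets you set the API key and base URL for the LLM and the embedding model separately. Choose whichever of the following methods you prefer.
**Configuration precedence**: `parameters passed directly in code` > `environment variables` > `coderepoindex.json` > `defaults`.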
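The precedence order above can be illustrated with a small layered lookup. This is a sketch of the general technique, not the library's `load_config` implementation: `resolve_settings`, the `DEFAULTS` values, and the restriction to two settings are assumptions made for the example; the environment variable names are the ones listed in this README.

```python
import json
import os
from pathlib import Path
from typing import Any, Dict, Mapping, Optional

# Hypothetical default values, chosen only for this illustration.
DEFAULTS = {"storage_path": "./my_code_index", "log_level": "INFO"}
# Setting name -> environment variable name (variable names taken from this README).
ENV_NAMES = {"storage_path": "CODEREPO_STORAGE_PATH", "log_level": "CODEREPO_LOG_LEVEL"}


def resolve_settings(
    overrides: Optional[Mapping[str, Any]] = None,
    env: Optional[Mapping[str, str]] = None,
    config_file: str = "coderepoindex.json",
) -> Dict[str, Any]:
    """Merge the four layers, lowest priority first, so later layers win."""
    env = os.environ if env is None else env
    settings = dict(DEFAULTS)                      # 4. defaults (lowest priority)
    path = Path(config_file)
    if path.is_file():                             # 3. JSON configuration file
        file_values = json.loads(path.read_text(encoding="utf-8"))
        settings.update({k: v for k, v in file_values.items() if k in DEFAULTS})
    for key, var in ENV_NAMES.items():             # 2. environment variables
        if var in env:
            settings[key] = env[var]
    settings.update(overrides or {})               # 1. explicit arguments (highest)
    return settings


resolved = resolve_settings(
    overrides={"log_level": "DEBUG"},
    env={"CODEREPO_LOG_LEVEL": "WARNING", "CODEREPO_STORAGE_PATH": "./env_index"},
    config_file="does_not_exist.json",
)
print(resolved)  # {'storage_path': './env_index', 'log_level': 'DEBUG'}
```

Here the explicit `log_level` beats the environment variable, while `storage_path` falls through to the environment because nothing higher in the chain sets it.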
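#### Method 1: JSON configuration file (recommended)
Create a file named `coderepoindex.json` in your project root. This is the clearest way to manage all settings. CodeRepoIndex finds and loads this file automatically.
**Example `coderepoindex.json` (separate configuration):**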
```json
{
  "project_name": "CodeRepoIndex",
  "log_level": "INFO",
  "llm": {
    "provider_type": "api",
    "model_name": "qwen-plus",
    "api_key": "your-llm-api-key",
    "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "timeout": 30.0,
    "extra_params": {
      "temperature": 0.7,
      "max_tokens": 2000
    }
  },
  "embedding": {
    "provider_type": "api",
    "model_name": "text-embedding-v3",
    "api_key": "your-embedding-api-key",
    "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "timeout": 30.0,
    "batch_size": 32
  },
  "storage": {
    "storage_backend": "local",
    "vector_backend": "chromadb",
    "base_path": "./my_code_index",
    "cache_enabled": true,
    "cache_size": 1000
  }
}
```
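The `batch_size` setting in the `embedding` section suggests that code blocks are sent to the embedding API in groups rather than one at a time. A minimal sketch of that batching step follows; `batched` is a hypothetical helper, and how CodeRepoIndex batches requests internally is an assumption here.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 70 placeholder snippets split with the batch size from the example config.
snippets = [f"def func_{i}(): pass" for i in range(70)]
sizes = [len(batch) for batch in batched(snippets, batch_size=32)]
print(sizes)  # [32, 32, 6]
```

A larger batch size means fewer API round trips; providers typically cap how many inputs one request may carry, so check your provider's limit before raising it.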
**Compatibility configuration (unified API):**
```json
{
  "api_key": "your-unified-api-key",
  "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
  "embedding_model": "text-embedding-v3",
  "storage_path": "./my_code_index",
  "vector_backend": "chromadb",
  "log_level": "INFO"
}
```
#### Method 2: Environment variables
You can also configure CodeRepoIndex through environment variables, which is useful in CI/CD pipelines or Docker environments.
**Separate environment variables:**
```bash
# LLM settings
export CODEREPO_LLM_API_KEY="your-llm-api-key"
export CODEREPO_LLM_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
export CODEREPO_LLM_MODEL="qwen-plus"
# Embedding settings
export CODEREPO_EMBEDDING_API_KEY="your-embedding-api-key"
export CODEREPO_EMBEDDING_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
export CODEREPO_EMBEDDING_MODEL="text-embedding-v3"
# Storage settings
export CODEREPO_STORAGE_PATH="./my_code_index"
export CODEREPO_VECTOR_BACKEND="chromadb"
export CODEREPO_LOG_LEVEL="INFO"
```
**Compatibility environment variables:**
```bash
# Unified API settings (the LLM and the embedding model share the same API)
export CODEREPO_API_KEY="your-api-key"
export CODEREPO_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
export CODEREPO_STORAGE_PATH="./my_code_index"
```
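One plausible way the separate and unified variables could interact is a fallback: use the role-specific variable if it is set, otherwise the unified one. The sketch below shows that lookup. It rests on an assumption, since this README does not say which variable wins when both are set, and `resolve_api_settings` is a hypothetical helper rather than part of the package.

```python
import os
from typing import Dict, Mapping, Optional


def resolve_api_settings(
    role: str,
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Optional[str]]:
    """Look up API_KEY and BASE_URL for a role such as "LLM" or "EMBEDDING".

    Assumed rule: CODEREPO_<ROLE>_<FIELD> takes precedence over CODEREPO_<FIELD>.
    """
    env = os.environ if env is None else env
    result: Dict[str, Optional[str]] = {}
    for field in ("API_KEY", "BASE_URL"):
        specific = env.get(f"CODEREPO_{role}_{field}")
        unified = env.get(f"CODEREPO_{field}")
        result[field.lower()] = specific or unified
    return result


demo_env = {
    "CODEREPO_API_KEY": "shared-key",
    "CODEREPO_BASE_URL": "https://example.invalid/v1",
    "CODEREPO_EMBEDDING_API_KEY": "embedding-key",
}
embedding_settings = resolve_api_settings("EMBEDDING", demo_env)
llm_settings = resolve_api_settings("LLM", demo_env)
print(embedding_settings)
print(llm_settings)
```

With this rule, one unified key is enough for a single-provider setup, and a role-specific variable only needs to be added for the component that uses a different provider.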
#### Method 3: Pass parameters directly in code
For quick prototyping or testing, you can define the configuration directly in code.
**Separate configuration:**
```python
from coderepoindex.config import load_config

config = load_config(
    llm_api_key="your-llm-key",
    llm_base_url="https://api.openai.com/v1",
    llm_model_name="gpt-4",
    embedding_api_key="your-embedding-key",
    embedding_base_url="https://api.cohere.ai/v1",
    embedding_model_name="embed-english-v3.0",
    storage_base_path="./temp_index",
)
```
**Compatibility configuration:**
```python
from coderepoindex.config import load_config

config = load_config(
    api_key="your_api_key",
    base_url="https://your-api-provider.com/v1",
    storage_path="./temp_index",
)
```
> For more advanced configuration options, see the [**configuration documentation (`docs/configuration.md`)**](./docs/configuration.md).
### 3. Usage Examples
#### Example 1: Quick indexing and search of a local project (recommended)
This is the simplest and most fundamental usage: index a local code directory, then search it.
```python
from coderepoindex.config import load_config
from coderepoindex.core import CodeIndexer, CodeSearcher
from coderepoindex.repository import create_local_config


def main():
    # 1. Load the configuration (or rely on environment variables / the config file)
    # Note: replace these with your real API key and URL
    config = load_config(config_dict={
        "embedding": {
            "api_key": "your-embedding-api-key",
            "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
            "model_name": "text-embedding-v3"
        },
        "storage": {
            "base_path": "./my_code_index"
        }
    })
    print(f"🔧 Configuration loaded, using model: {config.embedding.model_name}")

    # 2. Initialize the core components
    indexer = CodeIndexer(config=config)
    searcher = CodeSearcher(config=config)

    # 3. Define the local repository to index
    # Replace './coderepoindex' with the path to your own project
    local_repo_path = "./coderepoindex"
    repo_config = create_local_config(path=local_repo_path)

    # 4. Run the indexing
    print(f"\n🔍 Indexing local directory: {local_repo_path}")
    # The with statement ensures resources are released properly
    with indexer:
        index_stats = indexer.index_repository(repo_config, repository_id="my_local_project")

    print("✅ Indexing complete!")
    print(f" - Total files: {index_stats.get('total_files', 0)}")
    print(f" - Code blocks: {index_stats.get('total_blocks', 0)}")

    # 5. Run the searches
    print("\n🔎 Searching...")
    queries = [
        "如何处理文件上传",      # Chinese query: "how to handle file uploads"
        "数据库连接池配置",      # Chinese query: "database connection pool configuration"
        "def get_user_by_id"   # code-snippet query
    ]

    with searcher:
        for query in queries:
            print(f"\n▶️ Query: '{query}'")
            results = searcher.search(
                query=query,
                top_k=3,
                repository_id="my_local_project"  # which project to search in
            )

            if results:
                print(f" Found {len(results)} relevant results:")
                for i, result in enumerate(results, 1):
                    print(f" {i}. {result.block.file_path}:{result.block.line_start}")
                    print(f" Function/class: {result.block.name}")
                    print(f" Similarity: {result.score:.4f}")
            else:
                print(" No relevant results found.")


if __name__ == "__main__":
    main()
```
#### Example 2: Indexing Git repositories with multi-project management
This example shows more advanced usage: pulling code from Git repositories and managing several projects.
```python
from coderepoindex.core import CodeIndexer, create_project_manager
from coderepoindex.repository import create_git_config

# Assumes the config object has already been loaded as in the previous example
# config = load_config(...)

# 1. Create the project manager
pm = create_project_manager(config=config)

with pm:
    # 2. Define and index the first project
    repo1_url = "https://github.com/requests/requests.git"
    repo1_config = create_git_config(repo1_url, branch="main")
    pm.create_project(name="Python Requests", repository_url=repo1_url, project_id="requests")

    indexer = CodeIndexer(config=config)
    with indexer:
        indexer.index_repository(repo1_config, repository_id="requests")
    print("✅ 'requests' project indexed.")

    # 3. Define and index the second project
    repo2_url = "https://github.com/expressjs/express.git"
    repo2_config = create_git_config(repo2_url, branch="master")
    pm.create_project(name="Node Express", repository_url=repo2_url, project_id="express")

    with indexer:
        indexer.index_repository(repo2_config, repository_id="express")
    print("✅ 'express' project indexed.")

    # 4. Search within a specific project
    print("\n🔍 Searching the 'requests' project for 'session management':")
    results = pm.search_in_project(
        query="session management",
        project_id="requests",
        top_k=2
    )
    for result in results:
        print(f" - Found: {result.block.file_path} - {result.block.name}")

    # 5. List all projects
    print("\n📋 All projects currently managed:")
    for proj in pm.list_projects():
        print(f" - {proj.name} (ID: {proj.project_id})")
```
#### Configuration best practices
**Production configuration (`coderepoindex.json`)**:
```json
{
  "project_name": "MyCompanyProject",
  "log_level": "INFO",
  "embedding": {
    "provider_type": "api",
    "model_name": "text-embedding-v3",
    "api_key": "${EMBEDDING_API_KEY}",
    "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "timeout": 30.0,
    "batch_size": 32
  },
  "storage": {
    "storage_backend": "local",
    "vector_backend": "chromadb",
    "base_path": "./company_code_index",
    "cache_enabled": true,
    "cache_size": 1000
  }
}
```
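The `${EMBEDDING_API_KEY}` placeholder keeps the secret out of the file. Whether CodeRepoIndex expands such placeholders itself is not stated here, so the following is a sketch of doing the expansion yourself with the standard library before handing the result to `load_config(config_dict=...)`. The `expand_placeholders` helper and the `sk-demo` value are hypothetical.

```python
import json
from string import Template
from typing import Any, Mapping


def expand_placeholders(value: Any, env: Mapping[str, str]) -> Any:
    """Recursively replace ${NAME} in every string of a parsed JSON value."""
    if isinstance(value, str):
        # substitute() raises KeyError when a referenced variable is missing,
        # which surfaces a forgotten secret early instead of sending "${...}".
        return Template(value).substitute(env)
    if isinstance(value, dict):
        return {key: expand_placeholders(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_placeholders(item, env) for item in value]
    return value  # numbers, booleans and null pass through unchanged


raw = '{"embedding": {"api_key": "${EMBEDDING_API_KEY}", "batch_size": 32}}'
config_dict = expand_placeholders(json.loads(raw), {"EMBEDDING_API_KEY": "sk-demo"})
print(config_dict)  # {'embedding': {'api_key': 'sk-demo', 'batch_size': 32}}
```

In real use you would pass `os.environ` as `env`. Note that `string.Template` treats every `$` as special, so a literal dollar sign in a value must be written as `$$`.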
**Quick configuration for development**:
```bash
# Set the environment variables
export CODEREPO_EMBEDDING_API_KEY="your-key"
export CODEREPO_EMBEDDING_BASE_URL="https://api.provider.com/v1"
export CODEREPO_STORAGE_PATH="./dev_index"
# Run your code
python your_script.py
```
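## Command-Line Tool
CodeRepoIndex also provides a command-line tool for quick operations. Before using it, make sure the API key and other settings are configured through environment variables or the configuration file.
```bash
# Index a local directory
coderepoindex index local /path/to/your/project
# Index a Git repository
coderepoindex index git https://github.com/requests/requests.git
# Semantic vector search
coderepoindex search "how to send a post request"
# Semantic search in Chinese (query: "error handling and exception catching")
coderepoindex search "错误处理和异常捕获"
# Code-snippet search
coderepoindex search "def upload_file(request):"
# List the indexed repositories
coderepoindex list
# Show the configuration status
coderepoindex config show
```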
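If you want to drive the CLI from a script, one option is to call it through `subprocess`. The sketch below only uses the `search` subcommand shown above; `build_search_command` and `run_search` are hypothetical helpers, and the format of the CLI's output is not documented here, so the output is returned as plain text.

```python
import shlex
import shutil
import subprocess
from typing import List


def build_search_command(query: str) -> List[str]:
    """Build the argument list; passing a list avoids shell quoting problems."""
    return ["coderepoindex", "search", query]


def run_search(query: str) -> str:
    """Run the search and return whatever the CLI prints to stdout."""
    if shutil.which("coderepoindex") is None:
        raise RuntimeError("coderepoindex is not on PATH; install it with pip first")
    completed = subprocess.run(
        build_search_command(query),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


cmd = build_search_command("how to send a post request")
print(shlex.join(cmd))  # coderepoindex search 'how to send a post request'
```

For anything beyond simple automation, the Python API from the usage examples is likely the better fit, since it returns structured result objects instead of text.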
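## Contributing
We welcome contributions of every kind! Whether you report an issue, contribute code, or improve the documentation, it is valuable to us. See [**CONTRIBUTING.md**](CONTRIBUTING.md) for details.
## License
This project is open source under the [MIT License](LICENSE).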
Raw data
{
"_id": null,
"home_page": null,
"name": "coderepoindex",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "code search, vector search, semantic search, code indexing",
"author": null,
"author_email": "CodeRepoIndex Team <zhongxingyuemail@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/94/f2/4343d845328a0981cfa9780217c1048ac490168aef7bcc796c153386ab7f/coderepoindex-0.1.0.tar.gz",
"platform": null,
"description": "# CodeRepoIndex\n\n<!-- <p align=\"center\">\n <img src=\"https://raw.githubusercontent.com/XingYu-Zhong/CodeRepoIndex/main/assets/logo.png\" alt=\"CodeRepoIndex Logo\" width=\"150\">\n</p> -->\n\n<p align=\"center\">\n <strong>\u901a\u8fc7\u8bed\u4e49\u7406\u89e3\uff0c\u91ca\u653e\u4ee3\u7801\u4ed3\u5e93\u7684\u5168\u90e8\u6f5c\u529b</strong>\n</p>\n\n<p align=\"center\">\n <a href=\"https://codecov.io/gh/XingYu-Zhong/CodeRepoIndex\">\n <img src=\"https://codecov.io/gh/XingYu-Zhong/CodeRepoIndex/branch/main/graph/badge.svg\" alt=\"Code Coverage\">\n </a>\n <a href=\"https://pypi.org/project/coderepoindex/\">\n <img src=\"https://img.shields.io/pypi/v/coderepoindex.svg\" alt=\"PyPI Version\">\n </a>\n <a href=\"https://github.com/XingYu-Zhong/CodeRepoIndex/blob/main/LICENSE\">\n <img src=\"https://img.shields.io/pypi/l/coderepoindex.svg\" alt=\"License\">\n </a>\n</p>\n\n**CodeRepoIndex** \u662f\u4e00\u4e2a\u5f00\u6e90\u7684\u672c\u5730\u5316\u4ee3\u7801\u8bed\u4e49\u7d22\u5f15\u548c\u641c\u7d22\u5de5\u5177\u3002\u5b83\u80fd\u591f\u5c06\u5b8c\u6574\u7684\u4ee3\u7801\u4ed3\u5e93\u8f6c\u6362\u4e3a\u4e00\u4e2a\u53ef\u88ab\u8bed\u4e49\u67e5\u8be2\u7684\u672c\u5730\u7d22\u5f15\uff0c\u5e2e\u52a9\u5f00\u53d1\u8005\u5feb\u901f\u5728\u5927\u578b\u4ee3\u7801\u5e93\u4e2d\u5b9a\u4f4d\u76f8\u5173\u7684\u4ee3\u7801\u529f\u80fd\u3001\u5b9e\u73b0\u548c\u793a\u4f8b\u3002\n\n## \u6838\u5fc3\u529f\u80fd\n\n- **\ud83e\udd16 \u667a\u80fd\u4ee3\u7801\u89e3\u6790**: \u81ea\u52a8\u5c06\u4ee3\u7801\u6587\u4ef6\u5206\u89e3\u4e3a\u51fd\u6570\u3001\u7c7b\u3001\u65b9\u6cd5\u7b49\u6709\u610f\u4e49\u7684\u903b\u8f91\u5355\u5143\u3002\n- **\ud83e\udde0 \u8bed\u4e49\u5d4c\u5165**: \u4f7f\u7528\u5148\u8fdb\u7684\u5d4c\u5165\u6a21\u578b\uff08\u5982 OpenAI, \u963f\u91cc\u4e91\u901a\u4e49\u7b49\uff09\u5c06\u4ee3\u7801\u5757\u8f6c\u6362\u4e3a\u9ad8\u7ef4\u5411\u91cf\u3002\n- **\ud83d\udcbe \u7edf\u4e00\u5b58\u50a8**: 
\u57fa\u4e8eembedding\u6a21\u5757\u7684\u7edf\u4e00\u5b58\u50a8\u67b6\u6784\uff0c\u9ad8\u6548\u7ba1\u7406\u4ee3\u7801\u5143\u6570\u636e\u548c\u5411\u91cf\u6570\u636e\u3002\n- **\ud83d\udd0d \u7eaf\u5411\u91cf\u641c\u7d22**: \u4e13\u6ce8\u4e8e\u8bed\u4e49\u5411\u91cf\u641c\u7d22\uff0c\u652f\u6301\u4e2d\u82f1\u6587\u81ea\u7136\u8bed\u8a00\u67e5\u8be2\u548c\u4ee3\u7801\u7247\u6bb5\u67e5\u8be2\u3002\n- **\u2699\ufe0f \u7075\u6d3b\u914d\u7f6e**: \u652f\u6301\u73af\u5883\u53d8\u91cf\u3001JSON\u914d\u7f6e\u6587\u4ef6\u3001\u4ee3\u7801\u5185\u76f4\u63a5\u4f20\u5165\u7b49\u591a\u79cd\u914d\u7f6e\u65b9\u5f0f\u3002\n- **\ud83d\udce6 \u5f00\u7bb1\u5373\u7528**: \u63d0\u4f9b\u7b80\u6d01\u7684 Python API \u548c\u547d\u4ee4\u884c\u5de5\u5177\u3002\n\n## \u67b6\u6784\u4e0e\u6a21\u5757\u8bbe\u8ba1\n\n\u9879\u76ee\u91c7\u7528\u6a21\u5757\u5316\u8bbe\u8ba1\uff0c\u5404\u7ec4\u4ef6\u804c\u8d23\u6e05\u6670\uff0c\u6613\u4e8e\u6269\u5c55\u3002\u4e0b\u56fe\u5c55\u793a\u4e86\u6838\u5fc3\u6a21\u5757\u53ca\u5176\u5173\u7cfb\uff1a\n\n```mermaid\ngraph TD\n subgraph User Interface\n A[CLI / Python API]\n end\n\n subgraph Core Logic\n B(Core Module) -- Manages --> C(Indexer & Searcher)\n B -- Uses --> H[Storage Adapter]\n end\n\n subgraph Building Blocks\n D[Repository] -- Fetches Code --> E[Parsers]\n E -- Creates Snippets --> B\n H -- Adapts --> F[Embeddings]\n F -- Unified Storage --> F\n C -- Uses --> H\n end\n\n A --> B\n\n style B fill:#cde4ff,stroke:#444,stroke-width:2px\n style F fill:#e1f5fe,stroke:#444,stroke-width:2px\n```\n\n### \u6a21\u5757\u8be6\u89e3\n\n\u60f3\u6df1\u5165\u4e86\u89e3\u6bcf\u4e2a\u6a21\u5757\u7684\u8bbe\u8ba1\u548c\u5b9e\u73b0\u5417\uff1f\u8bf7\u70b9\u51fb\u4e0b\u9762\u7684\u94fe\u63a5\u67e5\u770b\u8be6\u7ec6\u6587\u6863\uff1a\n\n- **[\ud83d\udcc4 `core` \u6a21\u5757](./docs/core_module.md)**: \u9879\u76ee\u7684\u4e2d\u592e\u534f\u8c03\u5668\uff0c\u6574\u5408\u5176\u4ed6\u6a21\u5757\u63d0\u4f9b\u7d22\u5f15\u548c\u641c\u7d22\u670d\u52a1\u3002\n- **[\ud83d\udcc4 
`embeddings` \u6a21\u5757](./docs/embeddings_module.md)**: \u7edf\u4e00\u5b58\u50a8\u6a21\u5757\uff0c\u8d1f\u8d23\u5c06\u4ee3\u7801\u5757\u8f6c\u6362\u4e3a\u5411\u91cf\u5e76\u7ba1\u7406\u5b58\u50a8\u3002\n- **[\ud83d\udcc4 `parsers` \u6a21\u5757](./docs/parsers_module.md)**: \u4ee3\u7801\u89e3\u6790\u6838\u5fc3\uff0c\u4f7f\u7528 `tree-sitter` \u5c06\u6e90\u6587\u4ef6\u89e3\u6790\u4e3a\u7ed3\u6784\u5316\u6570\u636e\u3002\n- **[\ud83d\udcc4 `repository` \u6a21\u5757](./docs/repository_module.md)**: \u6570\u636e\u6e90\u83b7\u53d6\u5c42\uff0c\u8d1f\u8d23\u4ece Git \u6216\u672c\u5730\u76ee\u5f55\u83b7\u53d6\u4ee3\u7801\u3002\n- **[\ud83d\udcc4 `models` \u6a21\u5757](./docs/models_module.md)**: \u5bf9\u63a5\u5916\u90e8AI\u6a21\u578b\uff08LLM \u548c Embedding\uff09\u7684\u62bd\u8c61\u5c42\u3002\n- **[\ud83d\udcc4 `cli` \u6a21\u5757](./docs/cli_module.md)**: \u63d0\u4f9b\u5f3a\u5927\u7684\u547d\u4ee4\u884c\u63a5\u53e3\u3002\n\n## \u5feb\u901f\u5f00\u59cb\n\n### 1. \u5b89\u88c5\n\n```bash\n# \u4ece PyPI \u5b89\u88c5\npip install coderepoindex\n\n# \u6216\u8005\u4ece\u6e90\u7801\u5b89\u88c5\u6700\u65b0\u7248\u672c\ngit clone https://github.com/XingYu-Zhong/CodeRepoIndex.git\ncd CodeRepoIndex\npip install -e .\n```\n\n### 2. 
\u914d\u7f6e\n\nCodeRepoIndex \u63d0\u4f9b\u4e86\u975e\u5e38\u7075\u6d3b\u7684\u914d\u7f6e\u7cfb\u7edf\uff0c\u652f\u6301\u5206\u522b\u914d\u7f6e LLM \u6a21\u578b\u548c Embedding \u6a21\u578b\u7684 API \u5bc6\u94a5\u548c\u57fa\u7840 URL\u3002\u60a8\u53ef\u4ee5\u6839\u636e\u504f\u597d\u9009\u62e9\u5176\u4e2d\u4e00\u79cd\u914d\u7f6e\u65b9\u5f0f\u3002\n\n**\u914d\u7f6e\u52a0\u8f7d\u4f18\u5148\u7ea7**: `\u4ee3\u7801\u4e2d\u76f4\u63a5\u4f20\u5165\u7684\u53c2\u6570` > `\u73af\u5883\u53d8\u91cf` > `coderepoindex.json` > `\u9ed8\u8ba4\u503c`\u3002\n\n#### \u65b9\u5f0f\u4e00\uff1aJSON \u914d\u7f6e\u6587\u4ef6 (\u63a8\u8350)\n\n\u5728\u60a8\u7684\u9879\u76ee\u6839\u76ee\u5f55\u4e0b\u521b\u5efa\u4e00\u4e2a\u540d\u4e3a `coderepoindex.json` \u7684\u6587\u4ef6\u3002\u8fd9\u662f\u7ba1\u7406\u6240\u6709\u8bbe\u7f6e\u7684\u6700\u6e05\u6670\u7684\u65b9\u5f0f\u3002CodeRepoIndex \u4f1a\u81ea\u52a8\u67e5\u627e\u5e76\u52a0\u8f7d\u6b64\u6587\u4ef6\u3002\n\n**`coderepoindex.json` \u793a\u4f8b (\u5206\u79bb\u5f0f\u914d\u7f6e):**\n```json\n{\n \"project_name\": \"CodeRepoIndex\",\n \"log_level\": \"INFO\",\n \n \"llm\": {\n \"provider_type\": \"api\",\n \"model_name\": \"qwen-plus\",\n \"api_key\": \"your-llm-api-key\",\n \"base_url\": \"https://dashscope.aliyuncs.com/compatible-mode/v1\",\n \"timeout\": 30.0,\n \"extra_params\": {\n \"temperature\": 0.7,\n \"max_tokens\": 2000\n }\n },\n \n \"embedding\": {\n \"provider_type\": \"api\",\n \"model_name\": \"text-embedding-v3\",\n \"api_key\": \"your-embedding-api-key\",\n \"base_url\": \"https://dashscope.aliyuncs.com/compatible-mode/v1\",\n \"timeout\": 30.0,\n \"batch_size\": 32\n },\n \n \"storage\": {\n \"storage_backend\": \"local\",\n \"vector_backend\": \"chromadb\",\n \"base_path\": \"./my_code_index\",\n \"cache_enabled\": true,\n \"cache_size\": 1000\n }\n}\n```\n\n**\u517c\u5bb9\u6027\u914d\u7f6e (\u7edf\u4e00 API):**\n```json\n{\n \"api_key\": \"your-unified-api-key\",\n \"base_url\": 
\"https://dashscope.aliyuncs.com/compatible-mode/v1\",\n \"embedding_model\": \"text-embedding-v3\",\n \"storage_path\": \"./my_code_index\",\n \"vector_backend\": \"chromadb\",\n \"log_level\": \"INFO\"\n}\n```\n\n#### \u65b9\u5f0f\u4e8c\uff1a\u73af\u5883\u53d8\u91cf\n\n\u60a8\u4e5f\u53ef\u4ee5\u901a\u8fc7\u8bbe\u7f6e\u73af\u5883\u53d8\u91cf\u6765\u914d\u7f6e\uff0c\u8fd9\u5728 CI/CD \u6216 Docker \u73af\u5883\u4e2d\u975e\u5e38\u6709\u7528\u3002\n\n**\u5206\u79bb\u5f0f\u73af\u5883\u53d8\u91cf\u914d\u7f6e:**\n```bash\n# LLM \u914d\u7f6e\nexport CODEREPO_LLM_API_KEY=\"your-llm-api-key\"\nexport CODEREPO_LLM_BASE_URL=\"https://dashscope.aliyuncs.com/compatible-mode/v1\"\nexport CODEREPO_LLM_MODEL=\"qwen-plus\"\n\n# Embedding \u914d\u7f6e\nexport CODEREPO_EMBEDDING_API_KEY=\"your-embedding-api-key\"\nexport CODEREPO_EMBEDDING_BASE_URL=\"https://dashscope.aliyuncs.com/compatible-mode/v1\"\nexport CODEREPO_EMBEDDING_MODEL=\"text-embedding-v3\"\n\n# \u5b58\u50a8\u914d\u7f6e\nexport CODEREPO_STORAGE_PATH=\"./my_code_index\"\nexport CODEREPO_VECTOR_BACKEND=\"chromadb\"\nexport CODEREPO_LOG_LEVEL=\"INFO\"\n```\n\n**\u517c\u5bb9\u6027\u73af\u5883\u53d8\u91cf\u914d\u7f6e:**\n```bash\n# \u7edf\u4e00 API \u914d\u7f6e (LLM \u548c Embedding \u4f7f\u7528\u76f8\u540c\u7684 API)\nexport CODEREPO_API_KEY=\"your-api-key\"\nexport CODEREPO_BASE_URL=\"https://dashscope.aliyuncs.com/compatible-mode/v1\"\nexport CODEREPO_STORAGE_PATH=\"./my_code_index\"\n```\n\n#### \u65b9\u5f0f\u4e09\uff1a\u5728\u4ee3\u7801\u4e2d\u76f4\u63a5\u4f20\u5165\n\n\u5728\u5feb\u901f\u539f\u578b\u5f00\u53d1\u6216\u6d4b\u8bd5\u65f6\uff0c\u53ef\u4ee5\u76f4\u63a5\u5728\u4ee3\u7801\u4e2d\u5b9a\u4e49\u914d\u7f6e\u3002\n\n**\u5206\u79bb\u5f0f\u914d\u7f6e:**\n```python\nfrom coderepoindex.config import load_config\n\nconfig = load_config(\n llm_api_key=\"your-llm-key\",\n llm_base_url=\"https://api.openai.com/v1\",\n llm_model_name=\"gpt-4\",\n \n embedding_api_key=\"your-embedding-key\",\n 
embedding_base_url=\"https://api.cohere.ai/v1\", \n embedding_model_name=\"embed-english-v3.0\",\n \n storage_base_path=\"./temp_index\"\n)\n```\n\n**\u517c\u5bb9\u6027\u914d\u7f6e:**\n```python\nfrom coderepoindex.config import load_config\n\nconfig = load_config(\n api_key=\"your_api_key\",\n base_url=\"https://your-api-provider.com/v1\",\n storage_path=\"./temp_index\"\n)\n```\n\n> \u66f4\u591a\u9ad8\u7ea7\u914d\u7f6e\u9009\u9879\u548c\u8bf4\u660e\uff0c\u8bf7\u53c2\u8003 [**\u914d\u7f6e\u6587\u6863 (`docs/configuration.md`)**](./docs/configuration.md)\u3002\n\n### 3. \u4f7f\u7528\u793a\u4f8b\n\n#### \u793a\u4f8b 1: \u672c\u5730\u9879\u76ee\u5feb\u901f\u7d22\u5f15\u4e0e\u641c\u7d22 (\u63a8\u8350)\n\n\u8fd9\u662f\u6700\u7b80\u5355\u3001\u6700\u6838\u5fc3\u7684\u7528\u6cd5\uff0c\u5c55\u793a\u4e86\u5982\u4f55\u7d22\u5f15\u4e00\u4e2a\u672c\u5730\u4ee3\u7801\u76ee\u5f55\u5e76\u8fdb\u884c\u641c\u7d22\u3002\n\n```python\nfrom coderepoindex.config import load_config\nfrom coderepoindex.core import CodeIndexer, CodeSearcher\nfrom coderepoindex.repository import create_local_config\n\ndef main():\n # 1. \u52a0\u8f7d\u914d\u7f6e (\u6216\u4f7f\u7528\u73af\u5883\u53d8\u91cf/\u914d\u7f6e\u6587\u4ef6)\n # \u6ce8\u610f\uff1a\u8bf7\u66ff\u6362\u4e3a\u60a8\u7684\u771f\u5b9eAPI\u5bc6\u94a5\u548cURL\n config = load_config(config_dict={\n \"embedding\": {\n \"api_key\": \"your-embedding-api-key\",\n \"base_url\": \"https://dashscope.aliyuncs.com/compatible-mode/v1\",\n \"model_name\": \"text-embedding-v3\"\n },\n \"storage\": {\n \"base_path\": \"./my_code_index\"\n }\n })\n print(f\"\ud83d\udd27 \u914d\u7f6e\u52a0\u8f7d\u5b8c\u6210\uff0c\u4f7f\u7528\u6a21\u578b: {config.embedding.model_name}\")\n\n # 2. \u521d\u59cb\u5316\u6838\u5fc3\u7ec4\u4ef6\n indexer = CodeIndexer(config=config)\n searcher = CodeSearcher(config=config)\n \n # 3. 
\u5b9a\u4e49\u8981\u7d22\u5f15\u7684\u672c\u5730\u4ed3\u5e93\n # \u8bf7\u5c06 './coderepoindex' \u66ff\u6362\u4e3a\u60a8\u81ea\u5df1\u7684\u9879\u76ee\u8def\u5f84\n local_repo_path = \"./coderepoindex\"\n repo_config = create_local_config(path=local_repo_path)\n \n # 4. \u6267\u884c\u7d22\u5f15\n print(f\"\\n\ud83d\udd0d \u5f00\u59cb\u7d22\u5f15\u672c\u5730\u76ee\u5f55: {local_repo_path}\")\n # \u4f7f\u7528 with \u4e0a\u4e0b\u6587\u7ba1\u7406\u5668\u786e\u4fdd\u8d44\u6e90\u88ab\u6b63\u786e\u5904\u7406\n with indexer:\n index_stats = indexer.index_repository(repo_config, repository_id=\"my_local_project\")\n \n print(\"\u2705 \u7d22\u5f15\u5b8c\u6210!\")\n print(f\" - \u603b\u6587\u4ef6\u6570: {index_stats.get('total_files', 0)}\")\n print(f\" - \u4ee3\u7801\u5757\u6570: {index_stats.get('total_blocks', 0)}\")\n\n # 5. \u6267\u884c\u641c\u7d22\n print(\"\\n\ud83d\udd0e \u5f00\u59cb\u641c\u7d22...\")\n queries = [\n \"\u5982\u4f55\u5904\u7406\u6587\u4ef6\u4e0a\u4f20\",\n \"\u6570\u636e\u5e93\u8fde\u63a5\u6c60\u914d\u7f6e\",\n \"def get_user_by_id\"\n ]\n \n with searcher:\n for query in queries:\n print(f\"\\n\u25b6\ufe0f \u67e5\u8be2: '{query}'\")\n results = searcher.search(\n query=query,\n top_k=3,\n repository_id=\"my_local_project\" # \u6307\u5b9a\u5728\u54ea\u4e2a\u9879\u76ee\u4e2d\u641c\u7d22\n )\n \n if results:\n print(f\" \u627e\u5230 {len(results)} \u4e2a\u76f8\u5173\u7ed3\u679c:\")\n for i, result in enumerate(results, 1):\n print(f\" {i}. 
{result.block.file_path}:{result.block.line_start}\")\n print(f\" \u51fd\u6570/\u7c7b: {result.block.name}\")\n print(f\" \u76f8\u4f3c\u5ea6: {result.score:.4f}\")\n else:\n print(\" \u672a\u627e\u5230\u76f8\u5173\u7ed3\u679c\u3002\")\n\nif __name__ == \"__main__\":\n main()\n```\n\n#### \u793a\u4f8b 2: \u7d22\u5f15 Git \u4ed3\u5e93\u5e76\u4f7f\u7528\u591a\u9879\u76ee\u7ba1\u7406\n\n\u8fd9\u4e2a\u4f8b\u5b50\u5c55\u793a\u4e86\u66f4\u9ad8\u7ea7\u7684\u7528\u6cd5\uff0c\u5305\u62ec\u4eceGit\u4ed3\u5e93\u62c9\u53d6\u4ee3\u7801\u548c\u7ba1\u7406\u591a\u4e2a\u9879\u76ee\u3002\n\n```python\nfrom coderepoindex.core import create_project_manager\nfrom coderepoindex.repository import create_git_config\n\n# \u5047\u8bbe config \u5bf9\u8c61\u5df2\u50cf\u4e0a\u4e00\u4e2a\u793a\u4f8b\u4e00\u6837\u52a0\u8f7d\n# config = load_config(...) \n\n# 1. \u521b\u5efa\u9879\u76ee\u7ba1\u7406\u5668\npm = create_project_manager(config=config)\n\nwith pm:\n # 2. \u5b9a\u4e49\u5e76\u7d22\u5f15\u7b2c\u4e00\u4e2a\u9879\u76ee\n repo1_url = \"https://github.com/requests/requests.git\"\n repo1_config = create_git_config(repo1_url, branch=\"main\")\n pm.create_project(name=\"Python Requests\", repository_url=repo1_url, project_id=\"requests\")\n \n indexer = CodeIndexer(config=config)\n with indexer:\n indexer.index_repository(repo1_config, repository_id=\"requests\")\n print(\"\u2705 'requests' \u9879\u76ee\u7d22\u5f15\u5b8c\u6210\u3002\")\n\n # 3. \u5b9a\u4e49\u5e76\u7d22\u5f15\u7b2c\u4e8c\u4e2a\u9879\u76ee\n repo2_url = \"https://github.com/expressjs/express.git\"\n repo2_config = create_git_config(repo2_url, branch=\"master\")\n pm.create_project(name=\"Node Express\", repository_url=repo2_url, project_id=\"express\")\n \n with indexer:\n indexer.index_repository(repo2_config, repository_id=\"express\")\n print(\"\u2705 'express' \u9879\u76ee\u7d22\u5f15\u5b8c\u6210\u3002\")\n\n # 4. 
\u5728\u7279\u5b9a\u9879\u76ee\u4e2d\u641c\u7d22\n print(\"\\n\ud83d\udd0d \u5728 'requests' \u9879\u76ee\u4e2d\u641c\u7d22 'session management':\")\n results = pm.search_in_project(\n query=\"session management\",\n project_id=\"requests\",\n top_k=2\n )\n for result in results:\n print(f\" - \u627e\u5230: {result.block.file_path} - {result.block.name}\")\n\n # 5. \u5217\u51fa\u6240\u6709\u9879\u76ee\n print(\"\\n\ud83d\udccb \u5f53\u524d\u7ba1\u7406\u7684\u6240\u6709\u9879\u76ee:\")\n for proj in pm.list_projects():\n print(f\" - {proj.name} (ID: {proj.project_id})\")\n```\n\n#### \u914d\u7f6e\u6700\u4f73\u5b9e\u8df5\n\n**\u751f\u4ea7\u73af\u5883\u914d\u7f6e (`coderepoindex.json`)**:\n```json\n{\n \"project_name\": \"MyCompanyProject\",\n \"log_level\": \"INFO\",\n \n \"embedding\": {\n \"provider_type\": \"api\",\n \"model_name\": \"text-embedding-v3\",\n \"api_key\": \"${EMBEDDING_API_KEY}\",\n \"base_url\": \"https://dashscope.aliyuncs.com/compatible-mode/v1\",\n \"timeout\": 30.0,\n \"batch_size\": 32\n },\n \n \"storage\": {\n \"storage_backend\": \"local\",\n \"vector_backend\": \"chroma\",\n \"base_path\": \"./company_code_index\",\n \"cache_enabled\": true,\n \"cache_size\": 1000\n }\n}\n```\n\n**\u5f00\u53d1\u73af\u5883\u5feb\u901f\u914d\u7f6e**:\n```bash\n# \u8bbe\u7f6e\u73af\u5883\u53d8\u91cf\nexport CODEREPO_EMBEDDING_API_KEY=\"your-key\"\nexport CODEREPO_EMBEDDING_BASE_URL=\"https://api.provider.com/v1\"\nexport CODEREPO_STORAGE_PATH=\"./dev_index\"\n\n# \u8fd0\u884c\u4ee3\u7801\npython your_script.py\n```\n\n## \u547d\u4ee4\u884c\u5de5\u5177\n\nCodeRepoIndex \u8fd8\u63d0\u4f9b\u4e86\u5f3a\u5927\u7684\u547d\u4ee4\u884c\u5de5\u5177\uff0c\u65b9\u4fbf\u5feb\u901f\u64cd\u4f5c\u3002\u4f7f\u7528\u524d\u8bf7\u786e\u4fdd\u5df2\u901a\u8fc7\u73af\u5883\u53d8\u91cf\u6216\u914d\u7f6e\u6587\u4ef6\u8bbe\u7f6e\u597d API \u5bc6\u94a5\u7b49\u914d\u7f6e\u3002\n\n```bash\n# \u7d22\u5f15\u4e00\u4e2a\u672c\u5730\u76ee\u5f55\ncoderepoindex index local 
/path/to/your/project\n\n# \u7d22\u5f15\u4e00\u4e2a Git \u4ed3\u5e93\ncoderepoindex index git https://github.com/requests/requests.git\n\n# \u5411\u91cf\u8bed\u4e49\u641c\u7d22\ncoderepoindex search \"how to send a post request\"\n\n# \u4e2d\u6587\u8bed\u4e49\u641c\u7d22\ncoderepoindex search \"\u9519\u8bef\u5904\u7406\u548c\u5f02\u5e38\u6355\u83b7\"\n\n# \u4ee3\u7801\u7247\u6bb5\u641c\u7d22\ncoderepoindex search \"def upload_file(request):\"\n\n# \u5217\u51fa\u5df2\u7d22\u5f15\u7684\u4ed3\u5e93\ncoderepoindex list\n\n# \u67e5\u770b\u914d\u7f6e\u72b6\u6001\ncoderepoindex config show\n```\n\n## \u8d21\u732e\u6307\u5357\n\n\u6211\u4eec\u6b22\u8fce\u6240\u6709\u5f62\u5f0f\u7684\u8d21\u732e\uff01\u65e0\u8bba\u662f\u62a5\u544a\u95ee\u9898\u3001\u8d21\u732e\u4ee3\u7801\u8fd8\u662f\u6539\u8fdb\u6587\u6863\uff0c\u90fd\u5bf9\u6211\u4eec\u975e\u5e38\u6709\u4ef7\u503c\u3002\u8bf7\u67e5\u770b [**CONTRIBUTING.md**](CONTRIBUTING.md) \u4e86\u89e3\u8be6\u7ec6\u4fe1\u606f\u3002\n\n## \u8bb8\u53ef\u8bc1\n\n\u672c\u9879\u76ee\u57fa\u4e8e [MIT License](LICENSE) \u5f00\u6e90\u3002\n",
"bugtrack_url": null,
"license": "MIT License\n \n Copyright (c) 2024 CodeRepoIndex\n \n Permission is hereby granted, free of charge, to any person obtaining a copy\n of this software and associated documentation files (the \"Software\"), to deal\n in the Software without restriction, including without limitation the rights\n to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n copies of the Software, and to permit persons to whom the Software is\n furnished to do so, subject to the following conditions:\n \n The above copyright notice and this permission notice shall be included in all\n copies or substantial portions of the Software.\n \n THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n SOFTWARE. ",
"summary": "\u901a\u8fc7\u8bed\u4e49\u7406\u89e3\u63d0\u9ad8\u4ee3\u7801\u4ed3\u5e93\u7684\u53ef\u53d1\u73b0\u6027\u548c\u53ef\u641c\u7d22\u6027",
"version": "0.1.0",
"project_urls": {
"Bug Reports": "https://github.com/XingYu-Zhong/CodeRepoIndex/issues",
"Documentation": "https://coderepoindex.readthedocs.io/",
"Homepage": "https://github.com/XingYu-Zhong/CodeRepoIndex",
"Source": "https://github.com/XingYu-Zhong/CodeRepoIndex"
},
"split_keywords": [
"code search",
" vector search",
" semantic search",
" code indexing"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "939a8643ffb462a9f768b6e5b9afff8549a64205cd8278af0c3277b48ab99562",
"md5": "265753b2d5678ccf966478553af2f4e0",
"sha256": "11417da4ebb27f2648ff619a002037a82fab98383f93834bad6b15de5470af86"
},
"downloads": -1,
"filename": "coderepoindex-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "265753b2d5678ccf966478553af2f4e0",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 178469,
"upload_time": "2025-07-10T04:02:15",
"upload_time_iso_8601": "2025-07-10T04:02:15.283986Z",
"url": "https://files.pythonhosted.org/packages/93/9a/8643ffb462a9f768b6e5b9afff8549a64205cd8278af0c3277b48ab99562/coderepoindex-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "94f24343d845328a0981cfa9780217c1048ac490168aef7bcc796c153386ab7f",
"md5": "ced7e03d22d0d943a679d1e6bd7b12e4",
"sha256": "a1a2e0b2d3e8bd49178bbae94bb2eb4fdfad764bbead9780ad5f27a5f3983392"
},
"downloads": -1,
"filename": "coderepoindex-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "ced7e03d22d0d943a679d1e6bd7b12e4",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 144387,
"upload_time": "2025-07-10T04:02:17",
"upload_time_iso_8601": "2025-07-10T04:02:17.943136Z",
"url": "https://files.pythonhosted.org/packages/94/f2/4343d845328a0981cfa9780217c1048ac490168aef7bcc796c153386ab7f/coderepoindex-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-10 04:02:17",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "XingYu-Zhong",
"github_project": "CodeRepoIndex",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "GitPython",
"specs": [
[
">=",
"3.1.0"
]
]
},
{
"name": "pathlib2",
"specs": [
[
">=",
"2.3.0"
]
]
}
],
"lcname": "coderepoindex"
}