elasticrag


- Name: elasticrag
- Version: 0.1.1
- Summary: Elasticsearch-based RAG system with ingest pipeline processing
- Upload time: 2025-07-17 17:52:14
- Requires Python: >=3.8
- License: MIT
- Keywords: elasticsearch, embedding, nlp, rag, vector-search

# ElasticRAG

ElasticRAG is a RAG (Retrieval-Augmented Generation) system built on Elasticsearch. It uses Elasticsearch ingest pipelines to handle the entire RAG workflow.

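## Features

- 🔍 Vector search and full-text search powered by Elasticsearch
- 🛠️ Document processing and vectorization via ingest pipelines
- 👥 Multi-user support with authentication
- 🧠 Support for multiple models (OpenAI, HuggingFace, etc.)
- 📚 Knowledge base (Collection) management
- 🔄 Hybrid search with the RRF (Reciprocal Rank Fusion) algorithm
- 📄 Text splitting for multiple document formats
- ⚙️ Configuration via environment variables and command-line arguments
- 🌐 Optional web admin interface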
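Reciprocal Rank Fusion merges several ranked result lists (for example, a full-text ranking and a vector-similarity ranking) by giving each document the score sum of 1 / (k + rank) over every list it appears in, with ranks starting at 1. The snippet below is a minimal, generic sketch of that formula in plain Python, not ElasticRAG's own code; the constant `k = 60` (the value used in the original RRF paper) is an assumption rather than a documented ElasticRAG setting.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Generic sketch; k=60 is an assumption, not ElasticRAG's documented value.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # rank starts at 1 for the top hit in each list
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


text_hits = ["doc_a", "doc_b", "doc_c"]    # e.g. a full-text (BM25) ranking
vector_hits = ["doc_c", "doc_a", "doc_d"]  # e.g. a vector (kNN) ranking
print(reciprocal_rank_fusion([text_hits, vector_hits]))
# ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that rank well in both lists (`doc_a`, `doc_c`) rise above those found by only one retriever, which is why RRF is a common choice for hybrid search: it needs only ranks, so text and vector scores on different scales never have to be normalized.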

## Installation

### Basic installation

Install only the core features (the CLI tool):

```bash
uv add elasticrag
```

### Full installation

Includes the web admin interface:

```bash
uv add 'elasticrag[web]'
```

### Development installation

Includes the development tools:

```bash
uv add 'elasticrag[dev]'
```

### Install everything

Includes all features:

```bash
uv add 'elasticrag[all]'
```

### Install from source

```bash
git clone <repository-url>
cd elasticrag
uv sync
# Or, to include the web interface:
uv sync --extra web
```

## Configuration

### Environment variables

Create a `.env` file by copying `.env.example`:

```bash
cp .env.example .env
```

Edit the `.env` file:

```bash
# Elasticsearch Configuration
ELASTICSEARCH_HOST=http://localhost:9200

# Authentication
ELASTICRAG_USERNAME=your_username
ELASTICRAG_API_KEY=your_api_key

# Text Embedding Service
TEXT_EMBEDDING_URL=http://your-embedding-service:8080/embed
TEXT_EMBEDDING_API_KEY=your_embedding_api_key
```

### Command-line arguments

You can also override environment variables with command-line arguments:

```bash
elasticrag --host localhost:9200 -u admin -k secret setup
```
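The precedence rule above (command-line flags win over environment variables) can be sketched with the standard library alone. `resolve_settings` is a hypothetical helper, not part of the elasticrag package; it mirrors only the three global options documented here (`--host`, `-u/--username`, `-k/--api-key`) and the variable names from the `.env` example. In a real script you would typically call `load_dotenv()` from `python-dotenv` first so that values in `.env` are copied into `os.environ`.

```python
import argparse
import os
from typing import Dict, List, Mapping, Optional

# Environment variable names taken from the .env example above.
ENV_NAMES = {
    "host": "ELASTICSEARCH_HOST",
    "username": "ELASTICRAG_USERNAME",
    "api_key": "ELASTICRAG_API_KEY",
}


def resolve_settings(
    argv: Optional[List[str]] = None, env: Optional[Mapping[str, str]] = None
) -> Dict[str, Optional[str]]:
    """Hypothetical helper (not elasticrag's code): CLI flags override env vars."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="elasticrag")
    parser.add_argument("--host")
    parser.add_argument("-u", "--username")
    parser.add_argument("-k", "--api-key")  # stored as args.api_key
    # parse_known_args leaves the subcommand (e.g. "setup") in the extras list.
    args, _rest = parser.parse_known_args(argv)
    settings: Dict[str, Optional[str]] = {}
    for key, env_name in ENV_NAMES.items():
        cli_value = getattr(args, key)
        settings[key] = cli_value if cli_value is not None else env.get(env_name)
    return settings


env = {"ELASTICSEARCH_HOST": "http://localhost:9200", "ELASTICRAG_USERNAME": "alice"}
print(resolve_settings(["-u", "admin", "setup"], env))
# {'host': 'http://localhost:9200', 'username': 'admin', 'api_key': None}
```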

## Quick start

### 1. Initialize the system

```bash
elasticrag setup
```

### 2. Start the web admin interface (optional)

⚠️ **Note**: The web interface requires the additional gradio dependency:

```bash
# Install the web dependencies
uv add 'elasticrag[web]'

# Start the web interface
elasticrag server --port 7860
```

Then open http://localhost:7860 to access the admin interface.

Default administrator account:
- Username: admin
- Password: admin123

### 3. Use the command-line tool

```bash
# List available models
elasticrag list-models

# Add a document
elasticrag add document.pdf -c my_collection -m my_model

# Search documents
elasticrag search "your query" -c my_collection -m my_model -s 10
```
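According to the feature list, ElasticRAG splits document text as part of processing, but the README does not document the splitter or its parameters. The sketch below shows one generic technique, fixed-size character windows with overlap, purely as an illustration; `split_text`, `chunk_size=500`, and `overlap=50` are hypothetical choices, not ElasticRAG's actual implementation or settings.

```python
from typing import List


def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into overlapping fixed-size character windows.

    Hypothetical helper; chunk_size and overlap are illustrative defaults only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # each window starts `step` characters after the last
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(split_text("abcdefghij", chunk_size=4, overlap=1))
# ['abcd', 'defg', 'ghij']
```

The overlap keeps a little shared context between neighbouring chunks, so a sentence cut at a boundary still appears whole-ish in at least one chunk; format-aware splitters (by paragraph, heading, or sentence) are a common refinement.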

## CLI reference

### Global options

- `--host`: Elasticsearch host address
- `-u, --username`: Username
- `-k, --api-key`: API key
- `-v, --verbose`: Enable verbose logging

### Commands

- `setup`: Initialize the system
- `server`: Start the Gradio web admin interface **(requires the web dependencies)**
- `list-models`: List available models
- `list-users`: List all users
- `list-collections`: List all collections
- `list-documents [collection] [model]`: List documents
- `add <file_path> [-c collection] [-m model]`: Add a document
- `search <query> [-c collection] [-m model] [-s size]`: Search documents

#### server command options

⚠️ **Note**: The `server` command requires additional dependencies:

```bash
uv add 'elasticrag[web]'
```

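Then you can run:

```bash
elasticrag server [options]

Options:
  --port PORT           Web interface port (default: 7860)
  --host HOST           Web interface host (default: 0.0.0.0)
  --share               Create a public link via Gradio
  --admin-username USER Administrator username
  --admin-password PASS Administrator password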
```

## Dependencies

### Core dependencies

- `elasticsearch>=8.0.0`: Elasticsearch client
- `python-dotenv>=1.0.0`: Environment variable management
- `aiohttp>=3.10.11`: Asynchronous HTTP client

### Optional dependencies

#### Web interface (`elasticrag[web]`)

- `gradio>=4.0.0`: Web UI framework
- `pandas>=1.3.0`: Data processing

#### Development tools (`elasticrag[dev]`)

- `pytest>=7.0.0`: Testing framework
- `pytest-asyncio>=0.21.0`: Async test support
- `black>=23.0.0`: Code formatter
- `isort>=5.12.0`: Import sorting

## Web admin interface

### Install the web dependencies

```bash
uv add 'elasticrag[web]'
```

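### Administrator features

After logging in with the administrator account, you can:

- **User management**: View, add, and delete users
- **Model management**: View and add model configurations
- **System monitoring**: View system status and resource usage

### User features

After logging in with a regular user account, you can:

- **Collection management**: View your own document collections
- **Document management**: Add, delete, and view documents
- **Search debugging**: Search documents in a collection and inspect the results

### Environment variables

Environment variables for the web interface: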

```bash
# Administrator account settings
ELASTICRAG_ADMIN_USERNAME=admin
ELASTICRAG_ADMIN_PASSWORD=admin123
```

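## API usage

```python
import asyncio

from elasticrag import Client


async def main():
    # Create a client
    client = Client('http://localhost:9200')

    # Authenticate a user
    user = client.authenticate('username', 'api_key')

    # Get a collection
    collection = client.get_collection('my_collection', 'my_model')

    # Add a document
    collection.add('doc_id', 'Document Name', text_content='Your content here')

    # Search: query() is awaited, so it must run inside an async function
    results = await collection.query('your query')
    print(results)


asyncio.run(main())
```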

## Development

```bash
# Install development dependencies
uv sync --extra dev

# Install all dependencies (including web)
uv sync --extra all

# Run tests
uv run pytest

# Format code
uv run black .
uv run isort .
```

## License

MIT License

            

- Author: Lloyd Zhou <lloydzhou@gmail.com>
- Homepage: https://github.com/lloydzhou/elasticrag
- Documentation: https://github.com/lloydzhou/elasticrag#readme
- Issues: https://github.com/lloydzhou/elasticrag/issues
- Repository: https://github.com/lloydzhou/elasticrag