# DOC/DOCX Image Extractor MCP
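A tool for extracting images from DOC/DOCX documents, available over the Model Context Protocol (MCP), from the command line, and through a Python API. It detects image formats automatically, converts Chinese document names to pinyin, and is configurable through a JSON file.
## ✨ Features
- 🖼️ **Image extraction**: Extracts all images from DOC/DOCX documents, in multiple formats
- 📄 **Two document formats**: Handles both legacy DOC and modern DOCX files
- 🔍 **Automatic format detection**: Identifies the image format (PNG, JPEG, GIF, etc.) from the file header
- 🔤 **Chinese support**: Converts Chinese document names to pinyin directory names automatically
- 📁 **Directory management**: Creates a consistent output directory structure automatically
- 🔧 **MCP support**: Integrates with Claude Desktop and other AI tools
- ⚙️ **Flexible configuration**: Supports a JSON configuration file for customizing parameters
- 🐍 **Multiple interfaces**: Provides a Python API, a command-line tool, and an MCP server
- 📊 **Structure preview**: Previews the structure of DOC/DOCX documents
- 🚀 **Performance**: Optimized processing pipeline that handles large files
- 📝 **Logging**: Detailed logging and error handling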
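Detecting a format from the file header usually means comparing the first bytes against known signatures. The following is a minimal sketch of that idea, not this package's implementation: the function name `sniff_image_format` and the signature table are hypothetical, and the table covers only a handful of common formats.

```python
from typing import Optional

# Leading-byte signatures for common image formats (an illustrative subset).
_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"BM", "bmp"),
    (b"II*\x00", "tiff"),  # little-endian TIFF
    (b"MM\x00*", "tiff"),  # big-endian TIFF
]


def sniff_image_format(data: bytes) -> Optional[str]:
    """Return a file extension guessed from the header bytes, or None."""
    for signature, extension in _SIGNATURES:
        if data.startswith(signature):
            return extension
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None
```

Checking the content rather than the stored file name matters because images embedded in a document can carry a missing or misleading extension.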
## 🚀 Quick Start
### Installation
#### Option 1: Install from PyPI (recommended)
```bash
pip install docx-image-extractor-mcp
```
#### Option 2: Install from source
```bash
git clone https://github.com/docx-image-extractor/docx-image-extractor-mcp.git
cd docx-image-extractor-mcp
pip install -e .
```
#### Verify the installation
```bash
# Check the command-line tools
docx-image-extractor-mcp --help
docx-extract --help
# Check the Python module
python -c "import docx_image_extractor_mcp; print('Installation successful!')"
```
### Basic usage
```bash
# Extract images from the command line
docx-extract extract document.docx
docx-extract extract document.doc
# Preview the document structure
docx-extract preview document.docx
docx-extract preview document.doc
# Convert a file name to ASCII
docx-extract convert "测试文档.docx"
```
## 📖 Usage
### 1. Command-line tool
```bash
# Extract images from a single file
docx-extract extract document.docx
docx-extract extract document.doc
# Extract images from multiple files into a specified directory
docx-extract extract -o images/ doc1.docx doc2.doc
# Preview the document structure
docx-extract preview document.docx
docx-extract preview document.doc
# Convert file names to ASCII
docx-extract convert "测试文档.docx" "另一个文档.doc"
# Show the configuration
docx-extract config show
# Create a configuration file
docx-extract config create -o my-config.json
```
### 2. Python API
```python
from docx_image_extractor_mcp import extract_images, Config
# Basic usage: works with both DOC and DOCX
result = extract_images("document.docx")
print(f"Extracted {result['count']} images to: {result['output_dir']}")
result = extract_images("document.doc")
print(f"Extracted {result['count']} images to: {result['output_dir']}")
# Use a custom configuration
config = Config()
config.base_image_dir = "my_images"
config.image_naming_prefix = "pic"
result = extract_images("document.docx", config=config)
```
### 3. MCP server
#### Configuration after installing from PyPI
Add the following to the Claude Desktop configuration file:
```json
{
  "mcpServers": {
    "docx-image-extractor": {
      "command": "python",
      "args": ["-m", "docx_image_extractor_mcp.main"],
      "env": {}
    }
  }
}
```
#### Notes for Windows users
If you use the `py` launcher:
```json
{
  "mcpServers": {
    "docx-image-extractor": {
      "command": "py",
      "args": ["-m", "docx_image_extractor_mcp.main"],
      "env": {}
    }
  }
}
```
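If the configuration file already lists other MCP servers, the entry has to be merged in rather than pasted over the file. A small sketch of such a merge is shown here; the helper `add_mcp_server` is hypothetical and not part of this package, and the location of the Claude Desktop configuration file is left to the caller because it varies by platform.

```python
import json
from pathlib import Path


def add_mcp_server(config_path, name, command, args):
    """Add or update one entry under "mcpServers", keeping all other settings."""
    path = Path(config_path)
    # Start from the existing file when there is one, otherwise from an empty config.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": list(args), "env": {}}
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config


# Example call (the path is a placeholder, not a documented location):
# add_mcp_server("path/to/claude_config.json", "docx-image-extractor",
#                "python", ["-m", "docx_image_extractor_mcp.main"])
```

Passing `"py"` instead of `"python"` as `command` produces the Windows variant shown above.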
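> 📖 For detailed setup instructions, see the [Windows Setup Guide](docs/WINDOWS_SETUP_GUIDE.md) and the [Claude Desktop Fix Guide](docs/CLAUDE_DESKTOP_FIX.md)
Available MCP tools:
- `extract_docx_images`: Extract images from a DOC/DOCX document
- `preview_docx_structure`: Preview the structure of a DOCX document (DOCX only)
- `convert_filename_to_ascii`: Convert a file name to ASCII
## ⚙️ Configuration
Create a `config.json` file to customize behavior: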
```json
{
  "base_image_dir": "extracted_images",
  "image_naming": {
    "prefix": "image",
    "padding": 3
  },
  "logging": {
    "level": "INFO",
    "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
  },
  "extraction": {
    "skip_empty_files": true,
    "detect_format": true,
    "supported_formats": [".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tiff", ".webp"]
  }
}
```
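### Configuration parameters
- `base_image_dir`: Base output directory for images
- `image_naming.prefix`: Prefix for image file names
- `image_naming.padding`: Number of digits the image number is zero-padded to
- `logging.level`: Log level (DEBUG, INFO, WARNING, ERROR)
- `extraction.skip_empty_files`: Whether to skip empty image files
- `extraction.detect_format`: Whether to detect the image format automatically
- `extraction.supported_formats`: List of supported image formats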
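To illustrate how a partial `config.json` can override only some of these values, here is a sketch that merges user settings over defaults and derives a file name from `prefix` and `padding`. It is an assumption about how such a file could be consumed, not the package's own loader: `DEFAULTS`, `deep_merge`, `load_settings`, and `image_filename` are hypothetical names, and the `_` separator is inferred from the `image_001.png` example in this README.

```python
import copy
import json
from pathlib import Path

# Defaults mirroring the sample config.json above (trimmed for brevity).
DEFAULTS = {
    "base_image_dir": "extracted_images",
    "image_naming": {"prefix": "image", "padding": 3},
    "extraction": {"skip_empty_files": True, "detect_format": True},
}


def deep_merge(base, override):
    """Return a copy of base with override applied, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def load_settings(path=None):
    """Load a JSON config file if it exists and fill missing keys from DEFAULTS."""
    if path is None or not Path(path).exists():
        return copy.deepcopy(DEFAULTS)
    user = json.loads(Path(path).read_text(encoding="utf-8"))
    return deep_merge(DEFAULTS, user)


def image_filename(settings, index, extension):
    """Build a zero-padded name such as image_001.png from the naming settings."""
    naming = settings["image_naming"]
    return f"{naming['prefix']}_{index:0{naming['padding']}d}.{extension}"
```

With this approach a file containing only `{"image_naming": {"prefix": "pic"}}` keeps the default padding of 3 and every other default.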
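## 📊 API Reference
### extract_images(doc_path, base_image_dir=None, config=None)
Extracts the images from a DOC/DOCX document.
**Parameters:**
- `doc_path` (str): Path to the DOC/DOCX document
- `base_image_dir` (str, optional): Base output directory for images
- `config` (Config, optional): Configuration object
**Returns:**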
```python
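{
    "success": True,                 # whether extraction succeeded
    "count": 5,                      # number of images extracted
    "output_dir": "path/to/output",  # output directory path
    "msg": "成功提取5张图片",          # status message ("Successfully extracted 5 images")
    "images": [                      # list of extracted images
        {
            "filename": "image_001.png",
            "path": "full/path/to/image_001.png",
            "size": 12345,
            "format": "PNG"
        }
    ]
}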
```
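A caller will typically check `success` before reading the other fields. The sketch below turns a result dictionary of the shape shown above into a short report; `summarize_result` is a hypothetical helper, and which keys are present on failure is an assumption, so the code reads them defensively.

```python
def summarize_result(result):
    """Build a short text report from an extract_images-style result dict."""
    if not result.get("success"):
        # Assumption: a failed call still carries a human-readable "msg".
        return f"Extraction failed: {result.get('msg', 'unknown error')}"
    lines = [f"{result['count']} image(s) in {result['output_dir']}"]
    for image in result.get("images", []):
        lines.append(f"  {image['filename']} ({image['format']}, {image['size']} bytes)")
    return "\n".join(lines)
```

For example, `print(summarize_result(extract_images("document.docx")))` would print one line per extracted image.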
### to_ascii_dirname(filename)
Converts a file name to an ASCII directory name.
**Parameters:**
- `filename` (str): Original file name
**Returns:**
- `str`: The converted ASCII directory name
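The exact rules `to_ascii_dirname` applies are not documented here. As a rough sketch of the general approach, the following hypothetical `ascii_dirname_sketch` transliterates with `pypinyin.lazy_pinyin` when that library is installed and then collapses everything that is not an ASCII letter or digit into underscores. The underscore separator and the `"document"` fallback are assumptions made for this example.

```python
import re
from pathlib import Path

try:
    from pypinyin import lazy_pinyin  # third-party; listed under Dependencies
except ImportError:  # fall back so the sketch still runs without it
    lazy_pinyin = None


def ascii_dirname_sketch(filename: str) -> str:
    """Turn a document file name into an ASCII-only directory name."""
    stem = Path(filename).stem  # drop the extension
    if lazy_pinyin is not None:
        # Chinese characters become pinyin syllables; other text passes through.
        stem = "_".join(lazy_pinyin(stem))
    # Replace every run of characters outside A-Z, a-z, 0-9 with a single "_".
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", stem).strip("_")
    return cleaned or "document"
```

Without `pypinyin`, Chinese characters are simply dropped by the final substitution, which is why the sketch falls back to a fixed name when nothing is left.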
## 🏗️ Project Structure
```
docx-image-extractor-mcp/
├── src/
│   └── docx_image_extractor_mcp/
│       ├── __init__.py              # Package entry point
│       ├── __main__.py              # Module execution entry point
│       ├── main.py                  # MCP server startup
│       ├── core/                    # Core modules
│       │   ├── __init__.py
│       │   ├── extractor.py         # Core image extraction logic
│       │   └── config.py            # Configuration management
│       └── interfaces/              # Interface modules
│           ├── __init__.py
│           ├── cli.py               # Command-line interface
│           └── mcp_server.py        # MCP server interface
├── tests/                           # Tests
│   ├── test_extractor.py            # Core functionality tests
│   └── test_performance.py          # Performance tests
├── docs/                            # Documentation
│   ├── WINDOWS_SETUP_GUIDE.md       # Windows setup guide
│   └── CODE_STRUCTURE_OPTIMIZATION.md  # Notes on structure optimization
├── requirements.txt                 # Dependencies
├── pyproject.toml                   # Project configuration
├── .gitignore                       # Git ignore rules
└── README.md                        # Project README
```
## 🧪 Testing
```bash
# Run all tests
python -m pytest tests/
# Run the functional tests
python -m pytest tests/test_extractor.py
# Run the performance tests
python -m pytest tests/test_performance.py
# Run a specific test
python -m pytest tests/test_extractor.py::TestExtractor::test_extract_images_file_not_exists
```
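## 📈 Performance
- **Memory optimization**: Streams large files to avoid running out of memory
- **Format detection**: Identifies image formats from their content to avoid wrong file extensions
- **Concurrent processing**: Supports batch processing of multiple files
- **Error recovery**: Thorough error handling and recovery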
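As an illustration of streaming, a DOCX file is a ZIP archive whose embedded images live under `word/media/`, so each image can be copied out in fixed-size chunks instead of being read into memory whole. This is a sketch using only the standard library, not the package's extractor: `stream_docx_media` and its `chunk_size` parameter are hypothetical, and legacy DOC files (OLE containers) need a different approach.

```python
import shutil
import zipfile
from pathlib import Path


def stream_docx_media(docx_path, output_dir, chunk_size=64 * 1024):
    """Copy every entry under word/media/ to output_dir in fixed-size chunks."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(docx_path) as archive:
        for info in archive.infolist():
            if info.is_dir() or not info.filename.startswith("word/media/"):
                continue
            # Use only the base name so archive paths cannot escape output_dir.
            target = out / Path(info.filename).name
            with archive.open(info) as source, open(target, "wb") as sink:
                shutil.copyfileobj(source, sink, chunk_size)
            written.append(target)
    return written
```

Peak memory stays near `chunk_size` per file regardless of how large an individual image is.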
## 🔧 Dependencies
- Python >= 3.8
- mcp >= 0.1.0
- pypinyin >= 0.44.0
- python-docx >= 0.8.11
- olefile >= 0.46 (for DOC file support)
- python-docx2txt >= 0.8 (for DOC file support)
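## 📝 Changelog
### v1.2.0
- ✨ Added support for the DOC file format
- ✨ Added structure preview for DOC files
- 🔧 Updated all interfaces to support both DOC and DOCX
- 📚 Updated documentation and examples to reflect DOC support
- 🐛 Improved error handling and compatibility
### v1.1.0
- ✨ Added automatic image format detection
- ✨ Added configuration file support
- ✨ Added the command-line tool
- ✨ Added DOCX structure preview
- 🐛 Improved error handling and logging
- 🚀 Improved performance and memory management
- 📚 Expanded documentation and test cases
### v1.0.0
- 🎉 Initial release
- Basic image extraction
- MCP support
- Chinese document names converted to pinyin
## 🤝 Contributing
Issues and pull requests are welcome!
## 📄 License
MIT License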