<p align="center" name="图标-github">
<a target="_blank" href='https://github.com/CoderWanFeng/python-office'>
<img src="https://img.shields.io/github/stars/CoderWanFeng/python-office.svg?style=social" alt="github star"/>
</a>
<a target="_blank" href='https://gitee.com/CoderWanFeng/python-office'>
<img src='https://gitee.com/CoderWanFeng//python-office/badge/star.svg?theme=dark' alt='gitee star'/>
</a>
<a href="https://mp.weixin.qq.com/s/yaSmFKO3RrBpyanW3nvRAQ">
<img src="https://img.shields.io/badge/QQ-163434413-orange"/>
</a>
<a href="https://mp.weixin.qq.com/s/wx-JkgOUoJhb-7ZESxl93w">
<img src="https://img.shields.io/badge/%E5%BE%AE%E4%BF%A1-%E4%BA%A4%E6%B5%81%E7%BE%A4-brightgreen"/>
</a>
</p>
# search4file
pip install search4file
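Hello, I'm Wanfeng (晚枫), a Python programmer. This library locates files by searching their contents.
> Already integrated into python-office 👉 [video tutorial](https://www.bilibili.com/video/BV13P411n77G)

Developer WeChat: [CoderWanFeng](https://mp.weixin.qq.com/s/FgKB-9XEG_KunLfjJbvdYw)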
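# Features
- Find files by content
- Find images by name
- Recognize image content with OCR
- Find videos by subtitles or frames
# To-Do List
Three categories of tasks are being advanced in parallel:
## 1. Search logic
### Three content-search interfaces to implement
[Interface source](https://github.com/CoderWanFeng/search4file/blob/main/search4file/core/SearchByContent.py)

Developer in charge: [@yinzeyuan](https://github.com/yinzeyuan)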
```python
# Method stubs; each one still needs an implementation.
def search_pdf_file(self, file_path, search_content):
    pass

def search_ppt_file(self, file_path, search_content):
    pass

def search_excel_file(self, file_path, search_content):
    pass
```
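As one possible starting point for these stubs, the sketch below searches a `.pptx` file using only the standard library. It assumes that a `.pptx` file is a ZIP archive whose slides are stored as `ppt/slides/slideN.xml`, with visible text in DrawingML `<a:t>` elements, as the Office Open XML format describes. The function name `pptx_contains` and its boolean return value are hypothetical and not part of search4file.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace; text runs on a slide are stored in <a:t> elements.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
SLIDE_RE = re.compile(r"ppt/slides/slide\d+\.xml")


def pptx_contains(file_path, search_content):
    """Return True if any slide in the .pptx file contains search_content.

    Hypothetical helper: all text runs of a slide are concatenated before
    matching, so a phrase split across runs is still found.
    """
    with zipfile.ZipFile(file_path) as archive:
        for name in archive.namelist():
            if not SLIDE_RE.fullmatch(name):
                continue  # skip layouts, masters, media, and other parts
            root = ET.fromstring(archive.read(name))
            slide_text = "".join(node.text or "" for node in root.iter(A_NS + "t"))
            if search_content in slide_text:
                return True
    return False
```

Speaker notes and legacy `.ppt` files are not covered by this sketch; they would need separate handling.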
### One file-name search interface to implement
[Interface source](https://github.com/CoderWanFeng/search4file/blob/main/search4file/core/SearchByName.py)

Developer in charge: [@yinzeyuan](https://github.com/yinzeyuan)
```python
class SearchByName:
    # Logic for searching by file name
    def search_files(self, search_path, search_content):
        pass
```
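A minimal sketch of how this stub could be filled in, assuming that "search by name" means a case-insensitive substring match on the file name and that the result is a sorted list of paths. Both are assumptions, and the class name `SearchByNameSketch` is hypothetical, so it is not mistaken for the project's own `SearchByName`.

```python
import os


class SearchByNameSketch:
    """Hypothetical stand-in for SearchByName; not the project's implementation."""

    def search_files(self, search_path, search_content):
        """Return sorted paths under search_path whose file name contains search_content."""
        needle = search_content.lower()  # assumption: matching ignores case
        matches = []
        # os.walk visits search_path and every directory below it.
        for dirpath, _dirnames, filenames in os.walk(search_path):
            for filename in filenames:
                if needle in filename.lower():
                    matches.append(os.path.join(dirpath, filename))
        return sorted(matches)
```

Returning full paths rather than bare names keeps the result usable when several directories contain files with the same name.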
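## 2. Optimization
The main optimizations planned at present:
1. Word search is currently based on the python-docx library, which reportedly does not support macOS or Linux.
   - Consider unzipping the docx file instead and searching the extracted files.
2. File search currently uses a single-threaded, synchronous traversal, which is too slow.
   - Consider switching to an asynchronous approach that combines processes and coroutines to improve query efficiency.
3. Add OCR-based image search that finds images automatically from a description of their content.
   - Example: the user enters "river", and the search returns every river-related image on the computer.
4. Recognize the content inside videos.
   - Example: the user enters "mountain", and the search returns every mountain-related frame and subtitle in a given video.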
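The first two items above can be sketched together with the standard library alone. This is an illustration under stated assumptions, not the project's implementation: it assumes a `.docx` file is a ZIP archive whose main body is `word/document.xml` with text in `<w:t>` elements, and the names `docx_contains` and `search_docx_tree` are hypothetical. By default the coroutine hands each file to asyncio's thread pool; passing a `concurrent.futures.ProcessPoolExecutor` as `executor` moves the parsing into separate processes, which is closer to the process-plus-coroutine idea.

```python
import asyncio
import os
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace; visible text is stored in <w:t> elements.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_contains(file_path, search_content):
    """Return True if the main body of a .docx file contains search_content."""
    try:
        with zipfile.ZipFile(file_path) as archive:
            root = ET.fromstring(archive.read("word/document.xml"))
    except (zipfile.BadZipFile, KeyError, ET.ParseError):
        return False  # not a readable .docx, so treat it as a non-match
    text = "".join(node.text or "" for node in root.iter(W_NS + "t"))
    return search_content in text


async def search_docx_tree(search_path, search_content, executor=None):
    """Check every .docx under search_path concurrently; return matching paths.

    executor=None uses asyncio's default thread pool. Pass a
    ProcessPoolExecutor to parse the files in separate processes instead.
    """
    loop = asyncio.get_running_loop()
    paths = [
        os.path.join(dirpath, name)
        for dirpath, _dirnames, names in os.walk(search_path)
        for name in names
        if name.lower().endswith(".docx")
    ]
    # Schedule one job per file; gather returns results in the same order.
    jobs = [
        loop.run_in_executor(executor, docx_contains, path, search_content)
        for path in paths
    ]
    results = await asyncio.gather(*jobs)
    return sorted(path for path, hit in zip(paths, results) if hit)
```

A call would look like `asyncio.run(search_docx_tree("docs", "invoice"))`, where the directory `docs` and the search term are placeholders. Headers, footers, and footnotes live in other parts of the archive and are not searched here.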
## 3. Discussion group

Raw data
{
"_id": null,
"home_page": "https://github.com/CoderWanFeng/python-office",
"name": "search4file",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": "",
"keywords": "",
"author": "CoderWanFeng",
"author_email": "1957875073@qq.com",
"download_url": "https://files.pythonhosted.org/packages/b2/c4/15c761d85032ca9efd33dbaf9216ace8e83a1b3c25942aef39aaf9598e9c/search4file-0.1.15.tar.gz",
"platform": "any",
"bugtrack_url": null,
"license": "Apache-2.0 license",
"summary": "python for office",
"version": "0.1.15",
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "de3db59d5642004903817bbdc6ade7786c706f7c9106654171969fe6a05cc216",
"md5": "41a50152e733f38f76310814e345a988",
"sha256": "62b55a4ea7ee2263e39c5e388e61d1eebd7d8ab0dfa238256c8e699b749232c1"
},
"downloads": -1,
"filename": "search4file-0.1.15-py3-none-any.whl",
"has_sig": false,
"md5_digest": "41a50152e733f38f76310814e345a988",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 16090,
"upload_time": "2023-04-05T03:28:37",
"upload_time_iso_8601": "2023-04-05T03:28:37.493140Z",
"url": "https://files.pythonhosted.org/packages/de/3d/b59d5642004903817bbdc6ade7786c706f7c9106654171969fe6a05cc216/search4file-0.1.15-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "b2c415c761d85032ca9efd33dbaf9216ace8e83a1b3c25942aef39aaf9598e9c",
"md5": "3409e416d54034e82641fbb3eb075347",
"sha256": "4a72d136a2b0e6cf88b76f4d09f0eea73fe4a30b7373d88377214fc4984d5ba4"
},
"downloads": -1,
"filename": "search4file-0.1.15.tar.gz",
"has_sig": false,
"md5_digest": "3409e416d54034e82641fbb3eb075347",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 15906,
"upload_time": "2023-04-05T03:28:39",
"upload_time_iso_8601": "2023-04-05T03:28:39.765861Z",
"url": "https://files.pythonhosted.org/packages/b2/c4/15c761d85032ca9efd33dbaf9216ace8e83a1b3c25942aef39aaf9598e9c/search4file-0.1.15.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-04-05 03:28:39",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "CoderWanFeng",
"github_project": "python-office",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "search4file"
}