search4file


| Field | Value |
| --- | --- |
| Name | search4file |
| Version | 0.1.15 |
| Home page | https://github.com/CoderWanFeng/python-office |
| Summary | python for office |
| Upload time | 2023-04-05 03:28:39 |
| Author | CoderWanFeng |
| Requires Python | >=3.6 |
| License | Apache-2.0 license |
| Requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| Coveralls test coverage | No coveralls. |
            

<p align="center" name="图标-github">
    <a target="_blank" href='https://github.com/CoderWanFeng/python-office'>
    <img src="https://img.shields.io/github/stars/CoderWanFeng/python-office.svg?style=social" alt="github star"/>
    </a>
    	<a target="_blank" href='https://gitee.com/CoderWanFeng/python-office'>
		<img src='https://gitee.com/CoderWanFeng//python-office/badge/star.svg?theme=dark' alt='gitee star'/>
	</a>
  	<a href="https://mp.weixin.qq.com/s/yaSmFKO3RrBpyanW3nvRAQ">
	<img src="https://img.shields.io/badge/QQ-163434413-orange"/>
  </a>
    	<a href="https://mp.weixin.qq.com/s/wx-JkgOUoJhb-7ZESxl93w">
	<img src="https://img.shields.io/badge/%E5%BE%AE%E4%BF%A1-%E4%BA%A4%E6%B5%81%E7%BE%A4-brightgreen"/>
  </a>
</p>


# search4file
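Install: `pip install search4file`

Hello, I'm Wanfeng (晚枫), a Python programmer. This library finds where files are located by searching their contents.
> It has already been integrated into python-office 👉 [video tutorial](https://www.bilibili.com/video/BV13P411n77G)

Developer WeChat: [CoderWanFeng](https://mp.weixin.qq.com/s/FgKB-9XEG_KunLfjJbvdYw)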
# Features
- Find files by their content
- Find images by name
- Recognize image content with OCR
- Find videos by their subtitles and frames

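# To-Do List

Three kinds of tasks are progressing in parallel:

## 1. Search logic

### 3 content-search interfaces that still need to be implemented

[Link to the interface](https://github.com/CoderWanFeng/search4file/blob/main/search4file/core/SearchByContent.py)
Developer in charge: [@yinzeyuan](https://github.com/yinzeyuan)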
```python
    def search_pdf_file(self, file_path, search_content):
        pass

    def search_ppt_file(self, file_path, search_content):
        pass

    def search_excel_file(self, file_path, search_content):
        pass
```
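The PowerPoint and Excel stubs above could be filled in along the following lines. This is only a sketch under stated assumptions, not the project's implementation: it treats `.pptx` and `.xlsx` files as the ZIP archives of XML parts that they are, uses only the standard library, is written as module-level functions (no `self`), and returns a list of matching part names, a return type the stubs do not specify. The helper `_ooxml_text` is a hypothetical name introduced here. PDF search is left out because it would need a third-party parser.

```python
import re
import zipfile
from xml.etree import ElementTree


def _ooxml_text(file_path, part_pattern):
    """Yield (part_name, text) for each XML part whose name matches part_pattern.

    Hypothetical helper, not part of search4file.
    """
    with zipfile.ZipFile(file_path) as archive:
        for name in archive.namelist():
            if re.fullmatch(part_pattern, name):
                root = ElementTree.fromstring(archive.read(name))
                # itertext() walks every element and returns only the text, no tags.
                yield name, "".join(root.itertext())


def search_ppt_file(file_path, search_content):
    """Return the slide parts of a .pptx whose text contains search_content."""
    pattern = r"ppt/slides/slide\d+\.xml"
    return [name for name, text in _ooxml_text(file_path, pattern)
            if search_content in text]


def search_excel_file(file_path, search_content):
    """Return the .xlsx parts (shared strings, worksheets) containing search_content."""
    pattern = r"xl/sharedStrings\.xml|xl/worksheets/sheet\d+\.xml"
    return [name for name, text in _ooxml_text(file_path, pattern)
            if search_content in text]
```

One caveat of this approach: worksheet parts store numeric values and shared-string indexes as text, so searching for a short number can match cells that only reference a string by index.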

### 1 file-name search interface that still needs to be implemented

[Link to the interface](https://github.com/CoderWanFeng/search4file/blob/main/search4file/core/SearchByName.py)
Developer in charge: [@yinzeyuan](https://github.com/yinzeyuan)

```python
class SearchByName():

    # Logic for searching by file name
    def search_files(self, search_path, search_content):
        pass
```
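One possible way to implement the stub above is a recursive directory walk with a substring match on each file name. The sketch below assumes a case-insensitive match and a sorted list of full paths as the return value; neither is specified by the original interface.

```python
import os


class SearchByName:
    def search_files(self, search_path, search_content):
        """Return paths under search_path whose file name contains search_content.

        Assumption: matching is a case-insensitive substring test on the
        file name only, not on the directory part of the path.
        """
        needle = search_content.lower()
        matches = []
        for dirpath, _dirnames, filenames in os.walk(search_path):
            for filename in filenames:
                if needle in filename.lower():
                    matches.append(os.path.join(dirpath, filename))
        return sorted(matches)
```

For example, `SearchByName().search_files(".", "report")` would list every file below the current directory with "report" in its name.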
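## 2. Optimization

The main optimizations planned so far:

1. Word search currently relies on the python-docx library, which reportedly does not support macOS or Linux.
    - Consider unzipping the .docx file and searching the extracted files instead.
2. File search currently walks the file system synchronously in a single thread, which is too slow.
    - Consider an asynchronous approach that combines processes and coroutines to speed up queries.
3. Add OCR so that images can be searched automatically by their content.
    - For example: the user enters "river" and gets every image on the computer related to rivers.
4. Recognize the content of videos.
    - For example: the user enters "mountain" and gets every frame and subtitle in a given video related to mountains.

## 3. Community chat group
![CoderWanFeng](https://python-office-1300615378.cos.ap-chongqing.myqcloud.com/python-office-qr.jpg)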


            

        