| Field | Value |
| --- | --- |
| Name | hsuBug |
| Version | 0.1.5 |
| home_page | https://github.com/lucashsu95/hsuBug |
| Summary | A library combining BeautifulSoup, requests, tqdm, and pandas |
| upload_time | 2024-08-24 07:27:49 |
| maintainer | None |
| docs_url | None |
| author | LucasHsu |
| requires_python | >=3.7 |
| license | MIT |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
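# Web Scraping Utility Library
This project provides utility functions and classes for web scraping.
## Key Features
1. `Bug` class: sets up and manages scraping requests.
2. File download: downloads files from web pages.
3. Excel output: exports scraped data to an Excel file.
4. Environment variable management: reads environment variables safely.
5. Link checking: verifies that a URL is valid.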
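The README does not show how environment variables are read, so the following is only a minimal sketch of the general idea using the standard library. The helper name `get_env` and the variable names are assumptions made for illustration; they are not part of hsuBug's documented API.

```python
import os
from typing import Optional


def get_env(name: str, default: Optional[str] = None) -> str:
    """Return the value of an environment variable.

    Hypothetical helper (not part of hsuBug): falls back to `default`
    when the variable is unset, and raises a clear error when there is
    no default either, instead of silently returning None.
    """
    value = os.environ.get(name, default)
    if value is None:
        raise KeyError(f"Environment variable {name!r} is not set")
    return value


# Example: read a token without hard-coding it in the script.
os.environ["DEMO_TOKEN"] = "secret-value"  # set here only for the demo
token = get_env("DEMO_TOKEN")
timeout = get_env("DEMO_TIMEOUT", default="10")  # unset, so the default is used
```

Failing loudly on a missing variable tends to be easier to debug than a `None` that surfaces later inside a request.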
## Installation
1. Make sure Python 3.7 or later is installed (the package metadata declares `requires_python >=3.7`).
2. Install the package with pip:
```shell
pip install hsuBug
```
## Usage Examples
### Basic Scraper Setup
```python
from hsuBug import Bug

# Create a Bug instance
url = "https://example.com"
bug = Bug(url)

# Set the request headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
}

# Set up the request and fetch the page content
bug.setup(headers)

# You can now use bug.soup to parse the HTML content
```
### Downloading a File
```python
from hsuBug.functions import downloadFile

url = "https://example.com/file.pdf"
filename = "example.pdf"
downloaded_file = downloadFile(url, filename)
print(f"File downloaded to: {downloaded_file}")
```
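For readers curious about what a streamed download involves, here is a minimal standard-library sketch. It is not the implementation of `downloadFile`; the helper name `save_url` and its `timeout` parameter are assumptions introduced for illustration.

```python
import shutil
import urllib.request
from pathlib import Path


def save_url(url: str, filename: str, timeout: float = 30.0) -> Path:
    """Hypothetical helper: stream `url` to `filename` and return the saved path."""
    target = Path(filename)
    with urllib.request.urlopen(url, timeout=timeout) as response, target.open("wb") as out:
        # copyfileobj reads in chunks, so large files are never held fully in memory.
        shutil.copyfileobj(response, out)
    return target
```

Calling `save_url("https://example.com/file.pdf", "example.pdf")` would write the response body to `example.pdf` in the current directory, raising `urllib.error.URLError` if the request fails.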
### Exporting to Excel
```python
from hsuBug.functions import downloadByExcel

data = {
    "Name": ["Item 1", "Item 2", "Item 3"],
    "Price": [100, 200, 300]
}

df = downloadByExcel(data)
if df is not None:
    print("Data was successfully exported to an Excel file")
```
### Checking Link Validity
```python
from hsuBug.functions import checkLink

url = "https://example.com"
if checkLink(url):
    print("The link is valid")
else:
    print("The link is invalid")
```
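The README does not say what `checkLink` treats as "valid". One plausible reading, sketched below with the standard library only, is "an absolute http(s) URL that answers without an error status". The helper name `is_link_valid` and its `timeout` parameter are assumptions for illustration and may differ from what hsuBug actually does.

```python
import urllib.error
import urllib.request
from urllib.parse import urlparse


def is_link_valid(url: str, timeout: float = 5.0) -> bool:
    """Hypothetical helper: True if `url` answers with a non-error status."""
    parsed = urlparse(url)
    # Reject anything that is not an absolute http(s) URL before touching the network.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, ValueError, OSError):
        # HTTPError (4xx/5xx) is a subclass of URLError; timeouts and
        # connection failures surface as URLError or OSError.
        return False


print(is_link_valid("not-a-url"))  # False (rejected before any request is made)
```

A `HEAD` request avoids downloading the body, although some servers reject it; falling back to `GET` in that case is a common refinement.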
## Unit Tests
```shell
python -m unittest test_hsuBug.py
```
## Publishing
```shell
python setup.py sdist
twine upload dist/*
```
## Planned Features
1. An aiohttp-based version
## License
[MIT](https://choosealicense.com/licenses/mit/)
Raw data
{
"_id": null,
"home_page": "https://github.com/lucashsu95/hsuBug",
"name": "hsuBug",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": null,
"author": "LucasHsu",
"author_email": "lucashsu95@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/c7/bf/8a541599fce066f01260c14794767d18cd90ae617c0434ad9aeee687db28/hsuBug-0.1.5.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "BeautifulSoup + requests + tqdm + pandas\u7684\u4e00\u500b\u5eab",
"version": "0.1.5",
"project_urls": {
"Download": "https://pypi.org/project/hsuBug/",
"Homepage": "https://github.com/lucashsu95/hsuBug"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "c7bf8a541599fce066f01260c14794767d18cd90ae617c0434ad9aeee687db28",
"md5": "8cb85f96e4e10584f85a796f3f16a5ce",
"sha256": "324650b49a368c015ea2e6f8e549656d24361f460d2872dbad26105b8be13935"
},
"downloads": -1,
"filename": "hsuBug-0.1.5.tar.gz",
"has_sig": false,
"md5_digest": "8cb85f96e4e10584f85a796f3f16a5ce",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 4720,
"upload_time": "2024-08-24T07:27:49",
"upload_time_iso_8601": "2024-08-24T07:27:49.993114Z",
"url": "https://files.pythonhosted.org/packages/c7/bf/8a541599fce066f01260c14794767d18cd90ae617c0434ad9aeee687db28/hsuBug-0.1.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-08-24 07:27:49",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "lucashsu95",
"github_project": "hsuBug",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "hsubug"
}