hsuBug


| Field | Value |
|---|---|
| Name | hsuBug |
| Version | 0.1.5 |
| Home page | https://github.com/lucashsu95/hsuBug |
| Summary | A library built on BeautifulSoup + requests + tqdm + pandas |
| Upload time | 2024-08-24 07:27:49 |
| Author | LucasHsu |
| Requires Python | >=3.7 |
| License | MIT |
| Requirements | No requirements were recorded. |

# Web Scraping Utilities

This project provides utility functions and classes for web scraping.

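## Key Features

1. `Bug` class: configures and manages scraping requests.
2. File download: downloads files from web pages.
3. Excel export: writes scraped data to an Excel file.
4. Environment variable management: reads environment variables safely.
5. Link checking: verifies that a URL is valid.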
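This README has no usage example for the environment variable feature (item 4), and the library's own function for it is not shown here. The sketch below only illustrates the general idea with the standard library. The helper `get_env` and the variable names `HSUBUG_DEMO_TOKEN` and `HSUBUG_DEMO_TIMEOUT` are hypothetical and are not part of hsuBug.

```python
import os
from typing import Optional


def get_env(name: str, default: Optional[str] = None, required: bool = False) -> Optional[str]:
    """Hypothetical helper (not part of hsuBug): read an environment variable safely."""
    value = os.environ.get(name)
    # Treat an unset or blank variable as missing.
    if value is None or value.strip() == "":
        if required:
            # Report only the variable name, never a secret value.
            raise KeyError(f"Required environment variable {name!r} is not set")
        return default
    return value


# Set here only for demonstration; real secrets should come from the shell or a secrets manager.
os.environ["HSUBUG_DEMO_TOKEN"] = "abc123"

token = get_env("HSUBUG_DEMO_TOKEN", required=True)
timeout = get_env("HSUBUG_DEMO_TIMEOUT", default="10")
```

Failing early on a missing required variable keeps credentials out of source code while still giving a clear error message.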

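## Installation

1. Make sure Python 3.7 or later is installed on your system (the package metadata declares `requires_python >=3.7`).

2. Install the package with pip: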

```shell
pip install hsuBug
```

## Usage Examples

### Basic Scraper Setup

```python
from hsuBug import Bug

# Create a Bug instance for the target URL
url = "https://example.com"
bug = Bug(url)

# Define the request headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
}

# Apply the headers and fetch the page content
bug.setup(headers)

# You can now use bug.soup to parse the HTML content
```

### Downloading a File

```python
from hsuBug.functions import downloadFile

url = "https://example.com/file.pdf"
filename = "example.pdf"
downloaded_file = downloadFile(url, filename)
print(f"File downloaded to: {downloaded_file}")
```
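To give a rough idea of what a download helper typically does, the sketch below copies a byte stream in fixed-size chunks and reports progress after each chunk. It is not hsuBug's implementation: `save_stream` is a hypothetical helper, and in-memory streams stand in for the HTTP response and the output file so the example runs without network access.

```python
import io
from typing import BinaryIO, Callable, Optional


def save_stream(
    source: BinaryIO,
    destination: BinaryIO,
    chunk_size: int = 8192,
    on_progress: Optional[Callable[[int], None]] = None,
) -> int:
    """Hypothetical helper: copy a byte stream in chunks and return the bytes written."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty bytes means the stream is exhausted
            break
        destination.write(chunk)
        total += len(chunk)
        if on_progress is not None:
            # A progress bar (for example tqdm) could be updated here.
            on_progress(total)
    return total


# In-memory stand-ins for a real HTTP response body and an output file.
fake_response = io.BytesIO(b"x" * 20000)
output = io.BytesIO()
progress_log = []

written = save_stream(fake_response, output, chunk_size=8192, on_progress=progress_log.append)
```

Reading in chunks keeps memory use bounded for large files, which is why download helpers usually avoid loading the whole body at once.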

### Exporting to Excel

```python
from hsuBug.functions import downloadByExcel

data = {
    "Name": ["Item 1", "Item 2", "Item 3"],
    "Price": [100, 200, 300]
}

df = downloadByExcel(data)
if df is not None:
    print("Data was exported to an Excel file successfully")
```
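Assuming `downloadByExcel` builds a pandas DataFrame from the dictionary (the package summary and the `df` return value suggest this, but it is not confirmed here), every column needs the same number of entries, because pandas raises a `ValueError` for a dict of lists with unequal lengths. The hypothetical helper below, which is not part of hsuBug, checks that before exporting.

```python
from typing import Dict, List, Sequence


def find_misaligned_columns(data: Dict[str, Sequence]) -> List[str]:
    """Hypothetical helper: list columns whose length differs from the first column's."""
    lengths = {name: len(values) for name, values in data.items()}
    if not lengths:
        return []
    # Use the first column as the reference length.
    expected = next(iter(lengths.values()))
    return [name for name, length in lengths.items() if length != expected]


good = {"Name": ["Item 1", "Item 2", "Item 3"], "Price": [100, 200, 300]}
bad = {"Name": ["Item 1", "Item 2", "Item 3"], "Price": [100, 200]}

good_result = find_misaligned_columns(good)  # no problems found
bad_result = find_misaligned_columns(bad)    # "Price" is one entry short
```

Running such a check on scraped data makes it easier to spot a page where one field was missing before the export fails.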

### Checking Link Validity

```python
from hsuBug.functions import checkLink

url = "https://example.com"
if checkLink(url):
    print("The link is valid")
else:
    print("The link is invalid")
```
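How `checkLink` decides validity is not documented in this README; it presumably sends a request to the URL. A cheap syntactic pre-check can filter out obviously malformed strings before any network call. The sketch below uses only the standard library, and `looks_like_http_url` is a hypothetical helper that is not part of hsuBug.

```python
from urllib.parse import urlparse


def looks_like_http_url(url: str) -> bool:
    """Hypothetical helper: True if the string has an http(s) scheme and a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


candidates = [
    "https://example.com",      # scheme and host present
    "ftp://example.com/file",   # not an HTTP(S) URL
    "example.com",              # no scheme
    "https://",                 # no host
]

# Keep only the strings worth passing on to a real link check.
well_formed = [url for url in candidates if looks_like_http_url(url)]
```

This only checks the shape of the URL; whether the server actually responds still requires a request.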

## Unit Tests

```shell
python -m unittest test_hsuBug.py
```

## Publishing

```shell
python setup.py sdist
twine upload dist/*
```

## Planned Features

1. An aiohttp-based version

## License

[MIT](https://choosealicense.com/licenses/mit/)

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/lucashsu95/hsuBug",
    "name": "hsuBug",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": null,
    "author": "LucasHsu",
    "author_email": "lucashsu95@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/c7/bf/8a541599fce066f01260c14794767d18cd90ae617c0434ad9aeee687db28/hsuBug-0.1.5.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "BeautifulSoup + requests + tqdm + pandas\u7684\u4e00\u500b\u5eab",
    "version": "0.1.5",
    "project_urls": {
        "Download": "https://pypi.org/project/hsuBug/",
        "Homepage": "https://github.com/lucashsu95/hsuBug"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c7bf8a541599fce066f01260c14794767d18cd90ae617c0434ad9aeee687db28",
                "md5": "8cb85f96e4e10584f85a796f3f16a5ce",
                "sha256": "324650b49a368c015ea2e6f8e549656d24361f460d2872dbad26105b8be13935"
            },
            "downloads": -1,
            "filename": "hsuBug-0.1.5.tar.gz",
            "has_sig": false,
            "md5_digest": "8cb85f96e4e10584f85a796f3f16a5ce",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 4720,
            "upload_time": "2024-08-24T07:27:49",
            "upload_time_iso_8601": "2024-08-24T07:27:49.993114Z",
            "url": "https://files.pythonhosted.org/packages/c7/bf/8a541599fce066f01260c14794767d18cd90ae617c0434ad9aeee687db28/hsuBug-0.1.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-24 07:27:49",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "lucashsu95",
    "github_project": "hsuBug",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "hsubug"
}
        