# hget-audio - Website Audio Downloader
[English]
## Comprehensive Error Handling
hget-audio implements robust error handling throughout the application. When errors occur:
1. **Non-verbose mode (default)**:
   - Captures all exceptions and displays a user-friendly message
   - Recommends using `--verbose` for detailed error information
   - Provides a unique error code for reference
   - Logs full error details to a file for later analysis
2. **Verbose mode (`--verbose`)**:
   - Displays complete error tracebacks
   - Shows internal state information for debugging
   - Includes additional diagnostic data
   - Does not capture exceptions, so errors propagate in full
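The difference between the two modes can be pictured as a thin wrapper around the work being done. The following is a minimal sketch, not hget-audio's actual implementation: `run_task`, its parameters, and the message wording are hypothetical names chosen for illustration, and log-file writing is left out.

```python
import sys


def run_task(task, verbose=False, error_code="DL-102"):
    """Run task() and apply one of the two reporting modes (hypothetical helper)."""
    try:
        return task()
    except Exception as exc:
        if verbose:
            # Verbose mode: do not capture; the full traceback propagates.
            raise
        # Non-verbose mode: capture and print a short, user-friendly message.
        print(f"[ERROR] Operation failed (Error Code: {error_code})", file=sys.stderr)
        print(f"Error: {exc}", file=sys.stderr)
        print("For more details, run with --verbose flag", file=sys.stderr)
        return None
```

With `verbose=False` a failing task yields `None` and a short message on stderr; with `verbose=True` the original exception reaches the caller unchanged.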
### Error Handling Examples
**Without verbose flag**:
```text
2023-10-15 14:30:25 [ERROR] Download failed (Error Code: DL-102)
Error: Connection timeout while downloading audio file.
Solution: Try increasing timeout with --timeout option
For more details, run with --verbose flag or check error log: errors_20231015_143025.log
```
**With verbose flag**:
```text
2023-10-15 14:30:25 [ERROR] Full traceback:
  File "/path/to/hget_audio/pipelines.py", line 215, in media_downloaded
    response = super().media_downloaded(response, request, info, item=item)
  File "/path/to/scrapy/pipelines/files.py", line 320, in media_downloaded
    raise FileException("Connection timeout")
scrapy.exceptions.FileException: Connection timeout

Request details:
- URL: https://example.com/audio/large.mp3
- Referer: https://example.com/audio-page
- Size: 150 MB (exceeds max size of 100 MB)
- Format: audio/mpeg
- Retry count: 2/3

System information:
- Python: 3.9.12
- Scrapy: 2.7.1
- Platform: Linux-5.15.0-86-generic-x86_64-with-glibc2.31
```
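The request details above report a 150 MB file against a 100 MB limit. A size check of that kind might look like the sketch below. It assumes binary units (1 KB = 1024 bytes, 1 MB = 1024 * 1024 bytes) and inclusive bounds, neither of which this README states; `size_within_limits` is a hypothetical name, with defaults taken from the `--min-size` and `--max-size` options.

```python
KB = 1024
MB = 1024 * 1024


def size_within_limits(size_bytes: int, min_kb: float = 1, max_mb: float = 100) -> bool:
    """Return True when the file size lies between the minimum (KB) and maximum (MB)."""
    return min_kb * KB <= size_bytes <= max_mb * MB


print(size_within_limits(150 * MB))  # False: over the 100 MB default
print(size_within_limits(50 * MB))   # True
```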
### Error Code Reference
| Code Range | Error Type | Example Codes |
|------------|--------------------------|--------------------|
| 100-199 | Network Errors | 101: Connection, 102: Timeout |
| 200-299 | File Validation Errors | 201: Invalid type, 202: Size |
| 300-399 | Configuration Errors | 301: Invalid URL, 302: Invalid depth |
| 400-499 | Scraping Errors | 401: Parser, 402: Spider |
| 500-599 | System Errors | 501: Disk full, 502: Permissions |
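The ranges in the table lend themselves to a simple lookup. The sketch below copies the ranges from the table; the helper `error_category` is hypothetical and not part of hget-audio. The sample message earlier shows a prefixed code (`DL-102`), and since the prefix is not defined here, the sketch handles only the numeric part.

```python
# Ranges copied from the table above; the lookup helper itself is illustrative.
ERROR_CATEGORIES = [
    (range(100, 200), "Network Errors"),
    (range(200, 300), "File Validation Errors"),
    (range(300, 400), "Configuration Errors"),
    (range(400, 500), "Scraping Errors"),
    (range(500, 600), "System Errors"),
]


def error_category(code: int) -> str:
    """Map a numeric error code to its category name, or "Unknown"."""
    for code_range, name in ERROR_CATEGORIES:
        if code in code_range:
            return name
    return "Unknown"


print(error_category(102))  # Network Errors
```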
### Error Logging
All errors are logged to timestamped files in the `error_logs` directory:
```text
error_logs/
├── errors_20231015_143025.log
├── errors_20231016_093412.log
└── errors_20231017_154723.log
```
Each log file contains:
1. Full error traceback
2. Request and response details
3. System environment information
4. Configuration settings at time of error
5. Memory usage statistics
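As a rough illustration of how such a log could be assembled, the sketch below builds the timestamped file name seen in the listing and a report with the traceback, system information, and configuration. It is an assumption-laden sketch: `log_filename` and `build_report` are hypothetical names, the report layout is invented for illustration, and request details and memory statistics are omitted.

```python
import platform
import traceback
from datetime import datetime


def log_filename(now: datetime) -> str:
    """Build a name in the errors_YYYYMMDD_HHMMSS.log pattern shown above."""
    return now.strftime("errors_%Y%m%d_%H%M%S.log")


def build_report(exc: BaseException, config: dict) -> str:
    """Combine traceback, system information, and configuration into one text report."""
    tb = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    lines = [
        "Traceback:",
        tb.rstrip(),
        "System information:",
        f"Python: {platform.python_version()}",
        f"Platform: {platform.platform()}",
        "Configuration:",
    ]
    lines += [f"{key}: {value}" for key, value in sorted(config.items())]
    return "\n".join(lines)


print(log_filename(datetime(2023, 10, 15, 14, 30, 25)))  # errors_20231015_143025.log
```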
## Installation
### Using pip
```bash
pip install hget-audio
```
### From source
```bash
git clone https://github.com/hyy-PROG/hget_audio.git
cd hget_audio
pip install .
```
## Command Line Usage
### Basic command
```bash
hget-audio "https://example.com/audio-page" -o "my_audios"
```
### Advanced options
```bash
hget-audio "https://example.com" \
  -d 3 \
  -c 8 \
  -f "mp3,wav" \
  --exclude "admin,private" \
  --max-size 50 \
  --timeout 30 \
  --retries 3 \
  -o "filtered_audios"
```
### Full options
```bash
hget-audio --help
```
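The `--exclude "admin,private"` option in the advanced example takes several patterns in one string. One plausible reading, sketched below, is a comma-separated list of regular expressions, each searched for anywhere in the URL. That interpretation is an assumption, and `compile_patterns` and `url_allowed` are hypothetical helpers rather than hget-audio functions. A pattern that itself contains a comma would not survive this naive split.

```python
import re


def compile_patterns(spec: str):
    """Split a comma-separated pattern string and compile each non-empty part."""
    return [re.compile(part.strip()) for part in spec.split(",") if part.strip()]


def url_allowed(url: str, include: str = "", exclude: str = "logout,admin,login") -> bool:
    """Reject URLs matching any exclude pattern; if include is set, require a match."""
    if any(p.search(url) for p in compile_patterns(exclude)):
        return False
    include_patterns = compile_patterns(include)
    return not include_patterns or any(p.search(url) for p in include_patterns)


print(url_allowed("https://example.com/audio-page"))   # True
print(url_allowed("https://example.com/admin/users"))  # False
```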
## API Usage
```python
from hget_audio.api import download_audio

# Download website audio
result = download_audio(
    url="https://example.com/audio-page",
    output_dir="my_audios",
    depth=2,
    formats="mp3,wav",
    verbose=True  # Enable detailed error reporting
)

print(f"Downloaded {result['audio_downloaded']} audio files")
print(f"Total size: {result['total_size'] / (1024*1024):.2f} MB")
```
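The two `print` calls above can be folded into a small helper. This sketch uses only the two result keys shown in the example (`audio_downloaded` and `total_size`, the latter treated as bytes because the example divides by 1024 * 1024). `format_summary` is a hypothetical name, and the sample dictionary stands in for a real return value so the snippet runs without hget-audio installed.

```python
def format_summary(result: dict) -> str:
    """Format the download count and total size (bytes converted to MB)."""
    count = result["audio_downloaded"]
    size_mb = result["total_size"] / (1024 * 1024)
    return f"Downloaded {count} audio files ({size_mb:.2f} MB)"


# Stand-in for the dictionary returned by download_audio().
sample = {"audio_downloaded": 12, "total_size": 5 * 1024 * 1024}
print(format_summary(sample))  # Downloaded 12 audio files (5.00 MB)
```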
## Configuration Options
| Option | Description | Default |
|--------|-------------|---------|
| `-o`, `--output` | Output directory | `hget.output` |
| `-d`, `--depth` | Crawl depth | 2 |
| `-c`, `--concurrency` | Concurrent requests | 16 |
| `-f`, `--formats` | Audio formats (comma-separated) | `mp3,wav,ogg,m4a,flac,aac` |
| `--ignore-robots` | Ignore robots.txt rules | False |
| `--user-agent` | Custom User-Agent | Default UA |
| `--delay` | Request delay (seconds) | 0.5 |
| `--timeout` | Request timeout (seconds) | 30 |
| `--retries` | Max retry attempts | 3 |
| `--max-size` | Max file size (MB) | 100 |
| `--min-size` | Min file size (KB) | 1 |
| `--include` | Include URL patterns (regex) | Empty |
| `--exclude` | Exclude URL patterns (regex) | `logout,admin,login` |
| `--dry-run` | Simulation mode (no download) | False |
| `-v`, `--verbose` | Verbose output and error reporting | False |
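For readers who want to see how these options and defaults fit together, the table can be mirrored with `argparse`. This is a sketch built from the table alone, not the tool's real parser: `build_parser` is a hypothetical name, and the argument types (for example, floats for sizes and delay) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Mirror the options table; defaults are taken from the table above."""
    p = argparse.ArgumentParser(prog="hget-audio")
    p.add_argument("url", help="Start URL to crawl")
    p.add_argument("-o", "--output", default="hget.output")
    p.add_argument("-d", "--depth", type=int, default=2)
    p.add_argument("-c", "--concurrency", type=int, default=16)
    p.add_argument("-f", "--formats", default="mp3,wav,ogg,m4a,flac,aac")
    p.add_argument("--ignore-robots", action="store_true")
    p.add_argument("--user-agent", default=None)
    p.add_argument("--delay", type=float, default=0.5)
    p.add_argument("--timeout", type=float, default=30)
    p.add_argument("--retries", type=int, default=3)
    p.add_argument("--max-size", type=float, default=100, help="MB")
    p.add_argument("--min-size", type=float, default=1, help="KB")
    p.add_argument("--include", default="")
    p.add_argument("--exclude", default="logout,admin,login")
    p.add_argument("--dry-run", action="store_true")
    p.add_argument("-v", "--verbose", action="store_true")
    return p


args = build_parser().parse_args(["https://example.com", "-d", "3", "--max-size", "50"])
print(args.depth, args.concurrency, args.max_size)  # 3 16 50.0
```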
## Example Output
```text
2023-10-15 14:30:25 [INFO] Starting crawl: https://example.com/audio-page
2023-10-15 14:30:26 [DEBUG] Parsing page (depth=0): https://example.com/audio-page
2023-10-15 14:30:27 [INFO] Audio found: https://example.com/audio/sample1.mp3
2023-10-15 14:30:28 [INFO] Download successful: my_audios/example_com/sample1.mp3
...
2023-10-15 14:31:05 [INFO] Spider closed
==================================================
Scraping Summary
==================================================
Website: https://example.com/audio-page
Output Directory: /path/to/my_audios
Total Pages Crawled: 42
Audio Files Found: 15
Audio Files Downloaded: 12
Audio Files Skipped: 3
Errors Encountered: 0
Total Download Size: 245.7 MB
```
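If the summary needs to be consumed by a script, its `Key: value` lines are easy to parse. The sketch below assumes the summary block is passed on its own, without the preceding log lines, and that the format matches the sample above; `parse_summary` is a hypothetical helper.

```python
def parse_summary(text: str) -> dict:
    """Parse "Key: value" lines from the summary block into a dictionary of strings."""
    summary = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        # Skip the "=====" rulers and the title line, which have no "Key: value" shape.
        if sep and not line.startswith("="):
            summary[key.strip()] = value.strip()
    return summary


block = """Website: https://example.com/audio-page
Audio Files Found: 15
Audio Files Downloaded: 12
Audio Files Skipped: 3"""
stats = parse_summary(block)
print(stats["Audio Files Downloaded"])  # 12
```

In the sample summary, the 15 files found equal the 12 downloaded plus the 3 skipped, which is a quick consistency check a script could make on the parsed values.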
## Contribution Guidelines
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/your-feature`)
3. Commit your changes (`git commit -am 'Add some feature'`)
4. Push to the branch (`git push origin feature/your-feature`)
5. Create a Pull Request
## License
This project is licensed under the MIT License. See the LICENSE file for details.
## Contact
For issues or suggestions: support@hget-audio.example
Raw data
{
"_id": null,
"home_page": "https://github.com/hyy-PROG/hget_audio",
"name": "hget-audio",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": "audio scraping downloader web crawler mp3 wav ogg podcast",
"author": "huang yi yi",
"author_email": "363766687@qq.com",
"download_url": "https://files.pythonhosted.org/packages/cd/bd/61e4329a920860839233a0b4843c7e121e6363cb568271dd3254e8ea6449/hget_audio-2025.7.24a0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Comprehensive audio scraping tool for websites.",
"version": "2025.7.24a0",
"project_urls": {
"Bug Tracker": "https://github.com/hyy-PROG/hget_audio/issues",
"Documentation": "https://github.com/hyy-PROG/hget_audio/wiki",
"Homepage": "https://github.com/hyy-PROG/hget_audio",
"Source Code": "https://github.com/hyy-PROG/hget_audio"
},
"split_keywords": [
"audio",
"scraping",
"downloader",
"web",
"crawler",
"mp3",
"wav",
"ogg",
"podcast"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "0ac7038ae863255cfc2e16dd13cc9011f9bdad06b1190c4005030cc9ce7cc8f9",
"md5": "9391bd3fb75a896da44a1b6078ba2c50",
"sha256": "e42ab04fb07f799376f367cd1c69c7ad2e7779666e83d8462324e53476df3228"
},
"downloads": -1,
"filename": "hget_audio-2025.7.24a0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "9391bd3fb75a896da44a1b6078ba2c50",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 24582,
"upload_time": "2025-07-24T12:47:04",
"upload_time_iso_8601": "2025-07-24T12:47:04.138162Z",
"url": "https://files.pythonhosted.org/packages/0a/c7/038ae863255cfc2e16dd13cc9011f9bdad06b1190c4005030cc9ce7cc8f9/hget_audio-2025.7.24a0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "cdbd61e4329a920860839233a0b4843c7e121e6363cb568271dd3254e8ea6449",
"md5": "0063a86ca3c5c4f4261a0b03ce4867f2",
"sha256": "fba6118de4d90c8fa04c3bcab1b97542fc03cd3333962de98993d0a8ed1ddafc"
},
"downloads": -1,
"filename": "hget_audio-2025.7.24a0.tar.gz",
"has_sig": false,
"md5_digest": "0063a86ca3c5c4f4261a0b03ce4867f2",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 25754,
"upload_time": "2025-07-24T12:47:05",
"upload_time_iso_8601": "2025-07-24T12:47:05.540158Z",
"url": "https://files.pythonhosted.org/packages/cd/bd/61e4329a920860839233a0b4843c7e121e6363cb568271dd3254e8ea6449/hget_audio-2025.7.24a0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-24 12:47:05",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "hyy-PROG",
"github_project": "hget_audio",
"github_not_found": true,
"lcname": "hget-audio"
}