# Miniapp Scraping v1.4.0
A lightweight, efficient Python tool for web scraping press releases with advanced email extraction capabilities.
## 🎯 Features v1.4.0
### ✨ Core Functionality
- **📧 Email Extraction**: Advanced email address detection from press releases
- **🔍 Contact Information**: Extract phone numbers, FAX, and company details
- **🚀 Pure Python**: Uses only requests + BeautifulSoup4 (no external APIs)
- **⚡ Lightweight**: Minimal dependencies, fast execution
- **🛠️ CLI Interface**: Easy command-line integration
### 🏆 Proven Technology
- **✅ Real Web Scraping**: Tested on actual PRTimes pages
- **📧 Validated Extraction**: Email patterns verified against live pages
- **🐍 Pure Python**: No browser automation or heavy frameworks
- **💾 JSON Output**: Structured data format for easy integration
## Version Information
- **Version**: 1.4.0 - PyPI Release
- **Released**: September 2025
- **Language**: Python 3.8+
- **Dependencies**: 3 core packages only
## 🚀 Quick Start
### Installation
```bash
pip install miniapp-scraping
```
### Command Line Usage
```bash
# Basic scraping
miniapp-scraping https://prtimes.jp/main/html/rd/p/000000001.000000001.html

# Verbose output with custom filename
miniapp-scraping https://prtimes.jp/main/html/rd/p/000000001.000000001.html -v -o my_data.json
```
### Python Integration
```python
from miniapp_scraping import MiniappScraper

# Initialize scraper
scraper = MiniappScraper()

# Scrape a press release
url = "https://prtimes.jp/main/html/rd/p/000000001.000000001.html"
result = scraper.scrape(url)

# Access extracted data
print(f"Title: {result['title']}")
print(f"Company: {result['company']}")
print(f"Emails found: {result['emails']}")

# Save to JSON
scraper.save_json(result, "output.json")
```
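If a page cannot be fetched or parsed, the `if data:` check in the batch loop under Advanced Usage suggests `scrape()` returns a falsy value; guarding single calls the same way avoids `KeyError`s on missing fields. A minimal sketch under that assumption:

```python
from miniapp_scraping import MiniappScraper

scraper = MiniappScraper()
result = scraper.scrape("https://prtimes.jp/main/html/rd/p/000000001.000000001.html")

# Assumption: scrape() returns None (or another falsy value) on failure,
# as the batch example under Advanced Usage implies.
if result:
    scraper.save_json(result, "output.json")
else:
    print("Scrape failed; nothing to save.")
```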
## 📊 Data Structure
### Extracted Information
```json
{
"url": "https://prtimes.jp/...",
"title": "Press Release Title",
"company": "Company Name Ltd.",
"date": "2025年9月10日",
"content": "Full press release content...",
"emails": ["contact@company.com", "info@company.com"],
"scraped_at": "2025-09-10T12:00:00",
"version": "1.4.0"
}
```
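Because the output is plain JSON, downstream code needs only the standard library to consume it. A short sketch reading a file written by `save_json` and walking the documented fields:

```python
import json

# Load a result previously written by save_json().
with open("output.json", encoding="utf-8") as f:
    record = json.load(f)

# Field names follow the structure documented above.
print(record["title"], "/", record["company"])
for email in record["emails"]:
    print("contact:", email)
```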
## 🔧 Advanced Usage
### CLI Options
```bash
# Help
miniapp-scraping --help

# Verbose mode
miniapp-scraping URL --verbose

# Custom output file
miniapp-scraping URL --output custom_name.json
```
### Python API
```python
from miniapp_scraping import MiniappScraper

scraper = MiniappScraper()

# Scrape multiple URLs
urls = [
    "https://prtimes.jp/main/html/rd/p/000000001.000000001.html",
    "https://prtimes.jp/main/html/rd/p/000000002.000000001.html"
]

results = []
for url in urls:
    data = scraper.scrape(url)
    if data:
        results.append(data)

# Batch save
for i, result in enumerate(results):
    scraper.save_json(result, f"release_{i+1}.json")
```
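For larger batches, a single JSONL file (one JSON object per line, the format referenced by the v1.2.1 changelog entry) can be handier than one file per release. A standard-library sketch, reusing the `results` list from the loop above:

```python
import json

# Write each scraped record as one line of JSON (JSONL).
# `results` is the list of dicts built by the loop above;
# ensure_ascii=False keeps Japanese text readable.
with open("releases.jsonl", "w", encoding="utf-8") as f:
    for record in results:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```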
## 📁 Project Structure
```
miniapp_scraping/
├── __init__.py # Package initialization
├── scraper.py # Core scraping functionality
└── cli.py # Command-line interface
```
## 🧪 Tested Email Patterns
### Supported Formats
- **Context emails**: addresses following Japanese contact labels, e.g. お問い合わせ ("Inquiries"): contact@example.com
- **Standard format**: info@company.co.jp
- **Special purpose**: press@, support@, inquiry@
- **HTML entities**: Encoded email addresses
### Validation
- ✅ RFC-compliant email format checking
- ✅ Duplicate removal
- ✅ Priority-based extraction (contact info first)
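For readers curious how such patterns work, here is a minimal, illustrative sketch of the approach described above; it is **not** the package's internal implementation. It applies a simplified RFC-style regex, prioritizes lines carrying contact labels such as お問い合わせ, and removes duplicates while preserving order:

```python
import re

# Simplified address pattern; full RFC 5322 validation is far more involved.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Labels that often precede contact addresses (illustrative selection).
CONTACT_LABELS = ("お問い合わせ", "contact", "inquiry")

def extract_emails(text):
    """Return unique emails, contact-labelled lines first (illustrative only)."""
    prioritized, others = [], []
    for line in text.splitlines():
        bucket = prioritized if any(l in line.lower() for l in CONTACT_LABELS) else others
        bucket.extend(EMAIL_RE.findall(line))
    seen, ordered = set(), []
    for email in prioritized + others:
        if email not in seen:  # duplicate removal, order-preserving
            seen.add(email)
            ordered.append(email)
    return ordered

print(extract_emails("お問い合わせ：contact@example.com\ninfo@company.co.jp\ninfo@company.co.jp"))
# -> ['contact@example.com', 'info@company.co.jp']
```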
## 🎯 Use Cases
### Business Applications
- **Media Monitoring**: Track company announcements
- **Lead Generation**: Extract contact information for outreach
- **Market Research**: Analyze industry press releases
- **Journalism**: Gather information for news stories
### Technical Integration
- **Data Pipelines**: Integrate with existing workflows
- **Automation**: Schedule regular scraping tasks
- **Analysis**: Feed data into business intelligence tools
- **Archives**: Build press release databases
## ⚙️ System Requirements
- **Python**: 3.8 or higher
- **OS**: Windows, macOS, Linux
- **Internet**: Required for scraping
- **Memory**: Minimal (< 50MB typical usage)
## 🔄 Changelog
### v1.4.0 (September 2025) - PyPI Release
- **🎯 PyPI Publication**: Official package release
- **📧 Enhanced Email Extraction**: Multi-pattern detection
- **🛠️ CLI Interface**: Command-line tool included
- **⚡ Performance**: Optimized for speed and efficiency
- **📝 Documentation**: Comprehensive English documentation
### Previous Versions
- v1.3.0: Email extraction enhancement
- v1.2.1: JSONL database integration
- v1.1.0: Structured data output
## 🛠️ Development
### Local Development
```bash
# Clone repository
git clone https://github.com/TECHFUND/miniapp-scraping
cd miniapp-scraping

# Install in development mode
pip install -e .

# Run tests
python -m pytest
```
### Building Package
```bash
# Build distribution
python setup.py sdist bdist_wheel

# Upload to PyPI (maintainers only)
twine upload dist/*
```
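Note that `setup.py`-driven builds are deprecated in recent setuptools releases; if the repository ships a `pyproject.toml` (not confirmed here), the PEP 517 frontend produces the same artifacts: `pip install build && python -m build` (writes the sdist and wheel to `dist/`).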
## 📞 Support & Contributing
- **GitHub Issues**: [Report bugs or request features](https://github.com/TECHFUND/miniapp-scraping/issues)
- **Documentation**: [Full documentation](https://github.com/TECHFUND/miniapp-scraping#readme)
- **Contributing**: Pull requests welcome
## 📋 License
MIT License - see [LICENSE](LICENSE) file for details.
## ⚠️ Legal Notice
This tool is designed for legitimate research and business purposes. Please respect website terms of service and robots.txt when using this scraper.
---
**✨ v1.4.0 Highlights**: Production-ready PyPI package with enhanced email extraction and CLI interface for seamless integration into any workflow.