miniapp-scraping


Name: miniapp-scraping
Version: 1.4.0
Home page: https://github.com/TECHFUND/miniapp-scraping
Summary: Lightweight web scraping tool with email extraction
Upload time: 2025-09-10 08:30:17
Maintainer: None
Docs URL: None
Author: TECHFUND Development Team
Requires Python: >=3.8
License: None
Keywords: scraping, miniapp, press-release, email-extraction, web-scraping, data-extraction, journalism, media, automation
Requirements: No requirements were recorded.
# Miniapp Scraping v1.4.0

A lightweight, efficient Python tool for web scraping press releases with advanced email extraction capabilities.

## 🎯 Features v1.4.0

### ✨ Core Functionality
- **📧 Email Extraction**: Advanced email address detection from press releases
- **🔍 Contact Information**: Extract phone numbers, fax numbers, and company details
- **🚀 Pure Python**: Uses only requests + BeautifulSoup4 (no external APIs)
- **⚡ Lightweight**: Minimal dependencies, fast execution
- **🛠️ CLI Interface**: Easy command-line integration

### 🏆 Proven Technology
- **✅ Real Web Scraping**: Tested on actual PRTimes pages
- **📧 Validated Extraction**: Email extraction patterns verified against real press releases
- **🐍 Pure Python**: No browser automation or heavy frameworks
- **💾 JSON Output**: Structured data format for easy integration

## Version Information
- **Version**: 1.4.0 - PyPI Release
- **Released**: September 2025
- **Language**: Python 3.8+
- **Dependencies**: 3 core packages only

## 🚀 Quick Start

### Installation
```bash
pip install miniapp-scraping
```

### Command Line Usage
```bash
# Basic scraping
miniapp-scraping https://prtimes.jp/main/html/rd/p/000000001.000000001.html

# Verbose output with custom filename
miniapp-scraping https://prtimes.jp/main/html/rd/p/000000001.000000001.html -v -o my_data.json
```

### Python Integration
```python
from miniapp_scraping import MiniappScraper

# Initialize scraper
scraper = MiniappScraper()

# Scrape a press release
url = "https://prtimes.jp/main/html/rd/p/000000001.000000001.html"
result = scraper.scrape(url)

# Access extracted data
print(f"Title: {result['title']}")
print(f"Company: {result['company']}")
print(f"Emails found: {result['emails']}")

# Save to JSON
scraper.save_json(result, "output.json")
```

## 📊 Data Structure

### Extracted Information
```json
{
  "url": "https://prtimes.jp/...",
  "title": "Press Release Title",
  "company": "Company Name Ltd.",
  "date": "2025年9月10日",
  "content": "Full press release content...",
  "emails": ["contact@company.com", "info@company.com"],
  "scraped_at": "2025-09-10T12:00:00",
  "version": "1.4.0"
}
```
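
Because the saved file is plain JSON, downstream code needs only the standard library to consume it. A minimal sketch, assuming a file written by `save_json` with the structure above:

```python
import json

# Load a result previously written by scraper.save_json()
with open("output.json", encoding="utf-8") as f:
    record = json.load(f)

# Keys follow the documented structure above
print(record["title"], "/", record["company"])
for email in record.get("emails", []):
    print("Contact:", email)
```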

## 🔧 Advanced Usage

### CLI Options
```bash
# Help
miniapp-scraping --help

# Verbose mode
miniapp-scraping URL --verbose

# Custom output file
miniapp-scraping URL --output custom_name.json
```

### Python API
```python
from miniapp_scraping import MiniappScraper

scraper = MiniappScraper()

# Scrape multiple URLs
urls = [
    "https://prtimes.jp/main/html/rd/p/000000001.000000001.html",
    "https://prtimes.jp/main/html/rd/p/000000002.000000001.html"
]

results = []
for url in urls:
    data = scraper.scrape(url)
    if data:
        results.append(data)

# Batch save
for i, result in enumerate(results):
    scraper.save_json(result, f"release_{i+1}.json")
```

## 📁 Project Structure

```
miniapp_scraping/
├── __init__.py       # Package initialization
├── scraper.py        # Core scraping functionality  
└── cli.py            # Command-line interface
```

## 🧪 Tested Email Patterns

### Supported Formats
- **Context-labeled emails**: お問い合わせ (inquiry): contact@example.com
- **Standard format**: info@company.co.jp
- **Special purpose**: press@, support@, inquiry@
- **HTML entities**: Encoded email addresses

### Validation
- ✅ RFC-compliant email format checking
- ✅ Duplicate removal
- ✅ Priority-based extraction (contact info first), illustrated in the sketch below
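
The package's internal matcher is not reproduced here, but the matching and de-duplication steps above can be sketched with the standard library alone; `EMAIL_RE` and `extract_emails` are illustrative names, not part of the package API:

```python
import re
from typing import List

# Simplified RFC-style pattern for illustration; the package's actual
# patterns (context labels, HTML entities, etc.) are more involved.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract_emails(text: str) -> List[str]:
    """Find candidate addresses and de-duplicate while preserving order."""
    seen, found = set(), []
    for addr in EMAIL_RE.findall(text):
        addr = addr.lower()
        if addr not in seen:
            seen.add(addr)
            found.append(addr)
    return found

print(extract_emails("お問い合わせ: contact@example.com / info@company.co.jp"))
# -> ['contact@example.com', 'info@company.co.jp']
```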

## 🎯 Use Cases

### Business Applications
- **Media Monitoring**: Track company announcements
- **Lead Generation**: Extract contact information for outreach
- **Market Research**: Analyze industry press releases
- **Journalism**: Gather information for news stories

### Technical Integration
- **Data Pipelines**: Integrate with existing workflows
- **Automation**: Schedule regular scraping tasks
- **Analysis**: Feed data into business intelligence tools
- **Archives**: Build press release databases (one archiving pattern is sketched below)
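
As one concrete pipeline pattern, results can be appended to a JSON Lines archive (the format mentioned in the v1.2.1 changelog entry). This minimal sketch uses only the documented `scrape()` call; the `releases.jsonl` filename is an arbitrary choice:

```python
import json
from miniapp_scraping import MiniappScraper

scraper = MiniappScraper()

urls = [
    "https://prtimes.jp/main/html/rd/p/000000001.000000001.html",
    "https://prtimes.jp/main/html/rd/p/000000002.000000001.html",
]

# Append one JSON object per line (JSONL), a common archive/pipeline format
with open("releases.jsonl", "a", encoding="utf-8") as archive:
    for url in urls:
        data = scraper.scrape(url)
        if data:
            archive.write(json.dumps(data, ensure_ascii=False) + "\n")
```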

## ⚙️ System Requirements

- **Python**: 3.8 or higher
- **OS**: Windows, macOS, Linux
- **Internet**: Required for scraping
- **Memory**: Minimal (< 50MB typical usage)

## 🔄 Changelog

### v1.4.0 (September 2025) - PyPI Release
- **🎯 PyPI Publication**: Official package release
- **📧 Enhanced Email Extraction**: Multi-pattern detection
- **🛠️ CLI Interface**: Command-line tool included
- **⚡ Performance**: Optimized for speed and efficiency
- **📝 Documentation**: Comprehensive English documentation

### Previous Versions
- v1.3.0: Email extraction enhancement
- v1.2.1: JSONL database integration
- v1.1.0: Structured data output

## 🛠️ Development

### Local Development
```bash
# Clone repository
git clone https://github.com/TECHFUND/miniapp-scraping
cd miniapp-scraping

# Install in development mode
pip install -e .

# Run tests
python -m pytest
```

### Building Package
```bash
# Build distribution
python setup.py sdist bdist_wheel

# Upload to PyPI (maintainers only)
twine upload dist/*
```

## 📞 Support & Contributing

- **GitHub Issues**: [Report bugs or request features](https://github.com/TECHFUND/miniapp-scraping/issues)
- **Documentation**: [Full documentation](https://github.com/TECHFUND/miniapp-scraping#readme)
- **Contributing**: Pull requests welcome

## 📋 License

MIT License - see [LICENSE](LICENSE) file for details.

## ⚠️ Legal Notice

This tool is designed for legitimate research and business purposes. Please respect website terms of service and robots.txt when using this scraper.
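
A lightweight way to honor robots.txt is the standard library's `urllib.robotparser`; the sketch below checks a URL before scraping, with a hypothetical user-agent string:

```python
from urllib.robotparser import RobotFileParser

# Fetch and parse the site's robots.txt (standard location assumed)
rp = RobotFileParser("https://prtimes.jp/robots.txt")
rp.read()

url = "https://prtimes.jp/main/html/rd/p/000000001.000000001.html"
# "miniapp-scraping" here is a placeholder user-agent string
if rp.can_fetch("miniapp-scraping", url):
    print("robots.txt allows fetching this URL")
else:
    print("robots.txt disallows this URL; skipping")
```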

---

**✨ v1.4.0 Highlights**: Production-ready PyPI package with enhanced email extraction and CLI interface for seamless integration into any workflow.

Raw data

```json
{
    "_id": null,
    "home_page": "https://github.com/TECHFUND/miniapp-scraping",
    "name": "miniapp-scraping",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "scraping, miniapp, press-release, email-extraction, web-scraping, data-extraction, journalism, media, automation",
    "author": "TECHFUND Development Team",
    "author_email": "dev@techfund.jp",
    "download_url": "https://files.pythonhosted.org/packages/80/2a/4da1ee3ccc08c85efec360022031a514b1ac0f601225bc2b7486c76a5c81/miniapp_scraping-1.4.0.tar.gz",
    "platform": null,
    "description": "# Miniapp Scraping v1.4.0\n\nA lightweight, efficient Python tool for web scraping press releases with advanced email extraction capabilities.\n\n## \ud83c\udfaf Features v1.4.0\n\n### \u2728 Core Functionality\n- **\ud83d\udce7 Email Extraction**: Advanced email address detection from press releases\n- **\ud83d\udd0d Contact Information**: Extract phone numbers, FAX, and company details  \n- **\ud83d\ude80 Pure Python**: Uses only requests + BeautifulSoup4 (no external APIs)\n- **\u26a1 Lightweight**: Minimal dependencies, fast execution\n- **\ud83d\udee0\ufe0f CLI Interface**: Easy command-line integration\n\n### \ud83c\udfc6 Proven Technology\n- **\u2705 Real Web Scraping**: Tested on actual PRTimes pages\n- **\ud83d\udce7 Email Success Rate**: Validated email extraction patterns\n- **\ud83d\udc0d Pure Python**: No browser automation or heavy frameworks\n- **\ud83d\udcbe JSON Output**: Structured data format for easy integration\n\n## Version Information\n- **Version**: 1.4.0 - PyPI Release\n- **Released**: September 2025\n- **Language**: Python 3.8+\n- **Dependencies**: 3 core packages only\n\n## \ud83d\ude80 Quick Start\n\n### Installation\n```bash\npip install miniapp-scraping\n```\n\n### Command Line Usage\n```bash\n# Basic scraping\nminiapp-scraping https://prtimes.jp/main/html/rd/p/000000001.000000001.html\n\n# Verbose output with custom filename\nminiapp-scraping https://prtimes.jp/main/html/rd/p/000000001.000000001.html -v -o my_data.json\n```\n\n### Python Integration\n```python\nfrom miniapp_scraping import MiniappScraper\n\n# Initialize scraper\nscraper = MiniappScraper()\n\n# Scrape a press release\nurl = \"https://prtimes.jp/main/html/rd/p/000000001.000000001.html\"\nresult = scraper.scrape(url)\n\n# Access extracted data\nprint(f\"Title: {result['title']}\")\nprint(f\"Company: {result['company']}\")\nprint(f\"Emails found: {result['emails']}\")\n\n# Save to JSON\nscraper.save_json(result, \"output.json\")\n```\n\n## \ud83d\udcca Data Structure\n\n### Extracted Information\n```json\n{\n  \"url\": \"https://prtimes.jp/...\",\n  \"title\": \"Press Release Title\",\n  \"company\": \"Company Name Ltd.\",\n  \"date\": \"2025\u5e749\u670810\u65e5\",\n  \"content\": \"Full press release content...\",\n  \"emails\": [\"contact@company.com\", \"info@company.com\"],\n  \"scraped_at\": \"2025-09-10T12:00:00\",\n  \"version\": \"1.4.0\"\n}\n```\n\n## \ud83d\udd27 Advanced Usage\n\n### CLI Options\n```bash\n# Help\nminiapp-scraping --help\n\n# Verbose mode\nminiapp-scraping URL --verbose\n\n# Custom output file\nminiapp-scraping URL --output custom_name.json\n```\n\n### Python API\n```python\nfrom miniapp_scraping import MiniappScraper\n\nscraper = MiniappScraper()\n\n# Scrape multiple URLs\nurls = [\n    \"https://prtimes.jp/main/html/rd/p/000000001.000000001.html\",\n    \"https://prtimes.jp/main/html/rd/p/000000002.000000001.html\"\n]\n\nresults = []\nfor url in urls:\n    data = scraper.scrape(url)\n    if data:\n        results.append(data)\n\n# Batch save\nfor i, result in enumerate(results):\n    scraper.save_json(result, f\"release_{i+1}.json\")\n```\n\n## \ud83d\udcc1 Project Structure\n\n```\nminiapp_scraping/\n\u251c\u2500\u2500 __init__.py       # Package initialization\n\u251c\u2500\u2500 scraper.py        # Core scraping functionality  \n\u2514\u2500\u2500 cli.py           # Command-line interface\n```\n\n## \ud83e\uddea Tested Email Patterns\n\n### Supported Formats\n- **Context emails**: \u304a\u554f\u3044\u5408\u308f\u305b\uff1acontact@example.com\n- 
**Standard format**: info@company.co.jp\n- **Special purpose**: press@, support@, inquiry@\n- **HTML entities**: Encoded email addresses\n\n### Validation\n- \u2705 RFC-compliant email format checking\n- \u2705 Duplicate removal\n- \u2705 Priority-based extraction (contact info first)\n\n## \ud83c\udfaf Use Cases\n\n### Business Applications\n- **Media Monitoring**: Track company announcements\n- **Lead Generation**: Extract contact information for outreach\n- **Market Research**: Analyze industry press releases\n- **Journalism**: Gather information for news stories\n\n### Technical Integration\n- **Data Pipelines**: Integrate with existing workflows\n- **Automation**: Schedule regular scraping tasks\n- **Analysis**: Feed data into business intelligence tools\n- **Archives**: Build press release databases\n\n## \u2699\ufe0f System Requirements\n\n- **Python**: 3.8 or higher\n- **OS**: Windows, macOS, Linux\n- **Internet**: Required for scraping\n- **Memory**: Minimal (< 50MB typical usage)\n\n## \ud83d\udd04 Changelog\n\n### v1.4.0 (September 2025) - PyPI Release\n- **\ud83c\udfaf PyPI Publication**: Official package release\n- **\ud83d\udce7 Enhanced Email Extraction**: Multi-pattern detection\n- **\ud83d\udee0\ufe0f CLI Interface**: Command-line tool included\n- **\u26a1 Performance**: Optimized for speed and efficiency\n- **\ud83d\udcdd Documentation**: Comprehensive English documentation\n\n### Previous Versions\n- v1.3.0: Email extraction enhancement\n- v1.2.1: JSONL database integration\n- v1.1.0: Structured data output\n\n## \ud83d\udee0\ufe0f Development\n\n### Local Development\n```bash\n# Clone repository\ngit clone https://github.com/TECHFUND/miniapp-scraping\ncd miniapp-scraping\n\n# Install in development mode\npip install -e .\n\n# Run tests\npython -m pytest\n```\n\n### Building Package\n```bash\n# Build distribution\npython setup.py sdist bdist_wheel\n\n# Upload to PyPI (maintainers only)\ntwine upload dist/*\n```\n\n## \ud83d\udcde Support & Contributing\n\n- **GitHub Issues**: [Report bugs or request features](https://github.com/TECHFUND/miniapp-scraping/issues)\n- **Documentation**: [Full documentation](https://github.com/TECHFUND/miniapp-scraping#readme)\n- **Contributing**: Pull requests welcome\n\n## \ud83d\udccb License\n\nMIT License - see [LICENSE](LICENSE) file for details.\n\n## \u26a0\ufe0f Legal Notice\n\nThis tool is designed for legitimate research and business purposes. 
Please respect website terms of service and robots.txt when using this scraper.\n\n---\n\n**\u2728 v1.4.0 Highlights**: Production-ready PyPI package with enhanced email extraction and CLI interface for seamless integration into any workflow.\n\n# Miniapp Scraping v1.4.0\n\n\u30d7\u30ec\u30b9\u30ea\u30ea\u30fc\u30b9\u3092\u52b9\u7387\u7684\u306b\u30b9\u30af\u30ec\u30a4\u30d4\u30f3\u30b0\u3057\u3001\u30e1\u30fc\u30eb\u30a2\u30c9\u30ec\u30b9\u306e\u62bd\u51fa\u306b\u7279\u5316\u3057\u305f\u8efd\u91cfPython\u30c4\u30fc\u30eb\u3067\u3059\u3002\n\n## \ud83c\udfaf \u7279\u5fb4 v1.4.0\n\n### \u2728 \u4e3b\u8981\u6a5f\u80fd\n- **\ud83d\udce7 \u30e1\u30fc\u30eb\u30a2\u30c9\u30ec\u30b9\u62bd\u51fa**: \u30d7\u30ec\u30b9\u30ea\u30ea\u30fc\u30b9\u304b\u3089\u9ad8\u7cbe\u5ea6\u3067\u30e1\u30fc\u30eb\u62bd\u51fa\n- **\ud83d\udd0d \u9023\u7d61\u5148\u60c5\u5831**: \u96fb\u8a71\u756a\u53f7\u30fbFAX\u30fb\u4f01\u696d\u8a73\u7d30\u3082\u81ea\u52d5\u53d6\u5f97\n- **\ud83d\ude80 \u7d14\u7c8bPython**: requests + BeautifulSoup4\u306e\u307f\u4f7f\u7528\n- **\u26a1 \u8efd\u91cf\u8a2d\u8a08**: \u6700\u5c0f\u9650\u306e\u4f9d\u5b58\u95a2\u4fc2\u3001\u9ad8\u901f\u5b9f\u884c\n- **\ud83d\udee0\ufe0f CLI\u5bfe\u5fdc**: \u30b3\u30de\u30f3\u30c9\u30e9\u30a4\u30f3\u7d71\u5408\u304c\u7c21\u5358\n\n### \ud83c\udfc6 \u5b9f\u8a3c\u6e08\u307f\u6280\u8853\n- **\u2705 \u30ea\u30a2\u30ebWeb\u30b9\u30af\u30ec\u30a4\u30d4\u30f3\u30b0**: \u5b9f\u969b\u306ePRTimes\u30da\u30fc\u30b8\u3067\u30c6\u30b9\u30c8\u6e08\u307f\n- **\ud83d\udce7 \u30e1\u30fc\u30eb\u62bd\u51fa\u7cbe\u5ea6**: \u691c\u8a3c\u6e08\u307f\u30d1\u30bf\u30fc\u30f3\u30de\u30c3\u30c1\u30f3\u30b0\n- **\ud83d\udc0d Pure Python**: \u30d6\u30e9\u30a6\u30b6\u81ea\u52d5\u5316\u3084\u91cd\u3044\u30d5\u30ec\u30fc\u30e0\u30ef\u30fc\u30af\u4e0d\u8981\n- **\ud83d\udcbe JSON\u51fa\u529b**: \u69cb\u9020\u5316\u30c7\u30fc\u30bf\u3067\u7c21\u5358\u7d71\u5408\n\n## \u30d0\u30fc\u30b8\u30e7\u30f3\u60c5\u5831\n- **Version**: 1.4.0 - PyPI\u516c\u958b\u7248\n- **Released**: 2025\u5e749\u6708\n- **Language**: Python 3.8+\n- **Dependencies**: \u6838\u3068\u306a\u308b3\u30d1\u30c3\u30b1\u30fc\u30b8\u306e\u307f\n\n## \ud83d\ude80 \u30af\u30a4\u30c3\u30af\u30b9\u30bf\u30fc\u30c8\n\n### \u30a4\u30f3\u30b9\u30c8\u30fc\u30eb\n```bash\npip install miniapp-scraping\n```\n\n### \u30b3\u30de\u30f3\u30c9\u30e9\u30a4\u30f3\u4f7f\u7528\n```bash\n# \u57fa\u672c\u7684\u306a\u30b9\u30af\u30ec\u30a4\u30d4\u30f3\u30b0\nminiapp-scraping https://prtimes.jp/main/html/rd/p/000000001.000000001.html\n\n# \u8a73\u7d30\u51fa\u529b\u3067\u30ab\u30b9\u30bf\u30e0\u30d5\u30a1\u30a4\u30eb\u540d\u6307\u5b9a\nminiapp-scraping https://prtimes.jp/main/html/rd/p/000000001.000000001.html -v -o my_data.json\n```\n\n### Python\u7d71\u5408\n```python\nfrom miniapp_scraping import MiniappScraper\n\n# \u30b9\u30af\u30ec\u30a4\u30d1\u30fc\u521d\u671f\u5316\nscraper = MiniappScraper()\n\n# \u30d7\u30ec\u30b9\u30ea\u30ea\u30fc\u30b9\u53d6\u5f97\nurl = \"https://prtimes.jp/main/html/rd/p/000000001.000000001.html\"\nresult = scraper.scrape(url)\n\n# \u62bd\u51fa\u30c7\u30fc\u30bf\u30a2\u30af\u30bb\u30b9\nprint(f\"\u30bf\u30a4\u30c8\u30eb: {result['title']}\")\nprint(f\"\u4f1a\u793e\u540d: {result['company']}\")\nprint(f\"\u30e1\u30fc\u30eb\u767a\u898b\u6570: {result['emails']}\")\n\n# JSON\u4fdd\u5b58\nscraper.save_json(result, \"output.json\")\n```\n\n## \ud83d\udd27 \u9ad8\u5ea6\u306a\u4f7f\u7528\u65b9\u6cd5\n\n### CLI\u30aa\u30d7\u30b7\u30e7\u30f3\n```bash\n# \u30d8\u30eb\u30d7\nminiapp-scraping --help\n\n# 
\u8a73\u7d30\u30e2\u30fc\u30c9\nminiapp-scraping URL --verbose\n\n# \u30ab\u30b9\u30bf\u30e0\u51fa\u529b\u30d5\u30a1\u30a4\u30eb\nminiapp-scraping URL --output custom_name.json\n```\n\n## \ud83d\udcde \u30b5\u30dd\u30fc\u30c8\u30fb\u8ca2\u732e\n\n- **GitHub Issues**: [\u30d0\u30b0\u30ec\u30dd\u30fc\u30c8\u3084\u6a5f\u80fd\u30ea\u30af\u30a8\u30b9\u30c8](https://github.com/TECHFUND/miniapp-scraping/issues)\n- **\u30c9\u30ad\u30e5\u30e1\u30f3\u30c8**: [\u5b8c\u5168\u30c9\u30ad\u30e5\u30e1\u30f3\u30c8](https://github.com/TECHFUND/miniapp-scraping#readme)\n- **\u8ca2\u732e**: \u30d7\u30eb\u30ea\u30af\u30a8\u30b9\u30c8\u6b53\u8fce\n\n---\n\n**\u2728 v1.4.0\u306e\u7279\u5fb4**: \u30e1\u30fc\u30eb\u62bd\u51fa\u3068CLI\u30a4\u30f3\u30bf\u30fc\u30d5\u30a7\u30fc\u30b9\u3092\u5f37\u5316\u3057\u305f\u672c\u683c\u7684\u306aPyPI\u30d1\u30c3\u30b1\u30fc\u30b8\u3001\u3042\u3089\u3086\u308b\u30ef\u30fc\u30af\u30d5\u30ed\u30fc\u3078\u306e\u30b7\u30fc\u30e0\u30ec\u30b9\u306a\u7d71\u5408\u304c\u53ef\u80fd\u3002\n\n# Miniapp Scraping v1.4.0\n\n\u4e00\u4e2a\u8f7b\u91cf\u7ea7\u3001\u9ad8\u6548\u7684Python\u5de5\u5177\uff0c\u7528\u4e8e\u6293\u53d6\u65b0\u95fb\u7a3f\u5e76\u63d0\u53d6\u7535\u5b50\u90ae\u4ef6\u5730\u5740\u3002\n\n## \ud83c\udfaf \u7279\u6027 v1.4.0\n\n### \u2728 \u6838\u5fc3\u529f\u80fd\n- **\ud83d\udce7 \u7535\u5b50\u90ae\u4ef6\u63d0\u53d6**: \u4ece\u65b0\u95fb\u7a3f\u4e2d\u9ad8\u7cbe\u5ea6\u63d0\u53d6\u7535\u5b50\u90ae\u4ef6\n- **\ud83d\udd0d \u8054\u7cfb\u4fe1\u606f**: \u81ea\u52a8\u83b7\u53d6\u7535\u8bdd\u53f7\u7801\u3001\u4f20\u771f\u3001\u516c\u53f8\u8be6\u60c5\n- **\ud83d\ude80 \u7eafPython**: \u4ec5\u4f7f\u7528requests + BeautifulSoup4\n- **\u26a1 \u8f7b\u91cf\u8bbe\u8ba1**: \u6700\u5c0f\u4f9d\u8d56\u5173\u7cfb\uff0c\u5feb\u901f\u6267\u884c\n- **\ud83d\udee0\ufe0f CLI\u652f\u6301**: \u8f7b\u677e\u547d\u4ee4\u884c\u96c6\u6210\n\n## \ud83d\ude80 \u5feb\u901f\u5f00\u59cb\n\n### \u5b89\u88c5\n```bash\npip install miniapp-scraping\n```\n\n### \u547d\u4ee4\u884c\u4f7f\u7528\n```bash\n# \u57fa\u672c\u6293\u53d6\nminiapp-scraping https://prtimes.jp/main/html/rd/p/000000001.000000001.html\n\n# \u8be6\u7ec6\u8f93\u51fa\u81ea\u5b9a\u4e49\u6587\u4ef6\u540d\nminiapp-scraping https://prtimes.jp/main/html/rd/p/000000001.000000001.html -v -o my_data.json\n```\n\n---\n\n**\u2728 v1.4.0\u4eae\u70b9**: \u751f\u4ea7\u5c31\u7eea\u7684PyPI\u5305\uff0c\u589e\u5f3a\u7684\u7535\u5b50\u90ae\u4ef6\u63d0\u53d6\u548cCLI\u754c\u9762\uff0c\u53ef\u65e0\u7f1d\u96c6\u6210\u5230\u4efb\u4f55\u5de5\u4f5c\u6d41\u7a0b\u4e2d\u3002\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "Lightweight web scraping tool with email extraction",
    "version": "1.4.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/TECHFUND/miniapp-scraping/issues",
        "Changelog": "https://github.com/TECHFUND/miniapp-scraping/blob/main/CHANGELOG.md",
        "Documentation": "https://github.com/TECHFUND/miniapp-scraping#readme",
        "Homepage": "https://github.com/TECHFUND/miniapp-scraping",
        "Source Code": "https://github.com/TECHFUND/miniapp-scraping"
    },
    "split_keywords": [
        "scraping",
        " miniapp",
        " press-release",
        " email-extraction",
        " web-scraping",
        " data-extraction",
        " journalism",
        " media",
        " automation"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "f826b63e5264ad625944b7692d9024cadaf8feb608c911d11d31149d6f7e5fcb",
                "md5": "ddccb1b0f54819f1f149a19e3dd82f53",
                "sha256": "b344c012e64fb2830bc13d35ebded781951f1ddf1cdbd768ef05c921e735dced"
            },
            "downloads": -1,
            "filename": "miniapp_scraping-1.4.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ddccb1b0f54819f1f149a19e3dd82f53",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 10260,
            "upload_time": "2025-09-10T08:30:16",
            "upload_time_iso_8601": "2025-09-10T08:30:16.046924Z",
            "url": "https://files.pythonhosted.org/packages/f8/26/b63e5264ad625944b7692d9024cadaf8feb608c911d11d31149d6f7e5fcb/miniapp_scraping-1.4.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "802a4da1ee3ccc08c85efec360022031a514b1ac0f601225bc2b7486c76a5c81",
                "md5": "a157c9a1bf6ef3f39e27990832c63fc8",
                "sha256": "ab897ebe5e0e698194a5fecaf6677c1c7ff04d2a21fd20005732d5d39deec20d"
            },
            "downloads": -1,
            "filename": "miniapp_scraping-1.4.0.tar.gz",
            "has_sig": false,
            "md5_digest": "a157c9a1bf6ef3f39e27990832c63fc8",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 11113,
            "upload_time": "2025-09-10T08:30:17",
            "upload_time_iso_8601": "2025-09-10T08:30:17.392878Z",
            "url": "https://files.pythonhosted.org/packages/80/2a/4da1ee3ccc08c85efec360022031a514b1ac0f601225bc2b7486c76a5c81/miniapp_scraping-1.4.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-10 08:30:17",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "TECHFUND",
    "github_project": "miniapp-scraping",
    "github_not_found": true,
    "lcname": "miniapp-scraping"
}
```