kaggle-discussion-extractor


- Name: kaggle-discussion-extractor
- Version: 1.2.0
- Home page: https://github.com/Letemoin/kaggle-discussion-extractor
- Summary: A professional-grade tool for extracting and analyzing discussions from Kaggle competitions
- Upload time: 2025-09-08 20:03:14
- Maintainer: Kaggle Discussion Extractor Contributors
- Author: Kaggle Discussion Extractor Contributors
- Requires Python: >=3.8
- License: MIT
- Keywords: kaggle, discussion, extractor, web-scraping, data-extraction, machine-learning, competition, playwright, async
- Requirements: playwright>=1.40.0, asyncio-extras>=1.3.0, beautifulsoup4>=4.12.0, nbformat>=5.9.0, nbconvert>=7.8.0
# Kaggle Discussion Extractor

[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Playwright](https://img.shields.io/badge/Playwright-45ba4b?style=flat&logo=playwright&logoColor=white)](https://playwright.dev/python/)

A professional-grade Python tool for extracting and analyzing discussions and solution writeups from Kaggle competitions. It features hierarchical reply extraction, automatic writeup extraction from leaderboards, and clean Markdown output with rich metadata.

## 🚀 Key Features

### Competition Writeup Extraction
- **Leaderboard Scraping**: Automatically extracts writeup URLs from competition leaderboards
- **Private/Public Leaderboards**: Supports both private and public leaderboard tabs
- **Custom Naming**: Files saved as `{contest_name}_{rank}_{team_name}.md`
- **Rich Metadata**: Includes rank, team members, scores, and extraction timestamps
- **Top-N Selection**: Extract only top performers (e.g., top 10)
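The `{contest_name}_{rank}_{team_name}.md` naming scheme can be sketched as a small helper. This is an illustration, not the package's own code: `writeup_filename` is a hypothetical name, and both the two-digit zero padding (inferred from the `_01_` in the output listing) and the rule that replaces filename-unsafe characters with underscores are assumptions.

```python
import re


def writeup_filename(contest_name: str, rank: int, team_name: str) -> str:
    """Build a `{contest_name}_{rank}_{team_name}.md` filename.

    Hypothetical helper: zero-pads the rank to two digits and replaces
    runs of characters outside [A-Za-z0-9_-] with a single underscore.
    """
    safe_team = re.sub(r"[^A-Za-z0-9_-]+", "_", team_name).strip("_")
    return f"{contest_name}_{rank:02d}_{safe_team}.md"


print(writeup_filename("cmi-detect-behavior", 1, "Team Name!"))
# cmi-detect-behavior_01_Team_Name.md
```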

### Hierarchical Discussion Extraction
- **Complete Thread Preservation**: Maintains the full discussion structure with parent-child relationships
- **Smart Reply Numbering**: Automatic hierarchical numbering (1, 1.1, 1.2, 2, 2.1, etc.)
- **No Content Duplication**: Intelligently separates parent and nested reply content
- **Deep Nesting Support**: Handles multiple levels of nested replies
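Hierarchical labels such as 1, 1.1, 1.2, 2, 2.1 fall out of a depth-first walk that passes each parent's label down to its children. The sketch below assumes a hypothetical `Reply` tree that has already been built from the page; the package's real data model may differ.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Reply:
    """Hypothetical in-memory reply node: text plus nested child replies."""
    text: str
    children: List["Reply"] = field(default_factory=list)


def number_replies(replies: List[Reply], prefix: str = "") -> Iterator[Tuple[str, Reply]]:
    """Yield (label, reply) pairs in depth-first order."""
    for index, reply in enumerate(replies, start=1):
        label = f"{prefix}{index}"
        yield label, reply
        # Children extend the parent's label with a dot: 1 -> 1.1, 1.2, ...
        yield from number_replies(reply.children, prefix=f"{label}.")


thread = [
    Reply("first", [Reply("first-a"), Reply("first-b")]),
    Reply("second", [Reply("second-a")]),
]
for label, reply in number_replies(thread):
    print(label, reply.text)
```

Each reply's text is emitted exactly once under its own label, which is the property the "No Content Duplication" feature describes.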

### Rich Metadata Extraction
- **Author Information**: Names, usernames, profile URLs
- **Competition Rankings**: Extracts "Nth in this Competition" rankings
- **User Badges**: Competition Host, Expert, Master, Grandmaster badges
- **Engagement Metrics**: Upvotes/downvotes for all posts and replies
- **Timestamps**: Full timestamp extraction for temporal analysis
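Rankings and reply timestamps arrive as display text. A minimal sketch of turning them into an integer and a timezone-aware `datetime`, assuming the exact formats shown in the sample output (`27th in this Competition`, `Tue Jun 17 2025 11:54:57 GMT+0300`). The helper names are hypothetical, and parsing `%a`/`%b` assumes an English (C) locale.

```python
import re
from datetime import datetime
from typing import Optional

RANK_RE = re.compile(r"(\d+)(?:st|nd|rd|th) in this Competition")


def parse_rank(text: str) -> Optional[int]:
    """Return N from text like '27th in this Competition', else None."""
    match = RANK_RE.search(text)
    return int(match.group(1)) if match else None


def parse_timestamp(text: str) -> datetime:
    """Parse 'Tue Jun 17 2025 11:54:57 GMT+0300' into an aware datetime."""
    # 'GMT' is matched literally; %z consumes the '+0300' offset.
    return datetime.strptime(text, "%a %b %d %Y %H:%M:%S GMT%z")


print(parse_rank("154th in this Competition"))  # 154
print(parse_timestamp("Tue Jun 17 2025 11:54:57 GMT+0300").isoformat())
# 2025-06-17T11:54:57+03:00
```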

### Advanced Capabilities
- **Pagination Support**: Automatically handles multi-page discussion lists
- **Batch Processing**: Extract all discussions from a competition at once
- **Rate Limiting**: Built-in delays to respect server resources
- **Error Recovery**: Robust error handling with detailed logging
- **Markdown Output**: Clean Markdown export with proper formatting
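Pagination and rate limiting usually combine into one loop: request page after page, stop at an empty page or once the requested limit is reached, and pause between requests. This sketch uses a hypothetical `fetch_page` coroutine and a fixed delay; it is not the extractor's internal code, which drives a real browser through Playwright.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


async def collect_paginated(
    fetch_page: Callable[[int], Awaitable[List[str]]],
    limit: Optional[int] = None,
    delay_seconds: float = 1.0,
) -> List[str]:
    """Walk pages 1, 2, ... until a page is empty or `limit` items are collected."""
    items: List[str] = []
    page = 1
    while limit is None or len(items) < limit:
        batch = await fetch_page(page)
        if not batch:  # an empty page means we ran past the last one
            break
        items.extend(batch)
        page += 1
        await asyncio.sleep(delay_seconds)  # simple fixed delay between requests
    return items if limit is None else items[:limit]


async def fake_fetch(page: int) -> List[str]:
    # Stand-in for a real page fetch: three pages of two URLs each.
    return [f"/discussion/{page}-{i}" for i in range(2)] if page <= 3 else []


print(asyncio.run(collect_paginated(fake_fetch, limit=5, delay_seconds=0)))
```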

## 📦 Installation

### Method 1: Install from PyPI (Recommended)

```bash
pip install kaggle-discussion-extractor
playwright install chromium
```

### Method 2: Install from Source

```bash
# Clone the repository
git clone https://github.com/Letemoin/kaggle-discussion-extractor.git
cd kaggle-discussion-extractor

# Install in development mode
pip install -e .
playwright install chromium
```

## 🎯 Quick Start

### Command Line Usage

```bash
# Extract all discussions from a competition
kaggle-discussion-extractor https://www.kaggle.com/competitions/neurips-2025

# Extract only 10 discussions
kaggle-discussion-extractor https://www.kaggle.com/competitions/neurips-2025 --limit 10

# Enable development mode for detailed logging
kaggle-discussion-extractor https://www.kaggle.com/competitions/neurips-2025 --dev-mode

# Run with visible browser (useful for debugging)
kaggle-discussion-extractor https://www.kaggle.com/competitions/neurips-2025 --no-headless

# Extract top 10 writeups from private leaderboard
kaggle-discussion-extractor https://www.kaggle.com/competitions/cmi-detect-behavior --extract-writeups --limit 10

# Extract from public leaderboard with development mode
kaggle-discussion-extractor https://www.kaggle.com/competitions/cmi-detect-behavior --extract-writeups --leaderboard-tab public --dev-mode
```
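To run the CLI over several competitions from a script, one option is to build the argument list in Python and hand it to `subprocess.run`. The sketch uses only flags listed in the CLI options table; `build_command` and `run_all` are hypothetical helpers, and `run_all` assumes the `kaggle-discussion-extractor` entry point is on `PATH`.

```python
import subprocess
from typing import List, Optional


def build_command(
    competition_url: str,
    limit: Optional[int] = None,
    extract_writeups: bool = False,
    leaderboard_tab: str = "private",
    dev_mode: bool = False,
) -> List[str]:
    """Assemble an argv list from the documented CLI flags."""
    cmd = ["kaggle-discussion-extractor", competition_url]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    if extract_writeups:
        # The leaderboard tab only matters in writeup mode.
        cmd += ["--extract-writeups", "--leaderboard-tab", leaderboard_tab]
    if dev_mode:
        cmd.append("--dev-mode")
    return cmd


def run_all(urls: List[str], **options) -> None:
    """Run the CLI once per competition; check=True raises on the first failure."""
    for url in urls:
        subprocess.run(build_command(url, **options), check=True)


print(build_command("https://www.kaggle.com/competitions/cmi-detect-behavior",
                    limit=10, extract_writeups=True))
```

Passing a list (rather than a shell string) to `subprocess.run` avoids quoting problems with URLs.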

### Python API Usage

#### Extract Discussions
```python
import asyncio
from kaggle_discussion_extractor import KaggleDiscussionExtractor

async def extract_discussions():
    # Initialize extractor
    extractor = KaggleDiscussionExtractor(dev_mode=True)
    
    # Extract discussions
    success = await extractor.extract_competition_discussions(
        competition_url="https://www.kaggle.com/competitions/neurips-2025",
        limit=5  # Optional: limit number of discussions
    )
    
    if success:
        print("Extraction completed successfully!")
    else:
        print("Extraction failed!")

# Run the extraction
asyncio.run(extract_discussions())
```

#### Extract Writeups
```python
import asyncio
from kaggle_discussion_extractor import KaggleWriteupExtractor

async def extract_writeups():
    # Initialize writeup extractor
    extractor = KaggleWriteupExtractor(dev_mode=True)
    
    # Extract top 5 writeups from private leaderboard
    success = await extractor.extract_writeups(
        competition_url="https://www.kaggle.com/competitions/cmi-detect-behavior",
        limit=5,
        leaderboard_tab="private"
    )
    
    if success:
        print("Writeup extraction completed successfully!")
    else:
        print("Writeup extraction failed!")

# Run the extraction
asyncio.run(extract_writeups())
```

## 📋 CLI Options

| Option | Description | Default |
|--------|-------------|---------|
| `competition_url` | URL of the Kaggle competition (required) | - |
| `--limit, -l` | Number of discussions/writeups to extract | All |
| `--dev-mode, -d` | Enable detailed logging | False |
| `--no-headless` | Run browser in visible mode | False (headless) |
| `--date-format` | Include YYMMDD date in filename | False |
| `--date-position` | Position of date (prefix/suffix) | suffix |
| `--extract-writeups` | Extract writeups from leaderboard | False |
| `--leaderboard-tab` | Leaderboard tab (private/public) | private |
| `--version, -v` | Show version information | - |
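The table does not spell out the exact filename layout that `--date-format` and `--date-position` produce. Assuming an underscore separator and a suffix stamp placed before the `.md` extension, the two positions could behave like this hypothetical `add_date` helper:

```python
from datetime import date


def add_date(filename: str, day: date, position: str = "suffix") -> str:
    """Insert a YYMMDD stamp before the extension (suffix) or at the start (prefix)."""
    stamp = day.strftime("%y%m%d")
    stem, dot, ext = filename.rpartition(".")
    if not dot:  # no extension: treat the whole name as the stem
        stem, ext = filename, ""
    if position == "prefix":
        new_stem = f"{stamp}_{stem}"
    elif position == "suffix":
        new_stem = f"{stem}_{stamp}"
    else:
        raise ValueError("position must be 'prefix' or 'suffix'")
    return f"{new_stem}.{ext}" if ext else new_stem


print(add_date("01_Discussion_Title.md", date(2025, 9, 8)))
# 01_Discussion_Title_250908.md
```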

## ๐Ÿ“ Output Structure

### Writeup Extraction
The writeup extractor creates a `writeups_extracted` directory with:

```
writeups_extracted/
├── contest-name_01_TeamName.md
├── contest-name_02_AnotherTeam.md
├── contest-name_03_ThirdPlace.md
└── ...
```

### Discussion Extraction
The discussion extractor creates a `kaggle_discussions_extracted` directory with:

```
kaggle_discussions_extracted/
├── 01_Discussion_Title.md
├── 02_Another_Discussion.md
├── 03_Third_Discussion.md
└── ...
```
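For downstream analysis, the extracted files can be read back with `pathlib`. A minimal sketch, assuming the default `kaggle_discussions_extracted` directory and UTF-8 encoded Markdown; `load_extracted` is a hypothetical helper. Sorting by filename keeps the numeric prefixes in extraction order as long as the prefixes all have the same width.

```python
from pathlib import Path
from typing import Dict


def load_extracted(directory: str = "kaggle_discussions_extracted") -> Dict[str, str]:
    """Map each extracted Markdown filename to its text; empty if the folder is missing."""
    root = Path(directory)
    if not root.is_dir():
        return {}
    # Lexicographic sort keeps 01_, 02_, ... in order while prefixes share a width.
    return {path.name: path.read_text(encoding="utf-8")
            for path in sorted(root.glob("*.md"))}


if __name__ == "__main__":
    for name, text in load_extracted().items():
        print(f"{name}: {len(text.splitlines())} lines")
```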

### Sample Output Format

```markdown
# Discussion Title

**URL**: https://www.kaggle.com/competitions/neurips-2025/discussion/123456
**Total Comments**: 15
**Extracted**: 2025-01-15T10:30:00

---

## Main Post

**Author**: username (@username)
**Rank**: 27th in this Competition
**Badges**: Competition Host
**Upvotes**: 36

Main discussion content goes here...

---

## Replies

### Reply 1

- **Author**: user1 (@user1)
- **Rank**: 154th in this Competition
- **Upvotes**: 11
- **Timestamp**: Tue Jun 17 2025 11:54:57 GMT+0300

Content of reply 1...

  #### Reply 1.1

  - **Author**: user2 (@user2)
  - **Upvotes**: 6
  - **Timestamp**: Sun Jun 29 2025 04:20:43 GMT+0300

  Nested reply content...

  #### Reply 1.2

  - **Author**: user3 (@user3)
  - **Upvotes**: 2
  - **Timestamp**: Wed Jul 16 2025 12:50:34 GMT+0300

  Another nested reply...

---

### Reply 2

- **Author**: user4 (@user4)
- **Upvotes**: -3

Content of reply 2...

---
```
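Because the output follows a fixed layout, reply metadata can be recovered with two regular expressions. The sketch assumes headers of the form `### Reply 1` / `#### Reply 1.1` and metadata bullets of the form `- **Upvotes**: 11`, exactly as in the sample above; `parse_replies` is a hypothetical helper that ignores reply bodies and the main post.

```python
import re
from typing import Dict, List

HEADER_RE = re.compile(r"^\s*#{3,}\s+Reply\s+([\d.]+)\s*$")
FIELD_RE = re.compile(r"^\s*-\s+\*\*(\w+)\*\*:\s*(.*)$")


def parse_replies(markdown: str) -> List[Dict[str, str]]:
    """Collect one dict of metadata per reply, keyed by lowercase field name."""
    replies: List[Dict[str, str]] = []
    for line in markdown.splitlines():
        header = HEADER_RE.match(line)
        if header:
            replies.append({"number": header.group(1)})
            continue
        field_match = FIELD_RE.match(line)
        if field_match and replies:  # only bullets that follow a reply header
            key, value = field_match.groups()
            replies[-1][key.lower()] = value.strip()
    return replies


sample = """
### Reply 1
- **Author**: user1 (@user1)
- **Upvotes**: 11

  #### Reply 1.1
  - **Author**: user2 (@user2)
  - **Upvotes**: 6
"""
for reply in parse_replies(sample):
    print(reply["number"], reply["author"], int(reply["upvotes"]))
```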

## โš™๏ธ Configuration

### Development Mode

Enable development mode to see detailed logs and debugging information:

```python
extractor = KaggleDiscussionExtractor(dev_mode=True)
```

**What dev_mode does:**
- Enables DEBUG level logging
- Shows detailed progress information
- Displays browser automation steps
- Provides error stack traces
- Logs DOM element detection details
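If you want DEBUG output from the extractor without switching every other library to DEBUG through the root logger, you can raise the level of a single named logger. The default name below, `kaggle_discussion_extractor`, is an assumption based on the common `logging.getLogger(__name__)` convention; check the package source for the logger names it actually uses. `enable_debug_logging` is a hypothetical helper.

```python
import logging


def enable_debug_logging(logger_name: str = "kaggle_discussion_extractor") -> logging.Logger:
    """Set one logger (and, by propagation, its children) to DEBUG with a stderr handler."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Child loggers such as `kaggle_discussion_extractor.core` would inherit the DEBUG level, since they have no level of their own set.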

### Browser Mode

Run with a visible browser for debugging:

```python
extractor = KaggleDiscussionExtractor(headless=False)
```
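When the same script runs both locally and on a CI machine, it can help to choose the `headless` value from the environment. `KDE_HEADLESS` is a hypothetical variable name, not something the package reads on its own, and `headless_from_env` is a hypothetical helper; its result would be passed as `KaggleDiscussionExtractor(headless=headless_from_env())`.

```python
import os


def headless_from_env(var: str = "KDE_HEADLESS", default: bool = True) -> bool:
    """Return False for '0', 'false', 'no' or 'off' (case-insensitive), else True."""
    value = os.environ.get(var)
    if value is None:  # variable unset: fall back to the default
        return default
    return value.strip().lower() not in {"0", "false", "no", "off"}
```

Defaulting to `True` keeps unattended runs headless unless someone opts into a visible browser.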

## 🧪 Examples

### Basic Example

```python
from kaggle_discussion_extractor import KaggleDiscussionExtractor
import asyncio

async def main():
    extractor = KaggleDiscussionExtractor()
    
    await extractor.extract_competition_discussions(
        "https://www.kaggle.com/competitions/neurips-2025"
    )

asyncio.run(main())
```

### Advanced Example with Logging

```python
import asyncio
import logging
from kaggle_discussion_extractor import KaggleDiscussionExtractor

# Set up custom logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

async def extract_with_monitoring():
    extractor = KaggleDiscussionExtractor(
        dev_mode=True,  # Enable detailed logging
        headless=True   # Run in background
    )
    
    logger.info("Starting extraction...")
    
    success = await extractor.extract_competition_discussions(
        competition_url="https://www.kaggle.com/competitions/neurips-2025",
        limit=20  # Extract first 20 discussions
    )
    
    if success:
        logger.info("✅ Extraction completed successfully!")
        logger.info("Check 'kaggle_discussions_extracted' directory for results")
    else:
        logger.error("โŒ Extraction failed!")

if __name__ == "__main__":
    asyncio.run(extract_with_monitoring())
```

## 🔧 Development

### Set Up Development Environment

```bash
# Clone repository
git clone https://github.com/Letemoin/kaggle-discussion-extractor.git
cd kaggle-discussion-extractor

# Install development dependencies
pip install -e ".[dev]"
playwright install chromium

# Run tests
pytest tests/
```

### Project Structure

```
kaggle_discussion_extractor/
├── __init__.py   # Package initialization
├── core.py       # Main extraction logic
└── cli.py        # Command-line interface
```

## ๐Ÿค Contributing

Contributions are welcome! Please read our [Contributing Guidelines](CONTRIBUTING.md) for details on how to submit pull requests, report issues, and contribute to the project.


## ๐Ÿ™ Acknowledgments

- Built with [Playwright](https://playwright.dev/) for reliable browser automation
- Inspired by the need for better Kaggle competition analysis tools
- Thanks to the open-source community for continuous support


            
