post-archiver-improved


- **Name**: post-archiver-improved
- **Version**: 0.3.0
- **Summary**: A Python package for archiving YouTube community posts with zero dependencies
- **Upload time**: 2025-08-09 12:01:17
- **Requires Python**: >=3.9
- **Keywords**: youtube, community, posts, archiver, scraper, data-extraction, social-media
- **Requirements**: No requirements were recorded.

# Post Archiver Improved

[![Python](https://img.shields.io/badge/python-3.9+-blue.svg)](https://python.org)
[![License](https://img.shields.io/badge/license-GPL%20v3-blue.svg)](LICENSE)
[![PyPI](https://img.shields.io/pypi/v/post-archiver-improved.svg)](https://pypi.org/project/post-archiver-improved/)
[![Downloads](https://img.shields.io/pypi/dm/post-archiver-improved.svg)](https://pypi.org/project/post-archiver-improved/)

A Python package for archiving YouTube community posts, including post metadata, comments, and images. It is built entirely on the Python standard library, with no external dependencies.

**Post Archiver Improved** is a complete rewrite of the original [post-archiver](https://github.com/sadadYes/post-archiver) project, featuring better architecture, robust error handling, and extensive test coverage.

## Key Features

- **Comprehensive Data Extraction** - Complete archival of YouTube community posts with metadata preservation
- **Advanced Comment Processing** - Full comment trees with reply chains and author information
- **High-Quality Image Archiving** - Original resolution image downloads with metadata
- **Zero External Dependencies** - Built entirely on Python standard library for maximum compatibility
- **Performance Optimized** - Intelligent rate limiting and concurrent processing capabilities
- **Comprehensive Logging** - Configurable logging levels with structured output and file rotation
- **Flexible Configuration** - Multi-source configuration management (CLI, files, environment variables)
- **Progress Monitoring** - Real-time progress tracking with detailed statistics and ETA
- **Comprehensive Reporting** - Detailed summary reports with archival statistics and health metrics
- **Data Integrity** - Automatic backup creation and data validation to prevent corruption
- **Robust Error Handling** - Graceful failure recovery with detailed error reporting
- **Extensible Architecture** - Modular design supporting custom extractors and output formats

## Installation

### From PyPI (Recommended)
```bash
pip install post-archiver-improved
```

### From Source (Development)
```bash
git clone https://github.com/sadadYes/post-archiver-improved.git
cd post-archiver-improved
pip install -e .
```

### Development Installation
```bash
git clone https://github.com/sadadYes/post-archiver-improved.git
cd post-archiver-improved
pip install -e ".[dev]"
```

## Usage

### Basic Usage

Archive all posts from a channel:
```bash
post-archiver UC5CwaMl1eIgY8h02uZw7u8A
```

Archive with comments:
```bash
post-archiver UC5CwaMl1eIgY8h02uZw7u8A --comments
```

Archive with images:
```bash
post-archiver UC5CwaMl1eIgY8h02uZw7u8A --download-images
```

Archive a single post by post ID:
```bash
post-archiver UgkxMVl0vgxzNvE3I52s0oKlEHO3KyfocebU --comments --download-images
```

Archive a single post by URL:
```bash
post-archiver "https://www.youtube.com/post/UgkxMVl0vgxzNvE3I52s0oKlEHO3KyfocebU" --comments --download-images
```

### Advanced Usage

Full archival with all features:
```bash
post-archiver UC5CwaMl1eIgY8h02uZw7u8A \
  --comments \
  --download-images \
  --max-comments 500 \
  --max-replies 100 \
  --output ./archive \
  --verbose
```

Archive members-only content with cookies:
```bash
post-archiver UC5CwaMl1eIgY8h02uZw7u8A \
  --comments \
  --download-images \
  --cookies ./cookies.txt \
  --output ./archive \
  --verbose
```

With custom configuration:
```bash
post-archiver UC5CwaMl1eIgY8h02uZw7u8A \
  --config my_config.json \
  --log-file archive.log \
  --timeout 60 \
  --retries 5
```

### Channel ID Formats

The tool accepts a channel reference in any of the following formats:

- **Channel ID**: `UC5CwaMl1eIgY8h02uZw7u8A`
- **Handle**: `@username`
- **Channel URL**: `https://youtube.com/channel/UC5CwaMl1eIgY8h02uZw7u8A`
- **Custom URL**: `https://youtube.com/c/channelname`
- **Handle URL**: `https://youtube.com/@username`

### Individual Post Formats

You can also archive individual posts by providing:

- **Post ID**: `UgkxMVl0vgxzNvE3I52s0oKlEHO3KyfocebU`
- **Post URL**: `https://www.youtube.com/post/UgkxMVl0vgxzNvE3I52s0oKlEHO3KyfocebU`

When archiving individual posts, the tool automatically extracts the channel information and creates an archive containing just that specific post.
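
The accepted formats can be told apart with a few string checks. The following is a minimal sketch, not the package's own parser: `classify_target` is a hypothetical helper, and the patterns assume that channel IDs are `UC` followed by 22 URL-safe characters and that post IDs start with `Ug`, as in the examples above.

```python
import re
from urllib.parse import urlparse

# Assumptions taken from the examples in this README, not from the package:
# channel IDs look like "UC" + 22 URL-safe characters, post IDs start with "Ug".
CHANNEL_ID_RE = re.compile(r"^UC[0-9A-Za-z_-]{22}$")
POST_ID_RE = re.compile(r"^Ug[0-9A-Za-z_-]+$")


def classify_target(target: str) -> tuple[str, str]:
    """Return (kind, value) for a channel or post reference."""
    target = target.strip()
    if target.startswith(("http://", "https://")):
        # Split the URL path into non-empty segments, e.g. ["post", "Ugkx..."].
        parts = [p for p in urlparse(target).path.split("/") if p]
        if len(parts) >= 2 and parts[0] == "post":
            return ("post", parts[1])
        if len(parts) >= 2 and parts[0] == "channel":
            return ("channel_id", parts[1])
        if len(parts) >= 2 and parts[0] == "c":
            return ("custom_url", parts[1])
        if parts and parts[0].startswith("@"):
            return ("handle", parts[0])
        return ("unknown", target)
    if target.startswith("@"):
        return ("handle", target)
    if CHANNEL_ID_RE.match(target):
        return ("channel_id", target)
    if POST_ID_RE.match(target):
        return ("post", target)
    return ("unknown", target)


for example in (
    "UC5CwaMl1eIgY8h02uZw7u8A",
    "@username",
    "https://youtube.com/c/channelname",
    "https://www.youtube.com/post/UgkxMVl0vgxzNvE3I52s0oKlEHO3KyfocebU",
):
    print(classify_target(example))
```

A check like this can be useful in a wrapper script to reject malformed input before invoking the archiver.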

### Accessing Members-Only Content

To access members-only posts, you'll need to provide authentication cookies from a logged-in YouTube session:

1. **Export Cookies**: Use a browser extension or tool to export cookies in Netscape format
   - Recommended: [Get-cookies.txt-LOCALLY](https://github.com/kairi003/Get-cookies.txt-Locally) extension for Chrome/Firefox
   - Export cookies for `youtube.com` domains

2. **Use Cookie File**: Pass the cookie file to the archiver
   ```bash
   post-archiver UC5CwaMl1eIgY8h02uZw7u8A --cookies ./cookies.txt
   ```

3. **Cookie File Format**: The tool expects Netscape HTTP Cookie File format:
   ```
   # Netscape HTTP Cookie File
   .youtube.com	TRUE	/	FALSE	1735689600	SIDCC	cookie_value
   .google.com	TRUE	/	TRUE	1735689600	__Secure-1PSIDCC	secure_value
   ```

**Security Note**: Cookie files contain sensitive authentication data. Keep them secure and never share them publicly.

**Important**: Cookies must be from a YouTube account that has membership access to the target channel.
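
Before passing a cookie file to the archiver, it can help to confirm that the file parses as Netscape format. The sketch below uses `http.cookiejar.MozillaCookieJar` from the standard library; whether the archiver itself parses cookies this way is an assumption, and `load_cookie_names` is a hypothetical helper. The sample content mirrors the format shown above.

```python
import http.cookiejar
import os
import tempfile

# Sample file content in the Netscape format shown above (tab-separated fields).
SAMPLE = (
    "# Netscape HTTP Cookie File\n"
    ".youtube.com\tTRUE\t/\tFALSE\t1735689600\tSIDCC\tcookie_value\n"
    ".google.com\tTRUE\t/\tTRUE\t1735689600\t__Secure-1PSIDCC\tsecure_value\n"
)


def load_cookie_names(path: str) -> list[str]:
    """Parse a Netscape cookie file and return the cookie names it holds."""
    jar = http.cookiejar.MozillaCookieJar(path)
    # ignore_expires keeps cookies whose expiry time has already passed,
    # which is what we want when only the file format is being checked.
    jar.load(ignore_discard=True, ignore_expires=True)
    return sorted(cookie.name for cookie in jar)


# Write the sample to a temporary file and parse it back.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write(SAMPLE)
    sample_path = handle.name

try:
    names = load_cookie_names(sample_path)
finally:
    os.remove(sample_path)

print(names)
```

If the file is malformed (for example, a missing header line or a row with the wrong number of fields), `load` raises `http.cookiejar.LoadError`.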

## Configuration

### Command Line Options

#### Scraping Options
- `-n, --num-posts N` - Maximum number of posts to scrape
- `-c, --comments` - Extract comments for each post
- `--max-comments N` - Maximum comments per post (default: 100)
- `--max-replies N` - Maximum replies per comment (default: 200)
- `-i, --download-images` - Download images to local directory
- `--cookies FILE` - Path to Netscape format cookie file for accessing members-only posts

#### Output Options
- `-o, --output DIR` - Output directory
- `--no-summary` - Skip summary report creation
- `--compact` - Save JSON without pretty printing

#### Network Options
- `--timeout SECONDS` - Request timeout (default: 30)
- `--retries N` - Maximum retry attempts (default: 3)
- `--delay SECONDS` - Delay between requests (default: 1.0)

#### Logging Options
- `-v, --verbose` - Enable verbose output (INFO level)
- `--debug` - Enable debug output (DEBUG level)
- `--log-file FILE` - Log to file in addition to console
- `--quiet` - Suppress all output except errors
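
When the archiver is driven from a script, the options above can be assembled into an argument list. This is a sketch under the assumption that the `post-archiver` executable is on `PATH`; `build_command` is a hypothetical helper and uses only flags documented in this section.

```python
import shlex
from typing import Optional


def build_command(
    target: str,
    *,
    comments: bool = False,
    download_images: bool = False,
    max_comments: Optional[int] = None,
    output: Optional[str] = None,
    delay: Optional[float] = None,
) -> list[str]:
    """Build a post-archiver argument list from documented options."""
    cmd = ["post-archiver", target]
    if comments:
        cmd.append("--comments")
    if download_images:
        cmd.append("--download-images")
    if max_comments is not None:
        cmd += ["--max-comments", str(max_comments)]
    if output is not None:
        cmd += ["--output", output]
    if delay is not None:
        cmd += ["--delay", str(delay)]
    return cmd


cmd = build_command(
    "UC5CwaMl1eIgY8h02uZw7u8A",
    comments=True,
    max_comments=500,
    output="./archive",
)
# Print a shell-safe rendering of the command for inspection.
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with URLs and paths.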

### Configuration Files

Create a configuration file for repeated use:

```json
{
  "scraping": {
    "max_posts": 100,
    "extract_comments": true,
    "max_comments_per_post": 200,
    "max_replies_per_comment": 50,
    "download_images": true,
    "request_timeout": 30,
    "max_retries": 3,
    "retry_delay": 1.0
  },
  "output": {
    "output_dir": "./archives",
    "pretty_print": true,
    "include_metadata": true
  },
  "log_file": "./logs/archiver.log"
}
```

Save current settings:
```bash
post-archiver UC5CwaMl1eIgY8h02uZw7u8A --save-config my_config.json
```

Use saved configuration:
```bash
post-archiver UC5CwaMl1eIgY8h02uZw7u8A --config my_config.json
```
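
A configuration file can also be generated and sanity-checked from Python before it is passed to `--config`. The sketch below reuses the keys from the example file above; `check_config` is a hypothetical helper, and its rules (non-negative counts, positive timeout) are assumptions rather than the package's own validation.

```python
import json
import tempfile
from pathlib import Path

# Keys copied from the example configuration above.
config = {
    "scraping": {
        "max_posts": 100,
        "extract_comments": True,
        "max_comments_per_post": 200,
        "max_replies_per_comment": 50,
        "download_images": True,
        "request_timeout": 30,
        "max_retries": 3,
        "retry_delay": 1.0,
    },
    "output": {
        "output_dir": "./archives",
        "pretty_print": True,
        "include_metadata": True,
    },
    "log_file": "./logs/archiver.log",
}


def check_config(cfg: dict) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    scraping = cfg.get("scraping", {})
    for key in ("max_posts", "max_comments_per_post",
                "max_replies_per_comment", "max_retries"):
        value = scraping.get(key)
        # bool is a subclass of int, so it is excluded explicitly.
        if value is not None and (
            isinstance(value, bool) or not isinstance(value, int) or value < 0
        ):
            problems.append(f"scraping.{key} must be a non-negative integer")
    timeout = scraping.get("request_timeout")
    if timeout is not None and (
        isinstance(timeout, bool)
        or not isinstance(timeout, (int, float))
        or timeout <= 0
    ):
        problems.append("scraping.request_timeout must be a positive number")
    return problems


# Round-trip the configuration through a JSON file.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "my_config.json"
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    loaded = json.loads(path.read_text(encoding="utf-8"))

print(check_config(loaded))
```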

## Output Format

### Archive File Structure

The tool creates a JSON file with the following structure:

```json
{
  "channel_id": "UC5CwaMl1eIgY8h02uZw7u8A",
  "scrape_date": "2025-01-15T10:30:00",
  "scrape_timestamp": 1736937000,
  "posts_count": 25,
  "total_comments": 150,
  "total_images": 10,
  "images_downloaded": 10,
  "config_used": {...},
  "posts": [
    {
      "post_id": "UgxKp7...",
      "content": "Post content here...",
      "timestamp": "2 days ago",
      "timestamp_estimated": true,
      "likes": "42",
      "comments_count": "15",
      "members_only": false,
      "author": "Channel Name",
      "author_id": "UC5CwaMl1eIgY8h02uZw7u8A",
      "author_url": "https://youtube.com/channel/...",
      "author_thumbnail": "https://...",
      "author_is_verified": true,
      "author_is_member": false,
      "images": [
        {
          "src": "https://...",
          "local_path": "./images/post_123.jpg",
          "width": 1920,
          "height": 1080,
          "file_size": 245760
        }
      ],
      "links": [
        {
          "text": "Link text",
          "url": "https://..."
        }
      ],
      "comments": [
        {
          "id": "UgwKp7...",
          "text": "Comment text...",
          "like_count": "5",
          "timestamp": "1 day ago",
          "timestamp_estimated": true,
          "author_id": "UC...",
          "author": "Commenter Name",
          "author_thumbnail": "https://...",
          "author_is_verified": false,
          "author_is_member": true,
          "author_url": "https://...",
          "is_favorited": false,
          "is_pinned": false,
          "reply_count": "2",
          "replies": [...]
        }
      ]
    }
  ]
}
```
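
Because the archive is plain JSON, it can be post-processed with the standard library. The following sketch summarizes an archive using the field names shown above. `to_int`, `count_comments`, and `summarize` are hypothetical helpers; the code assumes counts such as `likes` are stored as plain numeric strings (anything else is counted as 0) and that replies may themselves carry a `replies` list.

```python
def to_int(value) -> int:
    """Convert a count stored as a string (e.g. "42") to int; 0 if not numeric."""
    try:
        return int(str(value).replace(",", ""))
    except ValueError:
        return 0


def count_comments(comments: list) -> int:
    """Count comments plus all nested replies."""
    total = 0
    for comment in comments:
        total += 1 + count_comments(comment.get("replies", []))
    return total


def summarize(archive: dict) -> dict:
    """Compute simple totals over the posts in an archive."""
    posts = archive.get("posts", [])
    return {
        "posts": len(posts),
        "likes": sum(to_int(p.get("likes", 0)) for p in posts),
        "comments": sum(count_comments(p.get("comments", [])) for p in posts),
        "members_only": sum(1 for p in posts if p.get("members_only")),
    }


# A tiny in-memory archive; a real one would come from json.load() on the file.
sample = {
    "posts": [
        {
            "post_id": "a",
            "likes": "42",
            "members_only": False,
            "comments": [
                {"id": "c1", "replies": [{"id": "r1"}, {"id": "r2"}]},
                {"id": "c2", "replies": []},
            ],
        }
    ]
}

summary = summarize(sample)
print(summary)
```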

### Files Created

- `posts_[CHANNEL_ID]_[TIMESTAMP].json` - Main archive file
- `summary_[CHANNEL_ID]_[TIMESTAMP].txt` - Summary report
- `images/` - Downloaded images (if enabled)
- `[LOG_FILE]` - Log file (if specified)
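
Given the naming pattern above, the most recent archive for a channel can be located with a glob. This is a sketch: `latest_archive` is a hypothetical helper, it orders files by modification time because the exact `[TIMESTAMP]` format is not specified here, and the file names in the demo are placeholders.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def latest_archive(output_dir: str, channel_id: str) -> Optional[Path]:
    """Return the most recently modified posts_<channel>_*.json file, if any."""
    candidates = list(Path(output_dir).glob(f"posts_{channel_id}_*.json"))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


# Demo with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    older = Path(tmp) / "posts_UCexample_20250101.json"
    newer = Path(tmp) / "posts_UCexample_20250102.json"
    for offset, path in enumerate((older, newer)):
        path.write_text("{}", encoding="utf-8")
        # Set explicit modification times so the ordering is deterministic.
        os.utime(path, (1_700_000_000 + offset, 1_700_000_000 + offset))
    found = latest_archive(tmp, "UCexample")
    found_name = found.name if found else None
    missing = latest_archive(tmp, "UCother")

print(found_name)
```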

## Development

### Project Structure

```
src/post_archiver_improved/
├── __init__.py              # Package initialization
├── api.py                   # YouTube API client
├── cli.py                   # Command-line interface
├── comment_processor.py     # Comment extraction logic
├── config.py                # Configuration management
├── exceptions.py            # Custom exception classes
├── extractors.py            # Data extraction utilities
├── logging_config.py        # Logging configuration
├── models.py                # Data models
├── output.py                # Output handling
├── scraper.py               # Main scraper logic
└── utils.py                 # Utility functions
```

### Key Features

#### Modular Architecture
- **Separation of concerns** with dedicated modules
- **Clean interfaces** between components
- **Easy to extend** and maintain

#### Robust Error Handling
- **Custom exception hierarchy** for different error types
- **Graceful degradation** when non-critical operations fail
- **Retry logic** with exponential backoff
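
Retry with exponential backoff can be sketched as follows. This illustrates the general technique only; the package's actual retry code may differ, and `with_retries`, its parameters, and the doubling schedule are assumptions. The `sleep` argument is injectable so the demo runs without waiting.

```python
import time
import urllib.error


def with_retries(func, *, retries=3, base_delay=1.0, sleep=time.sleep,
                 retry_on=(urllib.error.URLError, TimeoutError)):
    """Call func(); on a retryable error wait base_delay * 2**attempt, then retry."""
    for attempt in range(retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == retries:
                # Out of attempts: let the last error propagate to the caller.
                raise
            sleep(base_delay * (2 ** attempt))


# Demo: a function that fails twice and then succeeds.
calls = {"n": 0}
delays = []


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise urllib.error.URLError("temporary failure")
    return "ok"


# Record the requested delays instead of actually sleeping.
result = with_retries(flaky, retries=3, base_delay=1.0, sleep=delays.append)
print(result, delays)  # ok [1.0, 2.0]
```

With `base_delay=1.0`, the waits grow as 1 s, 2 s, 4 s, which gives a rate-limited server progressively more time to recover.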

#### Comprehensive Logging
- **Configurable verbosity levels** (ERROR, WARNING, INFO, DEBUG)
- **Colored console output** for better readability
- **File logging** with detailed tracebacks
- **Progress tracking** with detailed statistics
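
A console-plus-file logging setup along these lines can be built with the standard `logging` module. The sketch below is not the package's `logging_config.py`: `configure_logging` and the logger name are hypothetical, and the default console level of WARNING when no flag is given is an assumption.

```python
import logging
import sys
from typing import Optional


def configure_logging(verbose: bool = False, debug: bool = False,
                      quiet: bool = False,
                      log_file: Optional[str] = None) -> logging.Logger:
    """Configure a logger whose console level follows the CLI-style flags."""
    if debug:
        level = logging.DEBUG
    elif verbose:
        level = logging.INFO
    elif quiet:
        level = logging.ERROR
    else:
        level = logging.WARNING  # assumed default when no flag is given

    logger = logging.getLogger("archiver_demo")
    logger.setLevel(logging.DEBUG)  # handlers decide what is actually emitted
    logger.handlers.clear()

    console = logging.StreamHandler(sys.stderr)
    console.setLevel(level)
    console.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.addHandler(console)

    if log_file:
        # The file handler records everything, regardless of console level.
        file_handler = logging.FileHandler(log_file, encoding="utf-8")
        file_handler.setLevel(logging.DEBUG)
        file_handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(file_handler)
    return logger


log = configure_logging(verbose=True)
log.info("shown at INFO level")
log.debug("hidden on the console unless debug=True")
```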

#### Configuration Management
- **Multiple configuration sources** (CLI args, config files, defaults)
- **Environment-specific settings** support
- **Configuration validation** and error reporting

### Running Tests

```bash
# Install development dependencies
pip install -e ".[dev]"

# Run tests
python -m pytest tests/ -v

# Run with coverage
python -m pytest tests/ --cov=post_archiver_improved --cov-report=html
```

## Troubleshooting

### Common Issues

#### "No community tab found"
- The channel might not have community posts enabled
- Try using the channel's full URL instead of just the ID
- Some channels restrict community tab access

#### "Rate limiting detected"
- YouTube may be limiting requests
- Increase the `--delay` parameter
- Try again later

#### "Network timeout"
- Check your internet connection
- Increase the `--timeout` parameter
- Use `--retries` to attempt multiple times

#### "Permission denied" for file operations
- Check write permissions in the output directory
- Make sure the output directory exists
- Try running with appropriate permissions

### Debug Mode

Enable debug mode for detailed troubleshooting:

```bash
post-archiver UC5CwaMl1eIgY8h02uZw7u8A --debug --log-file debug.log
```

This will provide detailed information about:
- API requests and responses
- Data extraction processes
- File operations
- Error stack traces

## Contributing

Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.

### Development Setup

1. Fork the repository
2. Clone your fork
3. Create a virtual environment
4. Install in development mode: `pip install -e ".[dev]"`
5. Make your changes
6. Run tests: `python -m pytest`
7. Submit a pull request

### Coding Standards

- Follow PEP 8 style guidelines
- Add type hints to all functions
- Write comprehensive docstrings
- Include tests for new functionality
- Update documentation as needed

## TODO

## License

This project is licensed under the GNU General Public License v3.0 - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

This project is heavily inspired by the [yt-dlp community plugin](https://github.com/biggestsonicfan/yt-dlp-community-plugin) by [biggestsonicfan](https://github.com/biggestsonicfan).

## Support

If you encounter any issues or have questions:

1. Check the [troubleshooting section](#troubleshooting)
2. Search [existing issues](https://github.com/sadadYes/post-archiver-improved/issues)
3. Create a [new issue](https://github.com/sadadYes/post-archiver-improved/issues/new) with:
   - Your command line arguments
   - Error messages or logs
   - System information (OS, Python version)
   - Expected vs actual behavior

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "post-archiver-improved",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": "sadadYes <syaddadpunya@gmail.com>",
    "keywords": "youtube, community, posts, archiver, scraper, data-extraction, social-media",
    "author": null,
    "author_email": "sadadYes <syaddadpunya@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/bb/86/3454819be99390da33eb2ed5f7d16c55f940f1e5f9279887daa8e255bd70/post_archiver_improved-0.3.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A Python package for archiving YouTube community posts with zero dependencies",
    "version": "0.3.0",
    "project_urls": {
        "Homepage": "https://github.com/sadadYes/post-archiver-improved",
        "Issues": "https://github.com/sadadYes/post-archiver-improved/issues",
        "Repository": "https://github.com/sadadYes/post-archiver-improved"
    },
    "split_keywords": [
        "youtube",
        " community",
        " posts",
        " archiver",
        " scraper",
        " data-extraction",
        " social-media"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "772ed8c2fc50b0c0490d504202bdbd192f826aa4d31c49b985f38016276f6dba",
                "md5": "d1752350472ffc9ae86ae93f6b3cf94a",
                "sha256": "e5e667aeda27da31454b57560ceba6c116ae69c788fb0c54419df8d83567b7cf"
            },
            "downloads": -1,
            "filename": "post_archiver_improved-0.3.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "d1752350472ffc9ae86ae93f6b3cf94a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 61989,
            "upload_time": "2025-08-09T12:01:16",
            "upload_time_iso_8601": "2025-08-09T12:01:16.922858Z",
            "url": "https://files.pythonhosted.org/packages/77/2e/d8c2fc50b0c0490d504202bdbd192f826aa4d31c49b985f38016276f6dba/post_archiver_improved-0.3.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "bb863454819be99390da33eb2ed5f7d16c55f940f1e5f9279887daa8e255bd70",
                "md5": "8ce53f9196a2a00d3ff4d7921bd305da",
                "sha256": "a188e0408fb89c8e25aa9df59ab3ff2267ff29d3b9e0516686e2186b472331cd"
            },
            "downloads": -1,
            "filename": "post_archiver_improved-0.3.0.tar.gz",
            "has_sig": false,
            "md5_digest": "8ce53f9196a2a00d3ff4d7921bd305da",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 96764,
            "upload_time": "2025-08-09T12:01:17",
            "upload_time_iso_8601": "2025-08-09T12:01:17.833364Z",
            "url": "https://files.pythonhosted.org/packages/bb/86/3454819be99390da33eb2ed5f7d16c55f940f1e5f9279887daa8e255bd70/post_archiver_improved-0.3.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-09 12:01:17",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "sadadYes",
    "github_project": "post-archiver-improved",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "post-archiver-improved"
}
        