youtube-comments-scrapper

Name	youtube-comments-scrapper
Version	1.0.1
home_page	https://github.com/Abhi9868/Youtube-Comment-Scraper
Summary	A Python package to scrape YouTube comments using Selenium and BeautifulSoup
upload_time	2024-10-25 18:45:26
maintainer	None
docs_url	None
author	Abhishek Kumar
requires_python	>=3.8
license	MIT
requirements	No requirements were recorded.

![YouTube Background](https://wallpapers.com/images/high/tilted-youtube-background-funqyvu7re0mtfil.webp)

# YouTube Comment Scraper

`youtube-comments-scrapper` is a Python package for scraping comments from YouTube videos using Selenium and BeautifulSoup. Its `YouTubeCommentScraper` class is configurable: you can run the browser in headless mode, set the timeout for waiting on elements, and adjust the pause between scroll actions. You can also enable logging and have the page source returned along with the comments.

## Features

- **Headless Mode**: Run the browser in headless mode (optional).
- **Customizable Timeouts**: Set the timeout for waiting for elements to load.
- **Automatic Scrolling**: Automatically scrolls the page until all comments are loaded.
- **Logging Support**: Enable logging to a file for tracking activities.
- **Return Page Source**: Optionally return the page source along with the comments.
- **BeautifulSoup Integration**: Extract comments using BeautifulSoup for robust parsing.

## Installation

To install the package, use the following command:

```bash
pip install youtube-comments-scrapper
```

## Dependencies

The scraper relies on the following packages:

- selenium
- webdriver-manager
- beautifulsoup4
- lxml (optional but recommended for faster HTML parsing)

If they were not installed alongside the package, you can install them manually:

```bash
pip install selenium webdriver-manager beautifulsoup4 lxml
```

## Usage

### 1. Basic Usage: Scraping Comments

Here's a simple example to scrape comments from a YouTube video:

```python
from youtube_comments_scraper import YouTubeCommentScraper

scraper = YouTubeCommentScraper(headless=True, timeout=10, scroll_pause_time=1.5, enable_logging=False, return_page_source=False)
video_url = "https://www.youtube.com/watch?v=Ycg48pVp3SU"
comments = scraper.scrape_comments(video_url)

print("Comments:", comments)
```
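The returned comments can be saved for later analysis. The sketch below assumes `comments` is a list of strings (the README only promises a "list of comments", so the element type is an assumption), and `save_comments` is a hypothetical helper name that is not part of the package:

```python
import json
from pathlib import Path


def save_comments(comments, path):
    """Write a list of comments to a UTF-8 JSON file and return the count written."""
    # Assumption: each comment is a string; default=str is a fallback that
    # stringifies any element json cannot serialize natively.
    items = list(comments)
    Path(path).write_text(
        json.dumps(items, ensure_ascii=False, indent=2, default=str),
        encoding="utf-8",
    )
    return len(items)


# Placeholder data standing in for the output of scrape_comments().
sample = ["Great video!", "Thanks for sharing"]
print(save_comments(sample, "comments.json"))
```

In real use you would pass the value returned by `scraper.scrape_comments(video_url)` instead of `sample`.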

### 2. Scraping Comments with Logging Enabled

Enable logging to track the actions performed during scraping:

```python
from youtube_comments_scraper import YouTubeCommentScraper

scraper = YouTubeCommentScraper(headless=True, timeout=10, scroll_pause_time=1.5, enable_logging=True, return_page_source=False)
video_url = "https://www.youtube.com/watch?v=Ycg48pVp3SU"
comments = scraper.scrape_comments(video_url)

print("Comments:", comments)
```

This generates a log file (`youtube_scraper.log`) in the current directory.
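To check what was logged after a run, you can read the end of that file. This is a minimal sketch using only the standard library; `tail_log` is a hypothetical helper, and the log line format is not documented here, so the lines are returned as plain text:

```python
from collections import deque
from pathlib import Path


def tail_log(path="youtube_scraper.log", lines=10):
    """Return the last `lines` lines of a log file, or [] if it does not exist."""
    log_path = Path(path)
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8", errors="replace") as handle:
        # A deque with maxlen keeps only the most recent lines in memory.
        return [line.rstrip("\n") for line in deque(handle, maxlen=lines)]


for entry in tail_log():
    print(entry)
```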

### 3. Returning Page Source Along with Comments

To get the page's HTML source along with the comments, set `return_page_source=True`:

```python
from youtube_comments_scraper import YouTubeCommentScraper

scraper = YouTubeCommentScraper(headless=True, timeout=10, scroll_pause_time=1.5, enable_logging=False, return_page_source=True)
video_url = "https://www.youtube.com/watch?v=Ycg48pVp3SU"
comments, page_source = scraper.scrape_comments(video_url)

print("Comments:", comments)
print("Page Source:", page_source)
```

### 4. Custom Scroll Pause Time

You can control how long the scraper pauses between scroll actions using the `scroll_pause_time` parameter:

```python
from youtube_comments_scraper import YouTubeCommentScraper

scraper = YouTubeCommentScraper(headless=True, timeout=10, scroll_pause_time=2.0, enable_logging=False, return_page_source=False)
video_url = "https://www.youtube.com/watch?v=Ycg48pVp3SU"
comments = scraper.scrape_comments(video_url)

print("Comments:", comments)
```
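The package's scrolling logic is not shown in this README. As a rough illustration of why the pause matters, here is a generic sketch of the common "scroll until the page height stops growing" pattern. The function name, its parameters, and the `max_rounds` safety limit are all assumptions for illustration, not the package's API:

```python
import time


def scroll_until_stable(get_height, scroll_down, pause=1.5, max_rounds=50):
    """Scroll until the page height stops growing; return the number of scrolls."""
    last_height = get_height()
    for rounds in range(1, max_rounds + 1):
        scroll_down()
        time.sleep(pause)  # give newly requested comments time to render
        new_height = get_height()
        if new_height == last_height:
            return rounds  # nothing new loaded, so assume the end was reached
        last_height = new_height
    return max_rounds


# Simulated page whose height grows three times and then stops.
heights = iter([1000, 2000, 3000, 4000, 4000])
print(scroll_until_stable(lambda: next(heights), lambda: None, pause=0))
```

With a real Selenium driver, `get_height` and `scroll_down` would typically wrap `driver.execute_script(...)` calls. A pause that is too short can make the height check fire before new comments arrive, which ends the loop early; that is the trade-off `scroll_pause_time` controls.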


### 5. Scraping Comments Without Scrolling

To scrape only the comments that load without scrolling, pass `scroll=False`:

```python
from youtube_comments_scraper import YouTubeCommentScraper

scraper = YouTubeCommentScraper(headless=True, timeout=10, scroll_pause_time=1.5, enable_logging=False, return_page_source=False)
video_url = "https://www.youtube.com/watch?v=Ycg48pVp3SU"
comments = scraper.scrape_comments(video_url, scroll=False)

print("Comments:", comments)
```
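Skipping the scroll is handy for taking a quick sample from several videos. The sketch below is one way to loop over URLs and collect failures separately. `scrape_many` and `FakeScraper` are hypothetical names: the fake stands in for `YouTubeCommentScraper` so the example runs without a browser, and the broad `except` is used only because this README does not document which exceptions `scrape_comments` may raise.

```python
def scrape_many(scraper, video_urls, scroll=False):
    """Scrape each URL in turn; return (results, errors), both keyed by URL."""
    results, errors = {}, {}
    for url in video_urls:
        try:
            results[url] = scraper.scrape_comments(url, scroll=scroll)
        except Exception as exc:  # broad on purpose: exception types are undocumented
            errors[url] = str(exc)
    return results, errors


class FakeScraper:
    """Stand-in with the same method signature as documented in the Class Reference."""

    def scrape_comments(self, video_url, scroll=True):
        if "bad" in video_url:
            raise ValueError("could not load page")
        return [f"comment from {video_url}"]


results, errors = scrape_many(
    FakeScraper(), ["https://example.com/a", "https://example.com/bad"]
)
print(results)
print(errors)
```

If the scraper was created with `return_page_source=True`, each value in `results` would be a `(comments, page_source)` tuple rather than a list.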

### 6. Logging Custom Messages

You can log custom messages using the built-in `log_info`, `log_warning`, and `log_error` methods:

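```python
from youtube_comments_scraper import YouTubeCommentScraper

scraper = YouTubeCommentScraper(enable_logging=True)
scraper.log_info("This is an info log message.")
scraper.log_warning("This is a warning message.")
scraper.log_error("This is an error message.")
```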

## Class Reference

### `YouTubeCommentScraper`

#### `__init__(self, headless=True, timeout=10, scroll_pause_time=1.5, enable_logging=False, return_page_source=False)`

- `headless` (bool): Run the browser in headless mode. Default is `True`.
- `timeout` (int): The maximum time to wait for elements to load. Default is `10` seconds.
- `scroll_pause_time` (float): The pause time between scroll actions. Default is `1.5` seconds.
- `enable_logging` (bool): Whether to enable logging to a file. Default is `False`.
- `return_page_source` (bool): Whether to return the page source along with comments. Default is `False`.

#### `scrape_comments(self, video_url, scroll=True)`

- `video_url` (str): The URL of the YouTube video.
- `scroll` (bool): Whether to scroll the page to load all comments. Default is `True`.

**Returns:**

- A tuple `(comments, page_source)` if `return_page_source` is `True`, otherwise just the list of `comments`.
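
Because the return shape depends on `return_page_source`, calling code that handles both cases may want to normalize it first. A minimal sketch, where `split_result` is a hypothetical helper and not part of the package:

```python
def split_result(result):
    """Return (comments, page_source); page_source is None when it was not requested."""
    # Assumption: a 2-tuple means (comments, page_source);
    # anything else is treated as the bare comments list.
    if isinstance(result, tuple) and len(result) == 2:
        comments, page_source = result
        return comments, page_source
    return result, None


print(split_result(["a", "b"]))
print(split_result((["a"], "<html></html>")))
```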

#### `log_info(self, message)`

- Logs an informational message.

#### `log_warning(self, message)`

- Logs a warning message.

#### `log_error(self, message)`

- Logs an error message.



## License

This project is licensed under the [MIT License](LICENSE).
