tubelearn


Name: tubelearn
Version: 1.0.0
Summary: Python script for extracting and cleaning YouTube video transcripts for Pre-Processing in machine learning.
Upload time: 2023-10-09 15:20:01
Author: KabilPreethamK
Keywords: python, video, transcript, raw data, cleaning, machine learning, pre-processing
Requirements: No requirements were recorded.


## Tube-Data: YouTube Video Transcript Extractor

Tube-Data is a Python script that extracts and cleans YouTube video transcripts for machine learning preprocessing. It turns YouTube videos into plain-text data for tasks such as sentiment analysis, speech recognition, and other natural language processing work.

## Features

- Extracts video transcripts from YouTube videos.
- Cleans transcripts by removing non-speech cues such as music and applause markers.
- Saves cleaned transcripts into separate text files.
- Supports individual video URLs, batch processing from a list of URLs, and entire playlists.
- Streamlines the dataset collection process for machine learning applications.
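
The cleaning step listed above can be pictured with a short sketch. It assumes that non-speech cues appear as bracketed tags such as `[Music]` or `[Applause]`, which is common in YouTube captions; the function name `clean_transcript` is hypothetical and is not taken from this package, whose actual cleaning rules are not documented here.

```python
import re

# Assumption: non-speech cues are bracketed tags such as [Music] or [Applause].
CUE_PATTERN = re.compile(r"\[[^\]]*\]")


def clean_transcript(text: str) -> str:
    """Remove bracketed cues and collapse the whitespace they leave behind."""
    without_cues = CUE_PATTERN.sub(" ", text)
    return " ".join(without_cues.split())


print(clean_transcript("[Music] welcome back everyone [Applause] let's get started"))
# prints: welcome back everyone let's get started
```

A pattern this broad also removes any other bracketed text, so a stricter list of cue names may be preferable for some datasets.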

## Installation

You can install the required dependencies using `pip`:

```bash
pip install youtube-transcript-api requests pytube regex
```

## Usage

### Extract Transcripts from a List of Video URLs

```python
from tube_data import text_link

# Provide a path to a text file containing YouTube video URLs.
text_link('path_to_file.txt', name='output_folder_name')
```
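
The format of the URL file is not documented above. A minimal sketch, assuming one URL per line with blank lines ignored, shows how such a file might be read; `read_url_list` is a hypothetical helper, not part of `tube_data`.

```python
from pathlib import Path
from typing import List


def read_url_list(path: str) -> List[str]:
    """Return the non-empty lines of a text file, stripped of whitespace."""
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line:  # Assumption: blank lines carry no URL and can be skipped.
            urls.append(line)
    return urls
```

Checking the file this way before calling `text_link` can help catch stray blank lines or trailing spaces.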

### Extract Transcript from a Single Video URL

```python
from tube_data import url_grab

# Provide a single YouTube video URL.
url_grab('video_url', name='output_folder_name')
```
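
Transcript libraries such as `youtube-transcript-api` work with video IDs rather than full URLs, so a tool like this presumably has to pull the ID out of the URL first. The sketch below shows one way to do that with the standard library; `extract_video_id` is a hypothetical helper, and whether `url_grab` works this way internally is an assumption.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a watch or youtu.be URL, or None if absent."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        return parsed.path.lstrip("/").split("/")[0] or None
    if host.endswith("youtube.com"):
        # Standard watch links carry the ID in the "v" query parameter.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None


print(extract_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))
# prints: dQw4w9WgXcQ
```

Other URL shapes (embeds, shorts) are not handled in this sketch and return `None`.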

### Extract Transcripts from a YouTube Playlist

```python
from tube_data import playlist_grab

# Provide the URL of a YouTube playlist.
playlist_grab('playlist_url', name='output_folder_name')
```

### Convert Playlist Video Links to Text File

```python
from tube_data import play2text

# Provide the URL of a YouTube playlist.
play2text('playlist_url')
```
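
The name and layout of the file that `play2text` produces are not stated above. As a rough sketch, assuming the output is a plain text file with one video URL per line (the same shape `text_link` would then consume), the writing step might look like this; `write_url_list` is a hypothetical helper, and collecting the playlist's URLs is left outside the sketch.

```python
from pathlib import Path
from typing import Iterable


def write_url_list(urls: Iterable[str], path: str) -> int:
    """Write one URL per line and return the number of URLs written."""
    # Drop empty entries so the file contains no blank lines.
    cleaned = [u.strip() for u in urls if u.strip()]
    text = "\n".join(cleaned) + ("\n" if cleaned else "")
    Path(path).write_text(text, encoding="utf-8")
    return len(cleaned)
```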

## Development Status

This project is currently in the planning stage.

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

## Contributions

Contributions are welcome! Please feel free to open issues or submit pull requests.

## Contact

For any inquiries or feedback, please contact [KabilPreethamK](mailto:kabilpreethamk@gmail.com).


            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "tubelearn",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "python,video,transcript,raw data,cleaning,machine learning,pre-processing",
    "author": "KabilPreethamK",
    "author_email": "<kabilpreethamk@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/9f/29/ba4779a1f1b629ab9ff5dc73ea3d53cdc23c7f5923d00379b30cf8ed98ca/tubelearn-1.0.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "Python script for extracting and cleaning YouTube video transcripts for Pre-Processing in machine learning.",
    "version": "1.0.0",
    "project_urls": null,
    "split_keywords": [
        "python",
        "video",
        "transcript",
        "raw data",
        "cleaning",
        "machine learning",
        "pre-processing"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f6c386735197bee25b0231cc47dfae8874389c68040e00bccde8053beae6407f",
                "md5": "01c9050b56142bcb0ed04855d9d02dba",
                "sha256": "afad41a86278f1d0a3cd90a4d4319982d514d4b3fa84f960dcc9ab5eb7505b69"
            },
            "downloads": -1,
            "filename": "tubelearn-1.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "01c9050b56142bcb0ed04855d9d02dba",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 4780,
            "upload_time": "2023-10-09T15:19:59",
            "upload_time_iso_8601": "2023-10-09T15:19:59.086031Z",
            "url": "https://files.pythonhosted.org/packages/f6/c3/86735197bee25b0231cc47dfae8874389c68040e00bccde8053beae6407f/tubelearn-1.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9f29ba4779a1f1b629ab9ff5dc73ea3d53cdc23c7f5923d00379b30cf8ed98ca",
                "md5": "deba53d68dced7c35d98838c4dd17a55",
                "sha256": "7474e8bb37267505892dfd1c16c781a80f2b993cda2c33a24a78ff8faef11647"
            },
            "downloads": -1,
            "filename": "tubelearn-1.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "deba53d68dced7c35d98838c4dd17a55",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 5039,
            "upload_time": "2023-10-09T15:20:01",
            "upload_time_iso_8601": "2023-10-09T15:20:01.452896Z",
            "url": "https://files.pythonhosted.org/packages/9f/29/ba4779a1f1b629ab9ff5dc73ea3d53cdc23c7f5923d00379b30cf8ed98ca/tubelearn-1.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-10-09 15:20:01",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "tubelearn"
}
        