## Tube-Data: YouTube Video Transcript Extractor
Tube-Data is a Python script that extracts and cleans YouTube video transcripts so they can be used as input for machine learning preprocessing. It simplifies collecting text data from YouTube videos for natural language processing, sentiment analysis, speech recognition, and similar tasks.
## Features
- Extracts video transcripts from YouTube videos.
- Cleans transcripts by removing non-speech elements such as music and applause markers.
- Saves cleaned transcripts into separate text files.
- Supports individual video URLs, batch processing from a list of URLs, and entire playlists.
- Streamlines the dataset collection process for machine learning applications.
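
The cleaning step in the list above can be sketched roughly as follows. This is not Tube-Data's actual implementation: the function name `clean_transcript`, the list of cue words, and the assumption that cues appear as bracketed or parenthesized tags such as `[Music]` are all illustrative.

```python
import re

# Assumption: non-speech cues show up as bracketed or parenthesized tags,
# e.g. "[Music]" or "(applause)". The cue list here is illustrative only.
_CUE_PATTERN = re.compile(
    r"[\[\(]\s*(?:music|applause|laughter)\s*[\]\)]",
    flags=re.IGNORECASE,
)


def clean_transcript(text: str) -> str:
    """Remove non-speech cues, then collapse the leftover whitespace."""
    without_cues = _CUE_PATTERN.sub(" ", text)
    return re.sub(r"\s+", " ", without_cues).strip()


print(clean_transcript("[Music] welcome back everyone [Applause] let's begin"))
```

Replacing each cue with a space before collapsing whitespace keeps the words on either side of a removed tag from running together.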
## Installation
You can install the required dependencies using `pip`:
```bash
pip install youtube-transcript-api requests pytube regex
```
## Usage
### Extract Transcripts from a List of Video URLs
```python
from tube_data import text_link
# Provide a path to a text file containing YouTube video URLs.
text_link('path_to_file.txt', name='output_folder_name')
```
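
The layout of the URL file is not specified here. A minimal sketch, assuming one URL per line with blank lines and `#` comments ignored, could parse it like this. The helper names `parse_url_lines` and `read_url_file` are hypothetical and not part of Tube-Data.

```python
from pathlib import Path


def parse_url_lines(text):
    """Return the URLs found in `text`, one per line.

    Blank lines and lines starting with '#' are skipped (an assumed convention).
    """
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def read_url_file(path):
    """Read a UTF-8 text file and return the URLs it lists."""
    return parse_url_lines(Path(path).read_text(encoding="utf-8"))


# Placeholder URLs; VIDEO_ID_1 and VIDEO_ID_2 are not real video IDs.
sample = """
# training set
https://www.youtube.com/watch?v=VIDEO_ID_1

https://www.youtube.com/watch?v=VIDEO_ID_2
"""
print(parse_url_lines(sample))
```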
### Extract Transcript from a Single Video URL
```python
from tube_data import url_grab
# Provide a single YouTube video URL.
url_grab('video_url', name='output_folder_name')
```
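
Transcript lookups are generally keyed by video ID rather than by full URL, so a tool like this probably needs to pull the ID out of whatever URL it is given. The sketch below shows one way to do that with the standard library; `extract_video_id` is a hypothetical helper, and the set of URL shapes it accepts is an assumption rather than a description of what `url_grab` supports.

```python
from urllib.parse import parse_qs, urlparse

# Assumed set of hostnames that serve standard YouTube pages.
_YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url: str):
    """Return the video ID from a YouTube URL, or None if none is found."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    # Short links: https://youtu.be/<id>
    if host == "youtu.be":
        return parsed.path.lstrip("/").split("/")[0] or None

    if host in _YOUTUBE_HOSTS:
        # Standard links: https://www.youtube.com/watch?v=<id>
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        # Embedded and Shorts links: /embed/<id>, /shorts/<id>
        for prefix in ("/embed/", "/shorts/"):
            if parsed.path.startswith(prefix):
                return parsed.path[len(prefix):].split("/")[0] or None

    return None


print(extract_video_id("https://www.youtube.com/watch?v=abc123XYZ_-&t=42s"))
```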
### Extract Transcripts from a YouTube Playlist
```python
from tube_data import playlist_grab
# Provide the URL of a YouTube playlist.
playlist_grab('playlist_url', name='output_folder_name')
```
### Convert Playlist Video Links to Text File
```python
from tube_data import play2text
# Provide the URL of a YouTube playlist.
play2text('playlist_url')
```
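
The output format of `play2text` is not documented here. As a rough sketch of the idea, assuming the result is a plain text file with one video URL per line, the writing step could look like the following. `format_url_list` and `save_url_list` are hypothetical names; the input list might come from something like pytube's `Playlist(playlist_url).video_urls`.

```python
from pathlib import Path


def format_url_list(urls):
    """Join URLs one per line, dropping blanks and repeats but keeping order."""
    cleaned = (u.strip() for u in urls)
    unique = list(dict.fromkeys(u for u in cleaned if u))
    return "".join(u + "\n" for u in unique)


def save_url_list(urls, path):
    """Write the formatted URL list to `path` as UTF-8 text."""
    Path(path).write_text(format_url_list(urls), encoding="utf-8")


# Placeholder URLs; the IDs are not real.
print(format_url_list([
    "https://www.youtube.com/watch?v=VIDEO_ID_1",
    "https://www.youtube.com/watch?v=VIDEO_ID_2",
    "https://www.youtube.com/watch?v=VIDEO_ID_1",
]), end="")
```

A file written this way can be passed straight to a one-URL-per-line batch reader.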
## Development Status
This project is currently in the planning stage.
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
## Contributions
Contributions are welcome! Please feel free to open issues or submit pull requests.
## Contact
For any inquiries or feedback, please contact [KabilPreethamK](mailto:kabilpreethamk@gmail.com).
Raw data
{
"_id": null,
"home_page": "",
"name": "tubelearn",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "python,video,transcript,raw data,cleaning,machine learning,pre-processing",
"author": "KabilPreethamK",
"author_email": "<kabilpreethamk@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/9f/29/ba4779a1f1b629ab9ff5dc73ea3d53cdc23c7f5923d00379b30cf8ed98ca/tubelearn-1.0.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "Python script for extracting and cleaning YouTube video transcripts for Pre-Processing in machine learning.",
"version": "1.0.0",
"project_urls": null,
"split_keywords": [
"python",
"video",
"transcript",
"raw data",
"cleaning",
"machine learning",
"pre-processing"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "f6c386735197bee25b0231cc47dfae8874389c68040e00bccde8053beae6407f",
"md5": "01c9050b56142bcb0ed04855d9d02dba",
"sha256": "afad41a86278f1d0a3cd90a4d4319982d514d4b3fa84f960dcc9ab5eb7505b69"
},
"downloads": -1,
"filename": "tubelearn-1.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "01c9050b56142bcb0ed04855d9d02dba",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 4780,
"upload_time": "2023-10-09T15:19:59",
"upload_time_iso_8601": "2023-10-09T15:19:59.086031Z",
"url": "https://files.pythonhosted.org/packages/f6/c3/86735197bee25b0231cc47dfae8874389c68040e00bccde8053beae6407f/tubelearn-1.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "9f29ba4779a1f1b629ab9ff5dc73ea3d53cdc23c7f5923d00379b30cf8ed98ca",
"md5": "deba53d68dced7c35d98838c4dd17a55",
"sha256": "7474e8bb37267505892dfd1c16c781a80f2b993cda2c33a24a78ff8faef11647"
},
"downloads": -1,
"filename": "tubelearn-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "deba53d68dced7c35d98838c4dd17a55",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 5039,
"upload_time": "2023-10-09T15:20:01",
"upload_time_iso_8601": "2023-10-09T15:20:01.452896Z",
"url": "https://files.pythonhosted.org/packages/9f/29/ba4779a1f1b629ab9ff5dc73ea3d53cdc23c7f5923d00379b30cf8ed98ca/tubelearn-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-10-09 15:20:01",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "tubelearn"
}