par-yt2text


Name: par-yt2text
Version: 0.2.0
home_page: None
Summary: Extracts metadata about a video, such as the transcript, duration, and comments, with optional audio transcription using OpenAI Whisper.
upload_time: 2024-10-21 19:35:17
maintainer: None
docs_url: None
author: None
requires_python: >=3.11
license: MIT License. Copyright (c) 2024 Paul Robello. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
keywords: transcript, youtube
requirements: No requirements were recorded.
Travis-CI: No Travis.
coveralls test coverage: No coveralls.

# PAR YT2Text

[![PyPI](https://img.shields.io/pypi/v/par_yt2text)](https://pypi.org/project/par_yt2text/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/par_yt2text.svg)](https://pypi.org/project/par_yt2text/)  
![Runs on Linux | MacOS | Windows](https://img.shields.io/badge/runs%20on-Linux%20%7C%20MacOS%20%7C%20Windows-blue)
![Arch x86-64 | ARM | AppleSilicon](https://img.shields.io/badge/arch-x86--64%20%7C%20ARM%20%7C%20AppleSilicon-blue)  
![PyPI - License](https://img.shields.io/pypi/l/par_yt2text)

PAR YT2Text is based on yt by Daniel Miessler, with the addition of OpenAI Whisper transcription for videos that don't have transcripts.

[!["Buy Me A Coffee"](https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png)](https://buymeacoffee.com/probello3)


## Features

- Extract metadata, transcripts, and comments from YouTube videos
- If the transcript is not available, optionally use the OpenAI Whisper API or a local Whisper model to transcribe the audio


## Prerequisites

* To install PAR YT2Text, make sure you have Python 3.11 or newer.
* Create a Google API key.
* If you want to use the OpenAI Whisper API, create an OpenAI API key (an OpenAI key is not needed for local OpenAI Whisper).

### [uv](https://pypi.org/project/uv/) is recommended

#### Linux and Mac
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

#### Windows
```powershell
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```

## Installation

### Installation From Source WITHOUT support for local OpenAI Whisper

* Clone the repository and install the dependencies:
```bash
git clone https://github.com/paulrobello/par_yt2text.git
cd par_yt2text
uv sync
```

### Installation From Source WITH support for local OpenAI Whisper

* Clone the repository and install the dependencies, including the `local-whisper` extra:
```bash
git clone https://github.com/paulrobello/par_yt2text.git
cd par_yt2text
uv sync -U --extra local-whisper
```

### Installation From PyPI WITHOUT support for local OpenAI Whisper

```bash
uv tool install par_yt2text
```

```bash
pipx install par_yt2text
```

### Installation From PyPI WITH support for local OpenAI Whisper

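To install PAR YT2Text with local OpenAI Whisper support, run either of the following commands. Note that both commands install from the GitHub repository and pull PyTorch wheels from the CUDA 12.1 (`cu121`) package index: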

```bash
uv tool install -U 'git+https://github.com/paulrobello/par_yt2text[local-whisper]' --index https://download.pytorch.org/whl/cu121 --index-strategy unsafe-best-match
```

```bash
pipx install 'par_yt2text[local-whisper] @ git+https://github.com/paulrobello/par_yt2text' --pip-args="--extra-index-url https://download.pytorch.org/whl/cu121"
```


## Usage
Create a file called `~/.par_yt2text.env` with your Google API key and OpenAI API key in it.
```bash
GOOGLE_API_KEY= # needed for youtube-transcript-api
OPENAI_API_KEY= # needed for OpenAI API whisper audio transcription (An OpenAI key is not needed for local OpenAI Whisper).
PAR_YT2TEXT_SAVE_DIR= # where to save the transcripts if you don't specify a folder in the --save option
```
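
The following is a minimal sketch of how a file in this format could be read, for example to check which keys are still empty before running the tool. It is not the loader that PAR YT2Text itself uses; the function name `parse_env_text` and the inline-comment handling are assumptions made for illustration.

```python
def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comment lines.

    Hypothetical helper for illustration; not part of par_yt2text.
    """
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Assumption: " #" starts an inline comment. Quoted values that
        # contain '#' are not handled by this simplified parser.
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


sample = """\
GOOGLE_API_KEY=abc123 # needed for youtube-transcript-api
OPENAI_API_KEY= # not needed for local Whisper
PAR_YT2TEXT_SAVE_DIR=~/transcripts
"""

config = parse_env_text(sample)
# Report which API keys are still empty.
missing = [name for name in ("GOOGLE_API_KEY", "OPENAI_API_KEY") if not config.get(name)]
print("missing keys:", missing)
```

With the sample above, only `OPENAI_API_KEY` is reported as missing, which is acceptable when you plan to use local Whisper only.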

Whisper audio transcription is used only if you specify the `--whisper` or `--local-whisper` option and the video does not have a transcript.  
To force Whisper audio transcription even when a transcript is available, combine `--force-whisper` with either `--whisper` or `--local-whisper`.
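
The selection rule described above can be sketched as a small decision function. This is only one reading of the documented behavior, not the tool's actual code: the function name, the return labels, and the precedence of `--local-whisper` over `--whisper` when both are given are assumptions.

```python
def choose_transcript_source(
    has_transcript: bool,
    whisper: bool = False,
    local_whisper: bool = False,
    force_whisper: bool = False,
) -> str:
    """Return which source would provide the transcript (illustrative labels)."""
    # Assumption: if both engines are requested, the local one wins.
    if local_whisper:
        engine = "local-whisper"
    elif whisper:
        engine = "whisper-api"
    else:
        engine = None

    # Whisper runs when an engine is selected and either the video has no
    # transcript or the user forces transcription.
    if engine is not None and (force_whisper or not has_transcript):
        return engine
    if has_transcript:
        return "youtube-transcript"
    return "none"


print(choose_transcript_source(has_transcript=True, whisper=True))
print(choose_transcript_source(has_transcript=False, whisper=True))
print(choose_transcript_source(has_transcript=True, local_whisper=True, force_whisper=True))
```

In this sketch, `--force-whisper` on its own has no effect because no engine is selected, which matches the instruction to combine it with one of the two Whisper options.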

Often the transcript will come back as a single long line.
PAR YT2Text will attempt to add newlines to the transcript to make it easier to read unless you specify the `--no-fix-newlines` option.
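
To give a feel for what such a newline fix involves, here is a minimal sketch that breaks a one-line transcript after sentence-ending punctuation. The heuristic PAR YT2Text actually applies is not documented here, so the regular expression and the function name `add_sentence_newlines` are assumptions.

```python
import re


def add_sentence_newlines(text: str) -> str:
    """Insert a newline after '.', '!' or '?' when a new sentence seems to start.

    Hypothetical heuristic: a sentence start is whitespace followed by an
    uppercase letter or an opening quote. Abbreviations such as "Dr. Smith"
    will be split incorrectly.
    """
    return re.sub(r"(?<=[.!?])\s+(?=[A-Z\"'])", "\n", text.strip())


one_line = "Hello there. This is a test! Is it working? yes it is."
print(add_sentence_newlines(one_line))
```

The lowercase "yes" is not treated as a sentence start, so the last two fragments stay on the same line.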

### Local Whisper
While the OpenAI Whisper API is fast and inexpensive, a free local option is also available.  
**NOTE: Local Whisper mode can be very slow on CPU. If you have a CUDA-enabled GPU, it will be used unless you specify the `--whisper-device` option.**  
`turbo` is the default local model. Consult the [OpenAI Whisper documentation](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages) to see which models are available and select the one that best fits your available VRAM.
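
The device behavior described in the note can be sketched as follows. This is an illustration of the `auto` / `cpu` / `cuda` choices, not the tool's implementation; the function name and the `cuda_available` parameter are assumptions (in a real program that flag would typically come from `torch.cuda.is_available()`).

```python
def resolve_whisper_device(requested: str = "auto", cuda_available: bool = False) -> str:
    """Map a --whisper-device style value to a concrete device name."""
    if requested not in {"auto", "cpu", "cuda"}:
        raise ValueError(f"unsupported device: {requested!r}")
    if requested == "auto":
        # Prefer the GPU when one is available, otherwise fall back to CPU.
        return "cuda" if cuda_available else "cpu"
    # An explicit choice is passed through unchanged.
    return requested


print(resolve_whisper_device("auto", cuda_available=True))
print(resolve_whisper_device("auto", cuda_available=False))
print(resolve_whisper_device("cpu", cuda_available=True))
```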

### Running from source
```bash
uv run par_yt2text --transcript --whisper 'https://www.youtube.com/watch?v=COSpqsDjiiw'
```

### Running if installed from PyPI
```bash
par_yt2text --transcript --whisper 'https://www.youtube.com/watch?v=COSpqsDjiiw'
```

### Example of forcing use of local Whisper if tool was installed with local Whisper enabled
```bash
par_yt2text --transcript --force-whisper --local-whisper 'https://www.youtube.com/watch?v=COSpqsDjiiw'
```

### Options
```
usage: par_yt2text [-h] [--duration] [--transcript] [--comments] [--metadata] [--no-fix-newlines] [--whisper] [--local-whisper]
                   [--whisper-device {auto,cpu,cuda}] [--force-whisper] [--whisper-model WHISPER_MODEL] [--lang LANG] [--save FILE]
                   url

positional arguments:
  url                   YouTube video URL

options:
  -h, --help            show this help message and exit
  --duration            Output only the duration
  --transcript          Output only the transcript
  --comments            Output the comments on the video
  --metadata            Output the video metadata
  --no-fix-newlines     Dont attempt to fix missing newlines from sentences
  --whisper             Use OpenAI Whisper to transcribe the audio if transcript is not available
  --local-whisper       Use Local OpenAI Whisper to transcribe the audio if transcript is not available
  --whisper-device {auto,cpu,cuda}
                        Device to use for local Whisper cpu, cuda (default: auto)
  --force-whisper       Force use of selected Whisper to transcribe the audio even if transcript is available
  --whisper-model WHISPER_MODEL
                        Whisper model to use for audio transcription (default-api: whisper-1, default-local: turbo)
  --lang LANG           Language for the transcript (default: English)
  --save FILE           Save the output to a file
```
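
For readers who want to see how the flags combine, the help text above can be mirrored with a small `argparse` parser. This is a reconstruction from the usage message only, not the project's source; in particular, the defaults chosen for `--lang` and `--whisper-model` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild a parser that accepts the options listed in the help text."""
    parser = argparse.ArgumentParser(prog="par_yt2text")
    parser.add_argument("url", help="YouTube video URL")
    # Simple on/off switches, in the order shown in the usage message.
    for flag in (
        "--duration",
        "--transcript",
        "--comments",
        "--metadata",
        "--no-fix-newlines",
        "--whisper",
        "--local-whisper",
        "--force-whisper",
    ):
        parser.add_argument(flag, action="store_true")
    parser.add_argument("--whisper-device", choices=["auto", "cpu", "cuda"], default="auto")
    # Assumption: None means "use whisper-1 for the API or turbo locally".
    parser.add_argument("--whisper-model", default=None)
    # Assumption: the help text only says "English"; "en" is a guess.
    parser.add_argument("--lang", default="en")
    parser.add_argument("--save", metavar="FILE", default=None)
    return parser


args = build_parser().parse_args(
    ["--transcript", "--force-whisper", "--local-whisper",
     "https://www.youtube.com/watch?v=COSpqsDjiiw"]
)
print(args.local_whisper, args.force_whisper, args.whisper_device)
```

Parsing the forced local Whisper example this way shows that `--local-whisper` and `--force-whisper` are independent switches and that the device stays on `auto` unless overridden.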


## What's New
- Version 0.2.0:
  - Added support for local OpenAI Whisper
- Version 0.1.0:
  - Initial release

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Author

Paul Robello - probello@gmail.com (based on yt by Daniel Miessler)

            
