# podcast-transcript-convert
[![PyPI](https://img.shields.io/pypi/v/podcast-transcript-convert.svg)](https://pypi.org/project/podcast-transcript-convert/)
[![Lint and Test](https://github.com/hbmartin/podcast-transcript-convert/actions/workflows/lint.yml/badge.svg)](https://github.com/hbmartin/podcast-transcript-convert/actions/workflows/lint.yml)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![Code style: black](https://img.shields.io/badge/🐧️-black-000000.svg)](https://github.com/psf/black)
[![Checked with pytype](https://img.shields.io/badge/🦆-pytype-437f30.svg)](https://google.github.io/pytype/)
[![twitter](https://img.shields.io/badge/@hmartin-00aced.svg?logo=twitter&logoColor=black)](https://twitter.com/hmartin)
<img src=".idea/icon.svg" width="100" align="right">
Convert podcast transcripts from HTML, SRT, WebVTT, Podlove, and other formats into [PodcastIndex JSON](https://github.com/Podcastindex-org/podcast-namespace/blob/main/transcripts/transcripts.md).
## Installation
Python 3.11 or later is required. [pipx](https://pipx.pypa.io/stable/) is the recommended way to install and run the CLI tool. If you want to use the package as a library, install it with `pip` instead.
```bash
brew install pipx
pipx install podcast-transcript-convert
```
If you've already installed the package and wish to upgrade:
```bash
pipx upgrade podcast-transcript-convert
```
## Usage
Run the converter with your transcripts directory as the first argument and the output directory as the second.
```bash
transcript2json transcripts/ converted/
```
You can then inspect the output JSON files in the `converted/` directory.
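One way to inspect the results programmatically is to load each file and summarize it. The sketch below is not part of this package: `summarize_transcript` and `summarize_directory` are hypothetical helpers, and the code assumes the output files use a `.json` extension and follow the PodcastIndex JSON layout (a top-level `segments` list whose entries may carry `speaker`, `startTime`, `endTime`, and `body`).

```python
import json
from pathlib import Path


def summarize_transcript(path: Path) -> dict:
    """Return basic statistics for one PodcastIndex JSON transcript file."""
    data = json.loads(path.read_text(encoding="utf-8"))
    # Assumption: segments live under the top-level "segments" key.
    segments = data.get("segments", [])
    # Collect distinct, non-empty speaker names.
    speakers = sorted({s["speaker"] for s in segments if s.get("speaker")})
    # Use the latest endTime as a rough duration estimate, in seconds.
    end_times = [s["endTime"] for s in segments if "endTime" in s]
    return {
        "file": path.name,
        "segments": len(segments),
        "speakers": speakers,
        "duration_seconds": max(end_times, default=0.0),
    }


def summarize_directory(directory: str) -> list[dict]:
    """Summarize every .json file found directly inside `directory`."""
    return [summarize_transcript(p) for p in sorted(Path(directory).glob("*.json"))]
```

Calling `summarize_directory("converted/")` after a conversion run returns one dictionary per file, which is a quick way to spot empty or truncated transcripts.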
## Library Usage
```python
from podcast_transcript_convert.convert import bulk_convert
bulk_convert("transcripts_dir/", "converted_dir/")
```
Individual file type converters are in the `converters` package. You can use them directly if you know the file type.
You can use `file_typing.identify_file_type(file)` to determine the file type of a transcript file.
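When calling the library from a script, it can help to check the input directory up front and to see which files were produced. The wrapper below is a minimal sketch under stated assumptions: `convert_checked` is a hypothetical helper, the converter is passed in as a callable taking two string paths (as `bulk_convert` does in the example above), and output files are assumed to end in `.json`.

```python
from pathlib import Path
from typing import Callable


def convert_checked(
    source: str,
    dest: str,
    convert: Callable[[str, str], object],
) -> list[Path]:
    """Run `convert(source, dest)` and return the JSON files found in `dest`."""
    src = Path(source)
    if not src.is_dir():
        raise NotADirectoryError(f"Transcript directory not found: {src}")
    out = Path(dest)
    # Creating the output directory first is harmless if the converter also does so.
    out.mkdir(parents=True, exist_ok=True)
    convert(str(src), str(out))
    # Assumption: converted transcripts are written with a .json extension.
    return sorted(out.glob("*.json"))
```

With the package installed, usage would look like `convert_checked("transcripts_dir/", "converted_dir/", bulk_convert)` after importing `bulk_convert` as shown above.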
## Development
Pull requests are very welcome! For major changes, please open an issue first to discuss what you would like to change.
```bash
git clone git@github.com:hbmartin/podcast-transcript-convert.git
cd podcast-transcript-convert
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Replace with the actual path to your transcript files
python -m podcast_transcript_convert ~/Downloads/overcast-to-sqlite/archive/transcripts converted/
```
### Code Formatting
This project is linted with [ruff](https://docs.astral.sh/ruff/) and formatted with [Black](https://github.com/psf/black).
## Authors
- [Harold Martin](https://www.linkedin.com/in/harold-martin-98526971/) - harold.martin at gmail
- Icon courtesy of [Vecteezy.com](https://www.vecteezy.com)
Raw data
{
"_id": null,
"home_page": null,
"name": "podcast-transcript-convert",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.11",
"maintainer_email": null,
"keywords": "convert, pci, podcast, podcastindex, podlove, srt, transcripts, vtt, webvtt",
"author": null,
"author_email": "Harold Martin <Harold.Martin@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/56/a2/ac020367d366d91042b9782f3e6d098c885eed8baa8cdceaf76db002bbdc/podcast_transcript_convert-0.1.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Convert podcast transcripts from HTML, SRT, WebVtt, Podlove etc into PodcastIndex JSON.",
"version": "0.1.2",
"project_urls": {
"Homepage": "https://github.com/hbmartin/podcast-transcript-convert"
},
"split_keywords": [
"convert",
" pci",
" podcast",
" podcastindex",
" podlove",
" srt",
" transcripts",
" vtt",
" webvtt"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "7573d654f430a814480cd67e4691c4dd4fff59a86ae1c16d354ca258111ef19e",
"md5": "b10b6f097c7f555ac37a1f0e457e6e58",
"sha256": "cf7ae1e34e80e086664e2674f1f7612647891d3924932031ae5121133ea1c08c"
},
"downloads": -1,
"filename": "podcast_transcript_convert-0.1.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "b10b6f097c7f555ac37a1f0e457e6e58",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.11",
"size": 19088,
"upload_time": "2024-07-23T19:56:58",
"upload_time_iso_8601": "2024-07-23T19:56:58.117894Z",
"url": "https://files.pythonhosted.org/packages/75/73/d654f430a814480cd67e4691c4dd4fff59a86ae1c16d354ca258111ef19e/podcast_transcript_convert-0.1.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "56a2ac020367d366d91042b9782f3e6d098c885eed8baa8cdceaf76db002bbdc",
"md5": "7bc9455c528ced7c5a56ac47841e1ef5",
"sha256": "a52b12bd255f02e02c05742bd92c444a0de8b9f9d571c9c3056fcb22b46bf80d"
},
"downloads": -1,
"filename": "podcast_transcript_convert-0.1.2.tar.gz",
"has_sig": false,
"md5_digest": "7bc9455c528ced7c5a56ac47841e1ef5",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.11",
"size": 14629,
"upload_time": "2024-07-23T19:56:59",
"upload_time_iso_8601": "2024-07-23T19:56:59.408725Z",
"url": "https://files.pythonhosted.org/packages/56/a2/ac020367d366d91042b9782f3e6d098c885eed8baa8cdceaf76db002bbdc/podcast_transcript_convert-0.1.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-07-23 19:56:59",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "hbmartin",
"github_project": "podcast-transcript-convert",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "beautifulsoup4",
"specs": [
[
"==",
"4.12.3"
]
]
},
{
"name": "loguru",
"specs": [
[
"==",
"0.7.2"
]
]
},
{
"name": "lxml",
"specs": [
[
"==",
"5.2.2"
]
]
},
{
"name": "webvtt-py",
"specs": [
[
"==",
"0.5.1"
]
]
}
],
"lcname": "podcast-transcript-convert"
}