podcast-transcript-convert


| Field | Value |
|---|---|
| Name | podcast-transcript-convert |
| Version | 0.1.2 |
| Summary | Convert podcast transcripts from HTML, SRT, WebVtt, Podlove etc into PodcastIndex JSON. |
| Upload time | 2024-07-23 19:56:59 |
| Requires Python | >=3.11 |
| Keywords | convert, pci, podcast, podcastindex, podlove, srt, transcripts, vtt, webvtt |
| Requirements | beautifulsoup4, loguru, lxml, webvtt-py |

# podcast-transcript-convert

[![PyPI](https://img.shields.io/pypi/v/podcast-transcript-convert.svg)](https://pypi.org/project/podcast-transcript-convert/)
[![Lint and Test](https://github.com/hbmartin/podcast-transcript-convert/actions/workflows/lint.yml/badge.svg)](https://github.com/hbmartin/podcast-transcript-tools/actions/workflows/lint.yml)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![Code style: black](https://img.shields.io/badge/🐧️-black-000000.svg)](https://github.com/psf/black)
[![Checked with pytype](https://img.shields.io/badge/🦆-pytype-437f30.svg)](https://google.github.io/pytype/)
[![twitter](https://img.shields.io/badge/@hmartin-00aced.svg?logo=twitter&logoColor=black)](https://twitter.com/hmartin)

<img src=".idea/icon.svg" width="100" align="right">

Convert podcast transcripts from HTML, SRT, WebVTT, Podlove, and other formats into [PodcastIndex JSON](https://github.com/Podcastindex-org/podcast-namespace/blob/main/transcripts/transcripts.md).

## Installation

It is recommended to use [pipx](https://pipx.pypa.io/stable/) to install and run the CLI tool. If you wish to use the library, you can install with `pip` instead.

```bash
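# Install pipx (Homebrew shown here; see the pipx docs for other platforms)
brew install pipx
# Install the CLI tool in its own isolated environment
pipx install podcast-transcript-convert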
```

If you've already installed the package and wish to upgrade:

```bash
pipx upgrade podcast-transcript-convert
```

## Usage
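Run the `transcript2json` command, passing the directory that holds your transcripts followed by the directory where the converted files should be written.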

```bash
transcript2json transcripts/ converted/
```
You can then inspect the output JSON files in the `converted/` directory.
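
As a rough illustration of what inspecting a converted file could look like, the sketch below reads one JSON file and prints a line per segment. It assumes the output follows the PodcastIndex transcript JSON layout (a top-level `segments` list whose items carry `startTime`, `endTime`, `body`, and optionally `speaker`). The sample data, the `episode.json` file name, and the `summarize_transcript` helper are made up for this example and are not produced by or part of the tool.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample in the PodcastIndex transcript JSON layout.
SAMPLE = {
    "version": "1.0.0",
    "segments": [
        {"speaker": "Host", "startTime": 0.5, "endTime": 2.0, "body": "Welcome back."},
        {"startTime": 2.0, "endTime": 4.25, "body": "Today we talk about transcripts."},
    ],
}


def summarize_transcript(path: Path) -> list[str]:
    """Return one human-readable line per segment in a transcript JSON file."""
    data = json.loads(path.read_text(encoding="utf-8"))
    lines = []
    for segment in data.get("segments", []):
        # The speaker field is optional, so fall back to a placeholder.
        speaker = segment.get("speaker", "unknown")
        lines.append(
            f"[{segment['startTime']:.2f}-{segment['endTime']:.2f}] "
            f"{speaker}: {segment['body']}"
        )
    return lines


# Write the sample to a temporary file, then read it back as a stand-in
# for a file in the converted/ directory.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "episode.json"
    sample_path.write_text(json.dumps(SAMPLE), encoding="utf-8")
    summary = summarize_transcript(sample_path)
    for line in summary:
        print(line)
```

To use it on real output, point `summarize_transcript` at a file inside `converted/` instead of the temporary sample.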

## Library Usage
```python
from podcast_transcript_convert.convert import bulk_convert

bulk_convert("transcripts_dir/", "converted_dir/")
```

Individual file type converters are in the `converters` package. You can use them directly if you know the file type.
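
To make the idea of a per-format converter concrete, here is a minimal standard-library sketch that turns SRT text into a PodcastIndex-style dictionary. It is not the code in the `converters` package, whose function names and signatures are not shown here; `srt_to_segments`, `_seconds`, and the sample cues are hypothetical names chosen for this illustration, and the output layout (`version` plus a `segments` list) is assumed from the PodcastIndex transcript spec.

```python
import json
import re

# Matches an SRT timing line such as "00:00:01,500 --> 00:00:04,000".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _seconds(hours: str, minutes: str, seconds: str, millis: str) -> float:
    """Convert the four captured timestamp parts into seconds."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def srt_to_segments(srt_text: str) -> dict:
    """Convert SRT text into a PodcastIndex-style transcript dictionary."""
    segments = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        for index, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                parts = match.groups()
                segments.append({
                    "startTime": _seconds(*parts[:4]),
                    "endTime": _seconds(*parts[4:]),
                    # Everything after the timing line is the cue text.
                    "body": " ".join(lines[index + 1:]),
                })
                break
    return {"version": "1.0.0", "segments": segments}


EXAMPLE_SRT = """1
00:00:01,500 --> 00:00:04,000
Welcome to the show.

2
00:00:04,000 --> 00:00:06,250
Let's get
started.
"""

result = srt_to_segments(EXAMPLE_SRT)
print(json.dumps(result, indent=2))
```

The sketch ignores speaker labels and styling tags; a real converter would likely need to handle both.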

You can use `file_typing.identify_file_type(file)` to determine the file type of a transcript file.
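
The signature and return values of `identify_file_type` are not documented here, so the following is only a sketch of how content-based detection can work in general, not the library's implementation. The `sniff_transcript_type` function and its string labels are hypothetical.

```python
import re


def sniff_transcript_type(text: str) -> str:
    """Guess a transcript format from the leading content of a file."""
    # Drop a byte-order mark and leading whitespace before inspecting.
    head = text.lstrip("\ufeff \t\r\n")
    if head.startswith("WEBVTT"):
        # WebVTT files must begin with this signature.
        return "vtt"
    if head.startswith(("{", "[")):
        return "json"
    if head.startswith("<"):
        return "html"
    # SRT timing lines use a comma before the milliseconds.
    if re.search(r"\d{2}:\d{2}:\d{2},\d{3}\s*-->", head[:500]):
        return "srt"
    return "unknown"


samples = {
    "vtt": "WEBVTT\n\n00:00:01.000 --> 00:00:02.000\nHello",
    "srt": "1\n00:00:01,000 --> 00:00:02,000\nHello\n",
    "html": "<html><body><p>Hello</p></body></html>",
    "json": '{"transcripts": []}',
}
guesses = {label: sniff_transcript_type(text) for label, text in samples.items()}
print(guesses)
```

Checking content rather than file extensions helps when transcripts have been downloaded with missing or misleading suffixes.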


## Development

Pull requests are very welcome! For major changes, please open an issue first to discuss what you would like to change.

```bash
git clone git@github.com:hbmartin/podcast-transcript-convert.git
cd podcast-transcript-convert
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Replace with the actual path to your transcript files
python -m podcast_transcript_convert ~/Downloads/overcast-to-sqlite/archive/transcripts converted/
```

### Code Formatting

This project is linted with [ruff](https://docs.astral.sh/ruff/) and formatted with [Black](https://github.com/psf/black).


## Authors
- [Harold Martin](https://www.linkedin.com/in/harold-martin-98526971/) - harold.martin at gmail
- Icon courtesy of [Vecteezy.com](https://www.vecteezy.com)

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "podcast-transcript-convert",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": null,
    "keywords": "convert, pci, podcast, podcastindex, podlove, srt, transcripts, vtt, webvtt",
    "author": null,
    "author_email": "Harold Martin <Harold.Martin@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/56/a2/ac020367d366d91042b9782f3e6d098c885eed8baa8cdceaf76db002bbdc/podcast_transcript_convert-0.1.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Convert podcast transcripts from HTML, SRT, WebVtt, Podlove etc into PodcastIndex JSON.",
    "version": "0.1.2",
    "project_urls": {
        "Homepage": "https://github.com/hbmartin/podcast-transcript-convert"
    },
    "split_keywords": [
        "convert",
        " pci",
        " podcast",
        " podcastindex",
        " podlove",
        " srt",
        " transcripts",
        " vtt",
        " webvtt"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7573d654f430a814480cd67e4691c4dd4fff59a86ae1c16d354ca258111ef19e",
                "md5": "b10b6f097c7f555ac37a1f0e457e6e58",
                "sha256": "cf7ae1e34e80e086664e2674f1f7612647891d3924932031ae5121133ea1c08c"
            },
            "downloads": -1,
            "filename": "podcast_transcript_convert-0.1.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b10b6f097c7f555ac37a1f0e457e6e58",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 19088,
            "upload_time": "2024-07-23T19:56:58",
            "upload_time_iso_8601": "2024-07-23T19:56:58.117894Z",
            "url": "https://files.pythonhosted.org/packages/75/73/d654f430a814480cd67e4691c4dd4fff59a86ae1c16d354ca258111ef19e/podcast_transcript_convert-0.1.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "56a2ac020367d366d91042b9782f3e6d098c885eed8baa8cdceaf76db002bbdc",
                "md5": "7bc9455c528ced7c5a56ac47841e1ef5",
                "sha256": "a52b12bd255f02e02c05742bd92c444a0de8b9f9d571c9c3056fcb22b46bf80d"
            },
            "downloads": -1,
            "filename": "podcast_transcript_convert-0.1.2.tar.gz",
            "has_sig": false,
            "md5_digest": "7bc9455c528ced7c5a56ac47841e1ef5",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 14629,
            "upload_time": "2024-07-23T19:56:59",
            "upload_time_iso_8601": "2024-07-23T19:56:59.408725Z",
            "url": "https://files.pythonhosted.org/packages/56/a2/ac020367d366d91042b9782f3e6d098c885eed8baa8cdceaf76db002bbdc/podcast_transcript_convert-0.1.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-07-23 19:56:59",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "hbmartin",
    "github_project": "podcast-transcript-convert",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "beautifulsoup4",
            "specs": [
                [
                    "==",
                    "4.12.3"
                ]
            ]
        },
        {
            "name": "loguru",
            "specs": [
                [
                    "==",
                    "0.7.2"
                ]
            ]
        },
        {
            "name": "lxml",
            "specs": [
                [
                    "==",
                    "5.2.2"
                ]
            ]
        },
        {
            "name": "webvtt-py",
            "specs": [
                [
                    "==",
                    "0.5.1"
                ]
            ]
        }
    ],
    "lcname": "podcast-transcript-convert"
}
        