# SRT Equalizer
A Python module that transforms subtitle line lengths, splitting subtitles into multiple
fragments if necessary. Useful for adjusting automatic speech recognition output from, e.g., [Whisper](https://github.com/openai/whisper) to a more convenient size.
It works with any language in which words are separated by spaces.
## Installing
`pip install srt_equalizer`
## Example
If the SRT file contains lines over a certain length like this:
```
1
00:00:00,000 --> 00:00:04,000
Good evening. I appreciate you giving me a few minutes of your time tonight

2
00:00:04,000 --> 00:00:11,000
so I can discuss with you a complex and difficult issue, an issue that is one of the most profound of our time.
```
Using this code to shorten the subtitles to a maximum length of 42 chars:
```python
from srt_equalizer import srt_equalizer
srt_equalizer.equalize_srt_file("test.srt", "shortened.srt", 42)
```
...they are split into multiple fragments, and the time codes are divided roughly
in proportion to each fragment's length while staying within the original
subtitle's time slot.
```
1
00:00:00,000 --> 00:00:02,132
Good evening. I appreciate you giving me

2
00:00:02,132 --> 00:00:04,000
a few minutes of your time tonight

3
00:00:04,000 --> 00:00:06,458
so I can discuss with you a complex and

4
00:00:06,458 --> 00:00:08,979
difficult issue, an issue that is one of

5
00:00:08,979 --> 00:00:11,000
the most profound of our time.
```
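The splitting and timing behaviour described above can be illustrated with a small, self-contained sketch. It is not the library's implementation: `wrap_words`, `split_proportionally` and `Fragment` are hypothetical names, words are packed greedily on spaces, and each fragment's share of the time slot is assumed to be proportional to its character count. Its line breaks may therefore differ from the library output shown above.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass
class Fragment:
    """Hypothetical container for one split subtitle fragment."""
    start: timedelta
    end: timedelta
    text: str


def wrap_words(text: str, target_chars: int) -> List[str]:
    """Greedily pack space-separated words into lines of at most target_chars.

    A single word longer than target_chars gets a line of its own.
    """
    lines: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= target_chars or not current:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


def split_proportionally(start: timedelta, end: timedelta, text: str,
                         target_chars: int = 42) -> List[Fragment]:
    """Split text into fragments and share [start, end] by character count."""
    lines = wrap_words(text, target_chars)
    total_chars = sum(len(line) for line in lines)
    duration = end - start
    fragments: List[Fragment] = []
    cursor = start
    for i, line in enumerate(lines):
        if i == len(lines) - 1:
            # The last fragment ends exactly at the original end time.
            frag_end = end
        else:
            frag_end = cursor + duration * (len(line) / total_chars)
        fragments.append(Fragment(cursor, frag_end, line))
        cursor = frag_end
    return fragments
```

Pinning the last fragment to the original end time keeps rounding in the intermediate timestamps from pushing any fragment outside the original time slot.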
## Adjust Whisper subtitle lengths
It is also possible to work with individual subtitle items using the following utility methods:
```python
split_subtitle(sub: srt.Subtitle, target_chars: int=42, start_from_index: int=1) -> list[srt.Subtitle]
whisper_result_to_srt(segments: list[dict]) -> list[srt.Subtitle]
```
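As a rough illustration of the kind of mapping `whisper_result_to_srt` presumably performs, the sketch below turns Whisper-style segment dicts into numbered cues. It assumes each segment has `start` and `end` in seconds and a `text` string (as in Whisper's `result["segments"]`), and it uses a hypothetical `Cue` dataclass in place of `srt.Subtitle` so it runs without extra packages; `segments_to_cues` is likewise a hypothetical name, not part of the library.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Any, Dict, List


@dataclass
class Cue:
    """Hypothetical stand-in for srt.Subtitle with only the fields used here."""
    index: int
    start: timedelta
    end: timedelta
    content: str


def segments_to_cues(segments: List[Dict[str, Any]]) -> List[Cue]:
    """Convert Whisper-style segments (start/end in seconds, text) into cues."""
    cues: List[Cue] = []
    for i, seg in enumerate(segments, start=1):
        cues.append(Cue(
            index=i,
            start=timedelta(seconds=seg["start"]),
            end=timedelta(seconds=seg["end"]),
            # Whisper segment text often starts with a space, so strip it.
            content=seg["text"].strip(),
        ))
    return cues
```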
Here is an example of how to reduce the length of subtitles created by Whisper. It assumes you have an audio file named `gwb.wav` to transcribe.
```python
import whisper
from srt_equalizer import srt_equalizer

# Transcribe the audio with a Whisper model
options_dict = {"task": "transcribe", "language": "en"}
model = whisper.load_model("small")
result = model.transcribe("gwb.wav", **options_dict)
segments = result["segments"]

# Convert the Whisper segments into srt.Subtitle objects
subs = srt_equalizer.whisper_result_to_srt(segments)

# Reduce line length in the Whisper result to <= 42 chars
equalized = []
for sub in subs:
    equalized.extend(srt_equalizer.split_subtitle(sub, 42))

for fragment in equalized:
    print(fragment.content)
```
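The example above only prints the fragments. Because the loop does not pass `start_from_index`, the fragment numbering may not run sequentially across the whole file, so it is worth renumbering when saving. With the `srt` package installed, `srt.compose(equalized)` reindexes and serializes subtitles; the dependency-free sketch below shows what that serialization involves. `format_timestamp` and `compose_srt` are hypothetical helpers that take `(start, end, text)` tuples rather than `srt.Subtitle` objects.

```python
from datetime import timedelta
from typing import Iterable, List, Tuple


def format_timestamp(t: timedelta) -> str:
    """Render a timedelta as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = t // timedelta(milliseconds=1)  # whole milliseconds, floored
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def compose_srt(cues: Iterable[Tuple[timedelta, timedelta, str]],
                start_index: int = 1) -> str:
    """Serialize (start, end, text) triples to SRT, numbering them sequentially."""
    blocks: List[str] = []
    for index, (start, end, text) in enumerate(cues, start=start_index):
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)
```

For example, `compose_srt((s.start, s.end, s.content) for s in equalized)` would produce the file contents, since `srt.Subtitle` exposes `start`, `end` and `content` attributes.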
## Contributing
This library is built with [Poetry](https://python-poetry.org). Check out this repository and run `poetry install` in the source folder. To run the tests, use `poetry run pytest tests`.
To explore the library interactively, start a `poetry shell`.
Raw data
{
"_id": null,
"home_page": "https://github.com/peterk/srt_equalizer",
"name": "srt-equalizer",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8,<4.0",
"maintainer_email": "",
"keywords": "",
"author": "Peter Krantz",
"author_email": "",
"download_url": "https://files.pythonhosted.org/packages/89/59/d8d2586ba8a67171d4b70f264961413385963a1713310e6a5f435d02fd5a/srt_equalizer-0.1.7.tar.gz",
"platform": null,
"description": "# SRT Equalizer\n\nA Python module to transform subtitle line lengths, splitting into multiple subtitle\nfragments if necessary. Useful to adjust automatic speech recognition outputs from e.g. [Whisper](https://github.com/openai/whisper) to a more convenient size.\n\nThis library works for all languages where spaces separate words.\n\n## Installing\n\n`pip install srt_equalizer`\n\n## Example\n\nIf the SRT file contains lines over a certain length like this:\n\n```\n1\n00:00:00,000 --> 00:00:04,000\nGood evening. I appreciate you giving me a few minutes of your time tonight\n\n2\n00:00:04,000 --> 00:00:11,000\nso I can discuss with you a complex and difficult issue, an issue that is one of the most profound of our time.\n```\n\nUsing this code to shorten the subtitles to a maximum length of 42 chars:\n\n```python\n\nfrom srt_equalizer import srt_equalizer\n\nsrt_equalizer.equalize_srt_file(\"test.srt\", \"shortened.srt\", 42)\n```\n\n...they are split into multiple fragments and time code is adjusted to the\napproximate proportional length of each segment while staying inside the time\nslot for the fragment.\n\n```\n1\n00:00:00,000 --> 00:00:02,132\nGood evening. I appreciate you giving me\n\n2\n00:00:02,132 --> 00:00:04,000\na few minutes of your time tonight\n\n3\n00:00:04,000 --> 00:00:06,458\nso I can discuss with you a complex and\n\n4\n00:00:06,458 --> 00:00:08,979\ndifficult issue, an issue that is one of\n\n5\n00:00:08,979 --> 00:00:11,000\nthe most profound of our time.\n```\n\n## Adjust Whisper subtitle lengths\nIs is also possible to work with the subtitle items with the following utility methods:\n\n```python\nsplit_subtitle(sub: srt.Subtitle, target_chars: int=42, start_from_index: int=1) -> list[srt.Subtitle]:\n\nwhisper_result_to_srt(segments: list[dict]) -> list[srt.Subtitle]:\n```\n\nHere is an example of how to reduce the lingth of subtitles created by Whisper. 
It assumes you have an audio file to transcribe called gwb.wav.\n\n```python\nimport whisper\nfrom srt_equalizer import srt_equalizer\nimport srt\nfrom datetime import timedelta\n\noptions_dict = {\"task\" : \"transcribe\", \"language\": \"en\"}\nmodel = whisper.load_model(\"small\")\nresult = model.transcribe(\"gwb.wav\", language=\"en\")\nsegments = result[\"segments\"]\nsubs = srt_equalizer.whisper_result_to_srt(segments)\n\n# Reduce line lenth in the whisper result to <= 42 chars\nequalized = []\nfor sub in subs:\n equalized.extend(srt_equalizer.split_subtitle(sub, 42))\n\nfor i in equalized:\n print(i.content)\n```\n\n## Contributing\n\nThis library is built with [Poetry](https://python-poetry.org). Checkout this repo and run `poetry install` in the source folder. To run tests use `poetry run pytest tests`.\n\nIf you want to explore the library start a `poetry shell`.\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Transform subtitle line lengths, splitting into multiple subtitle fragments if necessary. ",
"version": "0.1.7",
"project_urls": {
"Homepage": "https://github.com/peterk/srt_equalizer",
"Repository": "https://github.com/peterk/srt_equalizer"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "5903f929ba54c593082e4edcd388585b0899dcc346a99f059895d5b75979264c",
"md5": "f67bc932ef1c9733cf47db9f0aa206ab",
"sha256": "3ecc73daec914d5aed4a970eb5bca45fa6c0ff51e91ceea4a20be87f446c9f1f"
},
"downloads": -1,
"filename": "srt_equalizer-0.1.7-py3-none-any.whl",
"has_sig": false,
"md5_digest": "f67bc932ef1c9733cf47db9f0aa206ab",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8,<4.0",
"size": 4633,
"upload_time": "2023-07-02T17:06:20",
"upload_time_iso_8601": "2023-07-02T17:06:20.868999Z",
"url": "https://files.pythonhosted.org/packages/59/03/f929ba54c593082e4edcd388585b0899dcc346a99f059895d5b75979264c/srt_equalizer-0.1.7-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "8959d8d2586ba8a67171d4b70f264961413385963a1713310e6a5f435d02fd5a",
"md5": "7e9415d9757259641a4dffef61df2b28",
"sha256": "49567f6646957635ed766f70fe12e8cd47a63d95c109a0c10de2973a3d050202"
},
"downloads": -1,
"filename": "srt_equalizer-0.1.7.tar.gz",
"has_sig": false,
"md5_digest": "7e9415d9757259641a4dffef61df2b28",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8,<4.0",
"size": 3958,
"upload_time": "2023-07-02T17:06:22",
"upload_time_iso_8601": "2023-07-02T17:06:22.380968Z",
"url": "https://files.pythonhosted.org/packages/89/59/d8d2586ba8a67171d4b70f264961413385963a1713310e6a5f435d02fd5a/srt_equalizer-0.1.7.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-07-02 17:06:22",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "peterk",
"github_project": "srt_equalizer",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "srt-equalizer"
}