# Installation from pip3
```shell
pip3 install --verbose subtitle_analyzer
python -m spacy download en_core_web_trf
python -m spacy download es_dep_news_trf
python -m spacy download de_dep_news_trf
```
# Usage
Please refer to [api docs](https://qishe-nlp.github.io/subtitle-analyzer/).
### Executable usage
The underlying `x2cdict` package requires the environment variables `DICT_DB_HOST` and `DEEPL_AUTH_KEY`, so make sure both are set before running the commands below.
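For example, in a POSIX shell (the values below are placeholders; substitute your own dictionary host and DeepL key):
```shell
export DICT_DB_HOST="<your-dict-db-host>"
export DEEPL_AUTH_KEY="<your-deepl-auth-key>"
```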
* Write ass file with vocabulary information
```shell
sta_vocab --srtfile movie.srt --lang en --assfile en_vocab.ass --external False
```
* Write ass file with phrase information
```shell
sta_phrase --srtfile movie.srt --lang en --assfile en_phrase.ass --external False
```
### Package usage
```python
from subtitlecore import Subtitle
from subtitle_analyzer import VocabAnalyzer, PhraseAnalyzer
from subtitle_analyzer import VocabASSWriter, PhraseASSWriter
import json

def subtitle_vocab(srtfile, lang, assfile, external):
    # Report progress as JSON lines so a caller can track each phase.
    phase = {"step": 1, "msg": "Start sentenizing"}
    print(json.dumps(phase), flush=True)

    # Parse the srt file and split it into sentences.
    sf = Subtitle(srtfile, lang)
    sens = sf.sentenize()
    for e in sens:
        print(e)

    phase = {"step": 2, "msg": "Finish sentenizing"}
    print(json.dumps(phase), flush=True)

    # Look up vocabulary per line; `external` toggles the external dictionary.
    analyzer = VocabAnalyzer(lang)
    exs = analyzer.get_line_vocabs(sens, external)
    shown = exs[:20]

    phase = {"step": 3, "msg": "Finish vocabs dictionary lookup", "vocabs": shown}
    print(json.dumps(phase), flush=True)

    # Optionally write the annotated subtitles to an ass file.
    if assfile:
        ass_writer = VocabASSWriter(srtfile)
        ass_writer.write(exs, assfile, {"animation": False})

        phase = {"step": 4, "msg": "Finish ass saving"}
        print(json.dumps(phase), flush=True)

def subtitle_phrase(srtfile, lang, assfile, external):
    phase = {"step": 1, "msg": "Start sentenizing"}
    print(json.dumps(phase), flush=True)

    sf = Subtitle(srtfile, lang)
    sens = sf.sentenize()
    for e in sens:
        print(e)

    phase = {"step": 2, "msg": "Finish sentenizing"}
    print(json.dumps(phase), flush=True)

    # Look up phrases per line.
    analyzer = PhraseAnalyzer(lang)
    exs = analyzer.get_line_phrases(sens, external)

    phase = {"step": 3, "msg": "Finish phrases dictionary lookup", "phrases": exs[:10]}
    print(json.dumps(phase), flush=True)

    if assfile:
        ass_writer = PhraseASSWriter(srtfile)
        ass_writer.write(exs, assfile, {"animation": False})

        phase = {"step": 4, "msg": "Finish ass saving"}
        print(json.dumps(phase), flush=True)
```
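A minimal driver for the two functions above might look like this (assuming `movie.srt` sits in the working directory and the environment variables from the Executable usage section are set):
```python
if __name__ == "__main__":
    subtitle_vocab("movie.srt", "en", "en_vocab.ass", external=False)
    subtitle_phrase("movie.srt", "en", "en_phrase.ass", external=False)
```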
# Development
### Clone project
```
git clone https://github.com/qishe-nlp/subtitle-analyzer.git
```
### Install [poetry](https://python-poetry.org/docs/)
### Install dependencies
```
poetry update
```
### Test
```
poetry run pytest -rP
```
which runs the tests under `tests/`
### Execute
```
poetry run sta_vocab --help
poetry run sta_phrase --help
```
### Create Sphinx docs
```
poetry shell
cd apidocs
sphinx-apidoc -f -o source ../subtitle_analyzer
make html
python -m http.server -d build/html
```
### Host docs on GitHub Pages
```
cp -rf apidocs/build/html/* docs/
```
### Build
* Change `version` in `pyproject.toml` and `subtitle_analyzer/__init__.py`, keeping the two in sync (see the sketch below)
* Build the Python package with `poetry build`
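A minimal sketch of the version bump, assuming the conventional `__version__` attribute name (the version number is illustrative):
```python
# subtitle_analyzer/__init__.py — keep in sync with `version` in pyproject.toml
__version__ = "0.1.24"
```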
### Git commit and push
### Publish from local dev env
* Set the test PyPI credentials in Poetry; see the [poetry doc](https://python-poetry.org/docs/repositories/)
* Publish to test PyPI with `poetry publish -r test`
### Publish through CI
* The GitHub action builds and publishes the package to the [test PyPI repo](https://test.pypi.org/) on a tag push
```
git tag [x.x.x]
git push origin master
```
* Manually publish to the [PyPI repo](https://pypi.org/) through the [GitHub action](https://github.com/qishe-nlp/subtitle-analyzer/actions/workflows/pypi.yml)