# subtitle-analyzer

* Name: subtitle-analyzer
* Version: 0.1.23
* Home page: https://github.com/qishe-nlp/subtitle-analyzer
* Author: Phoenix Grey
* Requires Python: >=3.8,<4.0
* Keywords: vocabulary, phrases, subtitle, nlp
* Upload time: 2023-04-03 04:36:11
# Installation from pip3

```shell
pip3 install --verbose subtitle_analyzer
python -m spacy download en_core_web_trf
python -m spacy download es_dep_news_trf
python -m spacy download de_dep_news_trf
```
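The spaCy models above are installed as regular Python packages, so you can check whether they are available before running the analyzers. A minimal stdlib-only sketch (the helper name `model_installed` is illustrative, not part of this package):

```python
import importlib.util

def model_installed(name: str) -> bool:
    """Return True if the given spaCy model package is importable."""
    return importlib.util.find_spec(name) is not None

for model in ("en_core_web_trf", "es_dep_news_trf", "de_dep_news_trf"):
    status = "ok" if model_installed(model) else "missing"
    print(f"{model}: {status}")
```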

# Usage

Please refer to [api docs](https://qishe-nlp.github.io/subtitle-analyzer/).

### Executable usage

The `x2cdict` dependency reads the environment variables `DICT_DB_HOST` and `DEEPL_AUTH_KEY`, so make sure both are set before running the commands below.
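For example, the variables can be exported in the shell before invoking the executables. The values below are hypothetical placeholders; substitute your own database host and DeepL API key:

```shell
# Hypothetical example values -- replace with your own credentials
export DICT_DB_HOST="mongodb://localhost:27017"
export DEEPL_AUTH_KEY="your-deepl-api-key"
```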

* Write an ASS file with vocabulary information

```shell
sta_vocab --srtfile movie.srt --lang en --assfile en_vocab.ass --external False
```

* Write an ASS file with phrase information

```shell
sta_phrase --srtfile movie.srt --lang en --assfile en_phrase.ass --external False
```
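Both executables stream progress as JSON lines (`{"step": ..., "msg": ...}`) interleaved with the printed subtitle sentences, so a wrapper script can filter the progress messages out of the output. A minimal stdlib sketch, assuming only that format:

```python
import json

def parse_phases(lines):
    """Collect the JSON progress messages, skipping plain sentence lines."""
    phases = []
    for line in lines:
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # a printed subtitle sentence, not a phase message
        if isinstance(obj, dict) and "step" in obj:
            phases.append(obj)
    return phases

sample = [
    '{"step": 1, "msg": "Start sentenizing"}',
    'A plain subtitle sentence.',
    '{"step": 2, "msg": "Finish sentenizing"}',
]
print(parse_phases(sample))
```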

### Package usage
```python
from subtitlecore import Subtitle
from subtitle_analyzer import VocabAnalyzer, PhraseAnalyzer
from subtitle_analyzer import VocabASSWriter, PhraseASSWriter
import json

def subtitle_vocab(srtfile, lang, assfile, external):

  phase = {"step": 1, "msg": "Start sentenizing"}
  print(json.dumps(phase), flush=True)

  sf = Subtitle(srtfile, lang)
  sens = sf.sentenize()
  for e in sens:
    print(e)

  phase = {"step": 2, "msg": "Finish sentenizing"}
  print(json.dumps(phase), flush=True)

  analyzer = VocabAnalyzer(lang)
  exs = analyzer.get_line_vocabs(sens, external)
  shown = exs[:20]

  phase = {"step": 3, "msg": "Finish vocabs dictionary lookup", "vocabs": shown}
  print(json.dumps(phase), flush=True)

  if assfile:
    ass_writer = VocabASSWriter(srtfile)
    ass_writer.write(exs, assfile, {"animation": False})
    
    phase = {"step": 4, "msg": "Finish ass saving"} 
    print(json.dumps(phase), flush=True)

def subtitle_phrase(srtfile, lang, assfile, external):

  phase = {"step": 1, "msg": "Start sentenizing"}
  print(json.dumps(phase), flush=True)

  sf = Subtitle(srtfile, lang)
  sens = sf.sentenize()
  for e in sens:
    print(e)

  phase = {"step": 2, "msg": "Finish sentenizing"}
  print(json.dumps(phase), flush=True)

  analyzer = PhraseAnalyzer(lang)
  exs = analyzer.get_line_phrases(sens, external)

  phase = {"step": 3, "msg": "Finish phrases dictionary lookup", "vocabs": exs[:10]}
  print(json.dumps(phase), flush=True)

  if assfile:
    ass_writer = PhraseASSWriter(srtfile)
    ass_writer.write(exs, assfile, {"animation": False})
    
    phase = {"step": 4, "msg": "Finish ass saving"} 
    print(json.dumps(phase), flush=True)
```

# Development

### Clone project
```shell
git clone https://github.com/qishe-nlp/subtitle-analyzer.git
```

### Install [poetry](https://python-poetry.org/docs/)

### Install dependencies
```shell
poetry update
```

### Test
```shell
poetry run pytest -rP
```
which runs the tests under `tests/*`.

### Execute
```shell
poetry run sta_vocab --help
poetry run sta_phrase --help
```

### Create sphinx docs
```shell
poetry shell
cd apidocs
sphinx-apidoc -f -o source ../subtitle_analyzer
make html
python -m http.server -d build/html
```

### Host docs on github pages
```shell
cp -rf apidocs/build/html/* docs/
```

### Build
* Change `version` in `pyproject.toml` and `subtitle_analyzer/__init__.py`
* Build python package by `poetry build`
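Since the version string lives in two places, it is easy for them to drift apart. A small sketch of a consistency check (assuming `subtitle_analyzer/__init__.py` exposes a `__version__` attribute, which is an assumption about this project's layout):

```python
import re

def read_versions(pyproject_text: str, init_text: str):
    """Extract the version strings from pyproject.toml and __init__.py contents."""
    pyproject_version = re.search(
        r'^version\s*=\s*"([^"]+)"', pyproject_text, re.M).group(1)
    init_version = re.search(
        r'__version__\s*=\s*["\']([^"\']+)["\']', init_text).group(1)
    return pyproject_version, init_version

# Hypothetical file contents for illustration
py_v, init_v = read_versions('version = "0.1.23"', '__version__ = "0.1.23"')
assert py_v == init_v, f"version mismatch: {py_v} != {init_v}"
```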

### Git commit and push

### Publish from local dev env
* Set pypi test environment variables in poetry, refer to [poetry doc](https://python-poetry.org/docs/repositories/)
* Publish to pypi test by `poetry publish -r test`

### Publish through CI 

* A GitHub action builds and publishes the package to the [test pypi repo](https://test.pypi.org/)

```shell
git tag [x.x.x]
git push origin master
```

* Manually publish to [pypi repo](https://pypi.org/) through [github action](https://github.com/qishe-nlp/subtitle-analyzer/actions/workflows/pypi.yml)

