plaintext-analyzer


Nameplaintext-analyzer JSON
Version 0.1.13 PyPI version JSON
download
home_pagehttps://github.com/qishe-nlp/plaintext-analyzer
Summary
upload_time2023-05-15 09:11:28
maintainer
docs_urlNone
authorPhoenix Grey
requires_python>=3.8,<4.0
license
keywords vocabulary phrases text nlp
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # Installation from pip3

```shell
pip3 install --verbose plaintext_analyzer 
python -m spacy download en_core_web_trf
python -m spacy download es_dep_news_trf
```

# Usage

Please refer to [api docs](https://qishe-nlp.github.io/plaintext-analyzer/).

### Excutable usage

* Get vocabularies from plaintext file 

```shell
pta_vocab --source en_plaintext.txt --stype FILE --lang en  
``` 

* Get vocabularies from text 

```shell
pta_vocab --source "The typical Bangladeshi breakfast consists of flour-based flatbreads such as chapati, roti or paratha, served with a curry. Usually the curry can be vegetable, home-fried potatoes, or scrambled eggs. The breakfast varies according to location and the eater's income. In villages and rural areas, rice with curry (potato mash, dal ) is mostly preferred by day laborers. In the city, sliced bread with jam or jelly is chosen due to time efficiency. In Bangladesh tea is preferred to coffee and is an essential part of most breakfasts. Having toasted biscuits, bread or puffed rice with tea is also very popular." --stype RAW --lang en  
``` 

* Get vocabularies from plaintext file, and write to csv files 

```shell
pta_vocab --source en_plaintext.txt --stype FILE --lang en --dstname en_vocab
``` 

* Get vocabularies from text, and write to csv file 

```shell
pta_vocab --source "The typical Bangladeshi breakfast consists of flour-based flatbreads such as chapati, roti or paratha, served with a curry. Usually the curry can be vegetable, home-fried potatoes, or scrambled eggs. The breakfast varies according to location and the eater's income. In villages and rural areas, rice with curry (potato mash, dal ) is mostly preferred by day laborers. In the city, sliced bread with jam or jelly is chosen due to time efficiency. In Bangladesh tea is preferred to coffee and is an essential part of most breakfasts. Having toasted biscuits, bread or puffed rice with tea is also very popular." --stype RAW --lang en --dstname en_vocab 
``` 

* Get phrases from plaintext file 

```shell
pta_phrase --source en_plaintext.txt --stype FILE --lang en  
``` 

* Get phrases from text 

```shell
pta_phrase --source "The typical Bangladeshi breakfast consists of flour-based flatbreads such as chapati, roti or paratha, served with a curry. Usually the curry can be vegetable, home-fried potatoes, or scrambled eggs. The breakfast varies according to location and the eater's income. In villages and rural areas, rice with curry (potato mash, dal ) is mostly preferred by day laborers. In the city, sliced bread with jam or jelly is chosen due to time efficiency. In Bangladesh tea is preferred to coffee and is an essential part of most breakfasts. Having toasted biscuits, bread or puffed rice with tea is also very popular." --stype RAW --lang en  
``` 

* Get phrases from plaintext file, and write to csv files 

```shell
pta_phrase --source en_plaintext.txt --stype FILE --lang en --dstname en_phrase
``` 

* Get phrases from text, and write to csv file 

```shell
pta_phrase --source "The typical Bangladeshi breakfast consists of flour-based flatbreads such as chapati, roti or paratha, served with a curry. Usually the curry can be vegetable, home-fried potatoes, or scrambled eggs. The breakfast varies according to location and the eater's income. In villages and rural areas, rice with curry (potato mash, dal ) is mostly preferred by day laborers. In the city, sliced bread with jam or jelly is chosen due to time efficiency. In Bangladesh tea is preferred to coffee and is an essential part of most breakfasts. Having toasted biscuits, bread or puffed rice with tea is also very popular." --stype RAW --lang en --dstname en_phrase 
``` 

### Package usage
```
def parser_vocab(source, stype, lang):

  sf = PlaintextReader(source, stype, lang)
  sens = sf.sentences

  analyzer = VocabAnalyzer(lang)
  exs = analyzer.overview_vocabs(sens)

  print(exs)

def parser_phrase(source, stype, lang):

  sf = PlaintextReader(source, stype, lang)
  sens = sf.sentences

  analyzer = PhraseAnalyzer(lang)
  exs = analyzer.overview_phrases(sens)

  print(exs)

```

# Development

### Clone project
```
git clone https://github.com/qishe-nlp/plaintext-analyzer.git
```

### Install [poetry](https://python-poetry.org/docs/)

### Install dependencies
```
poetry update
```

### Test
```
poetry run pytest -rP
```
which run tests under `tests/*`

### Execute
```
poetry run pta_vocab --help
poetry run pta_phrase --help
```

### Create sphinx docs
```
poetry shell
cd apidocs
sphinx-apidoc -f -o source ../plaintext_analyzer
make html
python -m http.server -d build/html
```

### Host docs on github pages
```
cp -rf apidocs/build/html/* docs/
```

### Build
* Change `version` in `pyproject.toml` and `plaintext_analyzer/__init__.py`
* Build python package by `poetry build`

### Git commit and push

### Publish from local dev env

* Set pypi test environment variables in poetry, refer to [poetry doc](https://python-poetry.org/docs/repositories/)
* Publish to pypi test by `poetry publish -r test`

### Publish through CI 

* Github action build and publish package to [test pypi repo](https://test.pypi.org/)

```
git tag [x.x.x]
git push origin master
```

* Manually publish to [pypi repo](https://pypi.org/) through [github action](https://github.com/qishe-nlp/plaintext-analyzer/actions/workflows/pypi.yml)


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/qishe-nlp/plaintext-analyzer",
    "name": "plaintext-analyzer",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8,<4.0",
    "maintainer_email": "",
    "keywords": "vocabulary,phrases,text,nlp",
    "author": "Phoenix Grey",
    "author_email": "phoenix.grey0108@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/96/7f/b24abc74f452c1b92ce59bcfe2d546b357a5b9a5acac09f57dffb3c1170e/plaintext_analyzer-0.1.13.tar.gz",
    "platform": null,
    "description": "# Installation from pip3\n\n```shell\npip3 install --verbose plaintext_analyzer \npython -m spacy download en_core_web_trf\npython -m spacy download es_dep_news_trf\n```\n\n# Usage\n\nPlease refer to [api docs](https://qishe-nlp.github.io/plaintext-analyzer/).\n\n### Excutable usage\n\n* Get vocabularies from plaintext file \n\n```shell\npta_vocab --source en_plaintext.txt --stype FILE --lang en  \n``` \n\n* Get vocabularies from text \n\n```shell\npta_vocab --source \"The typical Bangladeshi breakfast consists of flour-based flatbreads such as chapati, roti or paratha, served with a curry. Usually the curry can be vegetable, home-fried potatoes, or scrambled eggs. The breakfast varies according to location and the eater's income. In villages and rural areas, rice with curry (potato mash, dal ) is mostly preferred by day laborers. In the city, sliced bread with jam or jelly is chosen due to time efficiency. In Bangladesh tea is preferred to coffee and is an essential part of most breakfasts. Having toasted biscuits, bread or puffed rice with tea is also very popular.\" --stype RAW --lang en  \n``` \n\n* Get vocabularies from plaintext file, and write to csv files \n\n```shell\npta_vocab --source en_plaintext.txt --stype FILE --lang en --dstname en_vocab\n``` \n\n* Get vocabularies from text, and write to csv file \n\n```shell\npta_vocab --source \"The typical Bangladeshi breakfast consists of flour-based flatbreads such as chapati, roti or paratha, served with a curry. Usually the curry can be vegetable, home-fried potatoes, or scrambled eggs. The breakfast varies according to location and the eater's income. In villages and rural areas, rice with curry (potato mash, dal ) is mostly preferred by day laborers. In the city, sliced bread with jam or jelly is chosen due to time efficiency. In Bangladesh tea is preferred to coffee and is an essential part of most breakfasts. Having toasted biscuits, bread or puffed rice with tea is also very popular.\" --stype RAW --lang en --dstname en_vocab \n``` \n\n* Get phrases from plaintext file \n\n```shell\npta_phrase --source en_plaintext.txt --stype FILE --lang en  \n``` \n\n* Get phrases from text \n\n```shell\npta_phrase --source \"The typical Bangladeshi breakfast consists of flour-based flatbreads such as chapati, roti or paratha, served with a curry. Usually the curry can be vegetable, home-fried potatoes, or scrambled eggs. The breakfast varies according to location and the eater's income. In villages and rural areas, rice with curry (potato mash, dal ) is mostly preferred by day laborers. In the city, sliced bread with jam or jelly is chosen due to time efficiency. In Bangladesh tea is preferred to coffee and is an essential part of most breakfasts. Having toasted biscuits, bread or puffed rice with tea is also very popular.\" --stype RAW --lang en  \n``` \n\n* Get phrases from plaintext file, and write to csv files \n\n```shell\npta_phrase --source en_plaintext.txt --stype FILE --lang en --dstname en_phrase\n``` \n\n* Get phrases from text, and write to csv file \n\n```shell\npta_phrase --source \"The typical Bangladeshi breakfast consists of flour-based flatbreads such as chapati, roti or paratha, served with a curry. Usually the curry can be vegetable, home-fried potatoes, or scrambled eggs. The breakfast varies according to location and the eater's income. In villages and rural areas, rice with curry (potato mash, dal ) is mostly preferred by day laborers. In the city, sliced bread with jam or jelly is chosen due to time efficiency. In Bangladesh tea is preferred to coffee and is an essential part of most breakfasts. Having toasted biscuits, bread or puffed rice with tea is also very popular.\" --stype RAW --lang en --dstname en_phrase \n``` \n\n### Package usage\n```\ndef parser_vocab(source, stype, lang):\n\n  sf = PlaintextReader(source, stype, lang)\n  sens = sf.sentences\n\n  analyzer = VocabAnalyzer(lang)\n  exs = analyzer.overview_vocabs(sens)\n\n  print(exs)\n\ndef parser_phrase(source, stype, lang):\n\n  sf = PlaintextReader(source, stype, lang)\n  sens = sf.sentences\n\n  analyzer = PhraseAnalyzer(lang)\n  exs = analyzer.overview_phrases(sens)\n\n  print(exs)\n\n```\n\n# Development\n\n### Clone project\n```\ngit clone https://github.com/qishe-nlp/plaintext-analyzer.git\n```\n\n### Install [poetry](https://python-poetry.org/docs/)\n\n### Install dependencies\n```\npoetry update\n```\n\n### Test\n```\npoetry run pytest -rP\n```\nwhich run tests under `tests/*`\n\n### Execute\n```\npoetry run pta_vocab --help\npoetry run pta_phrase --help\n```\n\n### Create sphinx docs\n```\npoetry shell\ncd apidocs\nsphinx-apidoc -f -o source ../plaintext_analyzer\nmake html\npython -m http.server -d build/html\n```\n\n### Host docs on github pages\n```\ncp -rf apidocs/build/html/* docs/\n```\n\n### Build\n* Change `version` in `pyproject.toml` and `plaintext_analyzer/__init__.py`\n* Build python package by `poetry build`\n\n### Git commit and push\n\n### Publish from local dev env\n\n* Set pypi test environment variables in poetry, refer to [poetry doc](https://python-poetry.org/docs/repositories/)\n* Publish to pypi test by `poetry publish -r test`\n\n### Publish through CI \n\n* Github action build and publish package to [test pypi repo](https://test.pypi.org/)\n\n```\ngit tag [x.x.x]\ngit push origin master\n```\n\n* Manually publish to [pypi repo](https://pypi.org/) through [github action](https://github.com/qishe-nlp/plaintext-analyzer/actions/workflows/pypi.yml)\n\n",
    "bugtrack_url": null,
    "license": "",
    "summary": "",
    "version": "0.1.13",
    "project_urls": {
        "Documentation": "https://qishe-nlp.github.io/plaintext-analyzer/",
        "Homepage": "https://github.com/qishe-nlp/plaintext-analyzer",
        "Repository": "https://github.com/qishe-nlp/plaintext-analyzer"
    },
    "split_keywords": [
        "vocabulary",
        "phrases",
        "text",
        "nlp"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "594e9cd181ace81c07c59b4d09defa18db556e41912e5ba04383beccf5a433eb",
                "md5": "15147bb0f06346f53d0759bdafc617d4",
                "sha256": "69c693ebc58dce423b0b5b0a5d93373442e9fd38bd812e56550dad76422c9b69"
            },
            "downloads": -1,
            "filename": "plaintext_analyzer-0.1.13-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "15147bb0f06346f53d0759bdafc617d4",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8,<4.0",
            "size": 8286,
            "upload_time": "2023-05-15T09:11:27",
            "upload_time_iso_8601": "2023-05-15T09:11:27.284315Z",
            "url": "https://files.pythonhosted.org/packages/59/4e/9cd181ace81c07c59b4d09defa18db556e41912e5ba04383beccf5a433eb/plaintext_analyzer-0.1.13-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "967fb24abc74f452c1b92ce59bcfe2d546b357a5b9a5acac09f57dffb3c1170e",
                "md5": "45f6a844b7401088267cc43aa17129a8",
                "sha256": "12a13a9490ab206e0f5328adb9bb32c5b1e98e9c8b22c3121398bb9d3e201cfd"
            },
            "downloads": -1,
            "filename": "plaintext_analyzer-0.1.13.tar.gz",
            "has_sig": false,
            "md5_digest": "45f6a844b7401088267cc43aa17129a8",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8,<4.0",
            "size": 6204,
            "upload_time": "2023-05-15T09:11:28",
            "upload_time_iso_8601": "2023-05-15T09:11:28.802156Z",
            "url": "https://files.pythonhosted.org/packages/96/7f/b24abc74f452c1b92ce59bcfe2d546b357a5b9a5acac09f57dffb3c1170e/plaintext_analyzer-0.1.13.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-05-15 09:11:28",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "qishe-nlp",
    "github_project": "plaintext-analyzer",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "plaintext-analyzer"
}
        
Elapsed time: 0.62864s