smoothtext


- Name: smoothtext
- Version: 0.3.0
- Summary: A Python library for text readability analysis, supporting multiple languages.
- Upload time: 2025-02-16 21:50:05
- Requires Python: <3.13,>=3.10
- License: MIT
- Keywords: readability, text-processing, nlp, linguistics, text-analysis, syllables, english, german, turkish, atesman, bezirci-yilmaz, flesch, flesch-kincaid, wiener-sachtextformel

# SmoothText

---

[![Tests](https://github.com/smoothtext/smoothtext/actions/workflows/main.yml/badge.svg)](https://github.com/smoothtext/smoothtext/actions)
[![license](https://img.shields.io/github/license/smoothtext/smoothtext.svg)](https://github.com/smoothtext/smoothtext/blob/main/LICENSE)
[![versions](https://img.shields.io/pypi/pyversions/smoothtext.svg)](https://github.com/smoothtext/smoothtext)
[![pypi](https://img.shields.io/pypi/v/smoothtext.svg)](https://pypi.org/project/smoothtext/)
[![downloads](https://static.pepy.tech/personalized-badge/smoothtext?period=total&units=international_system&left_color=grey&right_color=orange&left_text=pip%20downloads)](https://pypi.org/project/smoothtext/)

---

## Introduction

SmoothText is a Python library that computes readability scores and text statistics for texts in multiple
languages.

The library's primary design goal is accuracy.

## Requirements

Python 3.10 or higher, and below 3.13 (the package metadata declares `requires_python` as `<3.13,>=3.10`).

### External Dependencies

|                     Library                      |  Version   |           License            | Notes                                         |
|:------------------------------------------------:|:----------:|:----------------------------:|-----------------------------------------------|
|          [NLTK](https://www.nltk.org/)           | `>=3.9.1`  |         `Apache 2.0`         | Conditionally optional.                       |
| [Stanza](https://stanfordnlp.github.io/stanza/)  | `>=1.10.1` |         `Apache 2.0`         | Conditionally optional.                       |
|   [CMUdict](https://pypi.org/project/cmudict/)   | `>=1.0.32` |           `GPLv3+`           | Required if `Stanza` is the selected backend. |
| [Unidecode](https://pypi.org/project/Unidecode/) | `>=1.3.8`  |         `GNU GPLv2`          | Required.                                     |
|    [Pyphen](https://pypi.org/project/pyphen/)    | `>=0.17.0` | `GPL 2.0+/LGPL 2.1+/MPL 1.1` | Required.                                     |
|     [emoji](https://pypi.org/project/emoji/)     | `>=2.14.1` |            `BSD`             | Required.                                     |

At least one of NLTK or Stanza must be installed; SmoothText uses it as its backend.

## Features

### Readability Analysis

SmoothText can calculate readability scores of text in the following languages, using the following formulas.

| Formula/Language                                                                                                                                                                                                                             | English |         German          |                                                                                                                                Turkish                                                                                                                                |
|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------:|:-----------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
| [Flesch Reading Ease](https://scholar.google.com/scholar?as_sdt=0%2C5&q=A+New+Readability+Yardstick+R+Flesch&btnG=)                                                                                                                          |    ✔    |            ✔            |                                                          ✔ [Ateşman](https://scholar.google.com/scholar?as_sdt=0%2C5&q=T%C3%BCrk%C3%A7ede+Okunabilirli%C4%9Fin+%C3%96l%C3%A7%C3%BClmesi+Ate%C5%9Fman&btnG=)                                                           |
| [Flesch-Kincaid Grade](https://scholar.google.com/scholar?as_sdt=0%2C5&q=Derivation+of+new+readability+formulas+%28automated+readability+index%2C+fog+count+and+flesch+reading+ease+formula%29+for+navy+enlisted+personnel&btnG=)            |    ✔    | ✔ Wiener Sachtextformel | ✔ [Bezirci-Yılmaz](https://scholar.google.com/scholar?as_sdt=0%2C5&q=Metinlerin+okunabilirli%C4%9Finin+%C3%B6l%C3%A7%C3%BClmesi+%C3%BCzerine+bir+yazilim+k%C3%BCt%C3%BCphanesi+ve+T%C3%BCrk%C3%A7e+i%C3%A7in+yeni+bir+okunabilirlik+%C3%B6l%C3%A7%C3%BCt%C3%BC&btnG=) |
| [Flesch-Kincaid Grade Simplified](https://scholar.google.com/scholar?as_sdt=0%2C5&q=Derivation+of+new+readability+formulas+%28automated+readability+index%2C+fog+count+and+flesch+reading+ease+formula%29+for+navy+enlisted+personnel&btnG=) |    ✔    |            ❌            |                                                                                                                                   ❌                                                                                                                                   |

**Notes:**

English:

- Formulas work best with US English. However, SmoothText supports both US English and GB English.

German:

- **Flesch Reading Ease** is applicable to German texts. SmoothText handles the language-specific adaptations of the
  formula.
- **Wiener Sachtextformel** is the German adaptation of **Flesch-Kincaid Grade**.

Turkish:

- **Ateşman** is the Turkish adaptation of **Flesch Reading Ease**.
- **Bezirci-Yılmaz** is the Turkish adaptation of **Flesch-Kincaid Grade**.
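
As a point of reference, the published forms of three of these formulas can be sketched directly from sentence, word, and syllable counts. This is a minimal illustration of the textbook formulas, not SmoothText's implementation; the function names are hypothetical, and the library may apply additional language-specific handling.

```python
def flesch_reading_ease(sentences: int, words: int, syllables: int) -> float:
    """Flesch Reading Ease (English): higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(sentences: int, words: int, syllables: int) -> float:
    """Flesch-Kincaid Grade (English): approximate US school grade level."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def atesman(sentences: int, words: int, syllables: int) -> float:
    """Atesman (Turkish): same shape as Flesch Reading Ease, different constants."""
    return 198.825 - 40.175 * (syllables / words) - 2.610 * (words / sentences)


if __name__ == "__main__":
    # Hypothetical counts: 3 sentences, 30 words, 45 syllables.
    print(flesch_reading_ease(3, 30, 45))
    print(flesch_kincaid_grade(3, 30, 45))
    print(atesman(3, 30, 45))
```

All three depend only on the average sentence length (words per sentence) and the average word length (syllables per word), which is why accurate sentence, word, and syllable counts matter for the final score.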

### Sentencizing, Tokenizing, and Syllabifying

SmoothText can extract sentences, words, or syllables from texts.
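
To give a sense of what syllable counting involves, here is a minimal sketch for Turkish, where each syllable contains exactly one vowel, so counting vowels gives the syllable count. The helper name `count_syllables_tr` is hypothetical and this is not how SmoothText is implemented; English and German need more elaborate, dictionary- or pattern-based approaches.

```python
# Turkish vowels, including the circumflexed forms used in some loanwords.
TURKISH_VOWELS = set("aeıioöuüâîû")


def count_syllables_tr(word: str) -> int:
    """Count syllables in a Turkish word by counting its vowels."""
    return sum(1 for ch in word.lower() if ch in TURKISH_VOWELS)


if __name__ == "__main__":
    for word in ("okunabilirlik", "Türkçe", "kitap"):
        print(word, count_syllables_tr(word))
```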

### Reading Time

SmoothText can estimate how long a text would take to read.
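
The idea behind such an estimate can be sketched as a word count divided by a reading speed. The function name `estimate_reading_time` and the default of 200 words per minute are assumptions made for illustration only; they are not taken from SmoothText.

```python
def estimate_reading_time(word_count: int, words_per_minute: float = 200.0) -> float:
    """Return an estimated reading time in seconds.

    The default reading speed is an illustrative placeholder.
    """
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return word_count / words_per_minute * 60.0


if __name__ == "__main__":
    print(estimate_reading_time(400))
```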

## Installation

You can install SmoothText via `pip`.

```shell
pip install smoothtext
```

## Usage

### Importing and Initializing the Library

The `smoothtext` package exposes four top-level names: `Backend`, `Language`, `ReadabilityFormula`, and `SmoothText`.

```Python
from smoothtext import Backend, Language, ReadabilityFormula, SmoothText
```

#### Instancing

SmoothText's analysis methods are instance methods rather than static ones, so an instance must be created before they can be used.

When creating an instance, the language and the backend to be used with it can be specified.

The following will create a new SmoothText instance configured to be used with the English language (by default, the
United States variant) using NLTK as the backend.

```Python
st = SmoothText('en', 'nltk')
```

Once an instance is created, its backend cannot be changed, but its working language can be changed at any time.

```Python
st.language = 'tr'  # Now configured to work with Turkish.
st.language = 'en-gb'  # Switching back to English, but to the United Kingdom variant.
```

#### Readying the Backends

When an instance is created, it first attempts to import the selected backend and download the language data it requires.
To prepare these in advance rather than at instantiation time, use the static `SmoothText.prepare()` method.

```Python
SmoothText.prepare('nltk', 'en,tr')  # Preparing NLTK to be used with English and Turkish
```

### Computing Readability Scores

Each language has its own set of readability formulas. When computing the readability score of a text in a language, one
of the supported formulas must be used. SmoothText offers three ways to perform this calculation.

```Python
text: str = 'Forrest Gump is a 1994 American comedy-drama film directed by Robert Zemeckis.'  # https://en.wikipedia.org/wiki/Forrest_Gump

# Generic computation method
st.compute_readability(text, ReadabilityFormula.Flesch_Reading_Ease)

# Using instance as a callable for generic computation
st(text, ReadabilityFormula.Flesch_Reading_Ease)

# Specific formula method
st.flesch_reading_ease(text)
```
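
A Flesch Reading Ease score is usually read against the difficulty bands from Flesch's original scale for English. The helper below is a hypothetical sketch of that mapping, not part of SmoothText, and the bands should not be assumed to carry over unchanged to the German or Turkish adaptations.

```python
def describe_flesch_score(score: float) -> str:
    """Map an English Flesch Reading Ease score to a difficulty label."""
    # (lower bound, label) pairs, checked from easiest to hardest.
    bands = [
        (90, "very easy"),
        (80, "easy"),
        (70, "fairly easy"),
        (60, "standard"),
        (50, "fairly difficult"),
        (30, "difficult"),
    ]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return "very difficult"


if __name__ == "__main__":
    print(describe_flesch_score(65.0))
```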

### Tokenizing and Calculating Text Statistics

SmoothText is designed to work with sentences, words/tokens, and syllables.

```Python
text = 'This is a test sentence. This is another test sentence. This is a third test sentence.'

st.count_sentences(text)
# Output: 3

st.count_words(text)
# Output: 14

st.count_syllables(text)
# Output: 21
```

### Other Features

Refer to the documentation for a complete list of available methods.

## Inconsistencies

### Backend Related Inconsistencies

- NLTK and Stanza have different tokenization rules. This may cause differences in the number of tokens/sentences between the two backends.
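
The effect of differing tokenization rules can be illustrated without either backend. The sketch below compares two deliberately simple rules on the same text; neither is the rule NLTK or Stanza actually uses, and the function names are hypothetical.

```python
import re

TEXT = "Dr. Smith didn't arrive. He was late."


def whitespace_tokens(text: str) -> list[str]:
    """Split on whitespace only; punctuation stays attached to words."""
    return text.split()


def regex_tokens(text: str) -> list[str]:
    """Split into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def naive_sentences(text: str) -> list[str]:
    """Split after every '.', '!' or '?', which mishandles abbreviations."""
    return re.split(r"(?<=[.!?])\s+", text)


if __name__ == "__main__":
    print(len(whitespace_tokens(TEXT)), len(regex_tokens(TEXT)))
    print(naive_sentences(TEXT))
```

On this sample, the whitespace rule yields 7 tokens and the regex rule 12, and the naive sentence splitter reports 3 sentences because it treats the period in "Dr." as a sentence boundary. Any readability formula built on these counts would shift accordingly.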

### Language Related Inconsistencies

- The syllabification of a word may differ between variants of the same language. For example, the word "hello" is counted as two syllables in American English but one in British English.

## Documentation

See [here](https://smoothtext.github.io/) for API documentation.

## License

SmoothText is released under the MIT license. See [LICENSE](./LICENSE).

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "smoothtext",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.13,>=3.10",
    "maintainer_email": null,
    "keywords": "readability, text-processing, nlp, linguistics, text-analysis, syllables, english, german, turkish, atesman, bezirci-yilmaz, flesch, flesch-kincaid, wiener-sachtextformel",
    "author": null,
    "author_email": "Tu\u011frul G\u00fcng\u00f6r <contact@tugrulgungor.me>",
    "download_url": "https://files.pythonhosted.org/packages/20/60/b165903853a31a944b1b47de182bd39b893567a1d55223a7076c96bfe57b/smoothtext-0.3.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A Python library for text readability analysis, supporting multiple languages.",
    "version": "0.3.0",
    "project_urls": {
        "Documentation": "https://smoothtext.github.io",
        "Homepage": "https://github.com/smoothtext/smoothtext",
        "Issues": "https://github.com/smoothtext/smoothtext/issues",
        "Source": "https://github.com/smoothtext/smoothtext"
    },
    "split_keywords": [
        "readability",
        " text-processing",
        " nlp",
        " linguistics",
        " text-analysis",
        " syllables",
        " english",
        " german",
        " turkish",
        " atesman",
        " bezirci-yilmaz",
        " flesch",
        " flesch-kincaid",
        " wiener-sachtextformel"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "80737cc11e34d08ad38c15fb1c6870daf1324f7c95d561e70162518afb803020",
                "md5": "0071f05f19eef50b19fb125b2eecf8cc",
                "sha256": "0a92fa91a564e22f958590d6dc737ef353ccc9f74b76865b2255b4e3f04bdaba"
            },
            "downloads": -1,
            "filename": "smoothtext-0.3.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "0071f05f19eef50b19fb125b2eecf8cc",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.13,>=3.10",
            "size": 22195,
            "upload_time": "2025-02-16T21:50:03",
            "upload_time_iso_8601": "2025-02-16T21:50:03.716805Z",
            "url": "https://files.pythonhosted.org/packages/80/73/7cc11e34d08ad38c15fb1c6870daf1324f7c95d561e70162518afb803020/smoothtext-0.3.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "2060b165903853a31a944b1b47de182bd39b893567a1d55223a7076c96bfe57b",
                "md5": "481fa8713e36c4dadc850b177329aa01",
                "sha256": "d21f407777f878ae36d4ce0a7b86660914a6f2719a7cd7560556947076d5c55b"
            },
            "downloads": -1,
            "filename": "smoothtext-0.3.0.tar.gz",
            "has_sig": false,
            "md5_digest": "481fa8713e36c4dadc850b177329aa01",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.13,>=3.10",
            "size": 27501,
            "upload_time": "2025-02-16T21:50:05",
            "upload_time_iso_8601": "2025-02-16T21:50:05.656862Z",
            "url": "https://files.pythonhosted.org/packages/20/60/b165903853a31a944b1b47de182bd39b893567a1d55223a7076c96bfe57b/smoothtext-0.3.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-16 21:50:05",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "smoothtext",
    "github_project": "smoothtext",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "smoothtext"
}
        