# italian-ats-evaluator
This is an open-source project for evaluating the performance of an Italian ATS (Automatic Text Simplifier) on a set of texts.
You can analyze a single text and extract the following features:
- Overall:
  - Number of tokens
  - Number of tokens (including punctuation)
  - Number of characters
  - Number of characters (including punctuation)
  - Number of words
  - Number of syllables
  - Number of unique lemmas
  - Number of sentences
- Readability:
  - Type-Token Ratio (TTR)
  - Gulpease Index
  - Flesch-Vacca Index
  - Lexical Density
- Part of Speech (POS) distribution
- Verbs distribution
  - Active Verbs
  - Passive Verbs
- Italian Basic Vocabulary (NVdB) from [Il Nuovo vocabolario di base della lingua italiana, Tullio De Mauro](https://dizionario.internazionale.it/)
  - All
  - FO (Fundamentals)
  - AU (High Usage)
  - AD (High Availability)
- Expression:
  - Difficult connectives
  - Latinisms
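The package computes the readability features itself, and its exact implementation is not described here. As a rough illustration only, the sketch below applies the standard Gulpease formula, 89 + (300 * sentences - 10 * letters) / words, together with a type-token ratio and a lexical density over hand-written POS tags. The helper names, the case-folding used for the TTR (the package may count lemmas instead), and the set of "content" POS tags are assumptions, not the package's API.

```python
from typing import Sequence

# Assumed set of "content" UPOS tags; tools disagree on e.g. PROPN or auxiliaries.
CONTENT_POS = {"NOUN", "PROPN", "VERB", "ADJ", "ADV"}


def count_letters(text: str) -> int:
    """Count alphabetic characters, skipping spaces, digits and punctuation."""
    return sum(1 for ch in text if ch.isalpha())


def gulpease(letters: int, words: int, sentences: int) -> float:
    """Gulpease index: 89 + (300 * sentences - 10 * letters) / words.

    Higher is easier; the scale is designed for scores between 0 and 100.
    """
    if words == 0:
        raise ValueError("the text must contain at least one word")
    return 89 + (300 * sentences - 10 * letters) / words


def type_token_ratio(tokens: Sequence[str]) -> float:
    """Distinct case-folded forms divided by the total number of tokens."""
    if not tokens:
        return 0.0
    return len({t.lower() for t in tokens}) / len(tokens)


def lexical_density(pos_tags: Sequence[str]) -> float:
    """Share of tokens whose POS tag is a content-word tag."""
    if not pos_tags:
        return 0.0
    return sum(1 for tag in pos_tags if tag in CONTENT_POS) / len(pos_tags)


if __name__ == "__main__":
    text = "Il gatto mangia il topo"
    words = text.split()  # naive whitespace split; enough for this sentence
    tags = ["DET", "NOUN", "VERB", "DET", "NOUN"]  # hand-written UPOS tags
    print(gulpease(count_letters(text), len(words), sentences=1))  # 111.0
    print(type_token_ratio(words))  # 0.8
    print(lexical_density(tags))  # 0.6
```

The five-word example scores above 100 because the index is calibrated on longer texts. In practice, the counts would come from the spaCy pipeline rather than from a whitespace split.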
You can also compare two texts, a reference text and its simplified version, and compute the following metrics:
- Semantic:
  - Semantic Similarity
- Character diff:
  - Edit Distance
- Token diff:
  - Number of tokens added
  - Number of tokens removed
  - Number of VdB tokens added
  - Number of VdB tokens removed
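For the comparison metrics, here is a minimal sketch of what they can look like. It assumes a Levenshtein distance with unit costs for the character diff and multiset (bag-of-tokens) differences for the token diff. The function names, the optional `vocabulary` filter standing in for the NVdB lookup, and the cosine-similarity helper (which assumes semantic similarity is the cosine between two sentence embeddings) are all hypothetical and may not match how italian-ats-evaluator computes these values.

```python
import math
from collections import Counter
from typing import Dict, Optional, Sequence, Set


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance with unit costs."""
    # Keep only the previous row of the dynamic-programming table.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free when equal
            ))
        prev = curr
    return prev[-1]


def token_diff(
    reference: Sequence[str],
    simplified: Sequence[str],
    vocabulary: Optional[Set[str]] = None,
) -> Dict[str, int]:
    """Count tokens added and removed, treating each text as a multiset.

    If `vocabulary` is given (hypothetical stand-in for the NVdB word list),
    only tokens that belong to it are counted.
    """
    if vocabulary is not None:
        reference = [t for t in reference if t in vocabulary]
        simplified = [t for t in simplified if t in vocabulary]
    ref, simp = Counter(reference), Counter(simplified)
    # Counter subtraction keeps only positive counts.
    return {
        "added": sum((simp - ref).values()),
        "removed": sum((ref - simp).values()),
    }


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    ref = "il felino mangia il roditore".split()
    simp = "il gatto mangia il topo".split()
    print(token_diff(ref, simp))  # {'added': 2, 'removed': 2}
    print(edit_distance("kitten", "sitting"))  # 3
```

For example, comparing "il felino mangia il roditore" with "il gatto mangia il topo" (lowercased and split on whitespace) reports two tokens added (`gatto`, `topo`) and two removed (`felino`, `roditore`).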
## Installation
The package requires Python 3.8 or later.
```bash
pip install italian-ats-evaluator
```
## Usage
Analyze a single text:
```python
from italian_ats_evaluator import TextAnalyzer

# Extract the single-text features listed above
result = TextAnalyzer(
    text="Il gatto mangia il topo",
    spacy_model_name="it_core_news_lg",  # spaCy Italian pipeline
)
```
Compare a reference text with its simplified version:
```python
from italian_ats_evaluator import SimplificationAnalyzer

result = SimplificationAnalyzer(
    reference_text="Il felino mangia il roditore",
    simplified_text="Il gatto mangia il topo",
    spacy_model_name="it_core_news_lg",  # spaCy Italian pipeline
    # Sentence Transformers model (embeddings for semantic similarity)
    sentence_transformers_model_name="intfloat/multilingual-e5-base",
)
```
## Development
Create and activate a virtual environment:
```bash
python3 -m venv venv
source venv/bin/activate
```
Install the package in editable mode:
```bash
pip install -e .
```
## Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
## Acknowledgements
This contribution is a result of the research conducted within the framework of the PRIN 2020 (Progetti di Rilevante Interesse Nazionale) “VerbACxSS: on analytic verbs, complexity, synthetic verbs, and simplification. For accessibility” (Prot. 2020BJKB9M), funded by the Italian Ministero dell’Università e della Ricerca.
## License
[MIT](https://choosealicense.com/licenses/mit/)