# italian-ats-evaluator
This is an open-source project for evaluating the performance of an Italian ATS (Automatic Text Simplifier) on a set of texts.

You can analyze a single text and extract the following features:
- Overall:
  - Number of tokens
  - Number of tokens (including punctuation)
  - Number of characters
  - Number of characters (including punctuation)
  - Number of words
  - Number of syllables
  - Number of unique lemmas
  - Number of sentences
- Readability:
  - Type-Token Ratio (TTR)
  - Gulpease Index
  - Flesch-Vacca Index
  - Lexical Density
- Part of Speech (POS) distribution
- Verbs distribution
  - Active Verbs
  - Passive Verbs
- Italian Basic Vocabulary (NVdB) from [Il Nuovo vocabolario di base della lingua italiana, Tullio De Mauro](https://dizionario.internazionale.it/)
  - All
  - FO (Fundamentals)
  - AU (High Usage)
  - AD (High Availability)
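
To make two of the readability features above concrete, here is a minimal sketch of the Gulpease Index and the Type-Token Ratio. It is not this package's implementation: the helper names `gulpease_index` and `type_token_ratio` are hypothetical, and the regex-based word and sentence splitting is a crude stand-in for the spaCy pipeline the package relies on (the TTR here also uses lowercased word forms rather than lemmas).

```python
import re

# Matches runs of Unicode letters; digits, underscores and punctuation split words.
WORD_RE = re.compile(r"[^\W\d_]+")


def gulpease_index(text: str) -> float:
    """Gulpease Index: 89 + (300 * sentences - 10 * letters) / words."""
    words = WORD_RE.findall(text)
    if not words:
        raise ValueError("text contains no words")
    # Assumption: a sentence ends at '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    letters = sum(len(word) for word in words)
    return 89 + (300 * len(sentences) - 10 * letters) / len(words)


def type_token_ratio(text: str) -> float:
    """Distinct lowercased word forms divided by the total number of words."""
    tokens = [word.lower() for word in WORD_RE.findall(text)]
    if not tokens:
        raise ValueError("text contains no words")
    return len(set(tokens)) / len(tokens)


if __name__ == "__main__":
    sample = "Il gatto mangia il topo."
    print(gulpease_index(sample))    # 111.0 (1 sentence, 5 words, 19 letters)
    print(type_token_ratio(sample))  # 0.8 (4 distinct forms out of 5 words)
```

The raw Gulpease formula can exceed 100 on very short texts, as in this sample; whether the package clamps the value is not stated here.
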
You can also compare two texts and get the following metrics:
- Semantic:
  - Semantic Similarity
- Character diff:
  - Edit Distance
- Token diff:
  - Amount of tokens added
  - Amount of tokens removed
  - Amount of VdB tokens removed
  - Amount of VdB tokens added
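
The character-level and token-level comparisons listed above can be illustrated with a short sketch. This is not the package's own code: `edit_distance` is a plain Levenshtein distance, `token_diff` is a hypothetical helper that compares whitespace-separated, lowercased tokens as multisets, and the package may tokenize and count differently.

```python
from collections import Counter
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def token_diff(reference: str, simplified: str) -> Tuple[List[str], List[str]]:
    """Return (added, removed) tokens, treating each text as a multiset."""
    reference_tokens = Counter(reference.lower().split())
    simplified_tokens = Counter(simplified.lower().split())
    added = simplified_tokens - reference_tokens
    removed = reference_tokens - simplified_tokens
    return sorted(added.elements()), sorted(removed.elements())


if __name__ == "__main__":
    added, removed = token_diff(
        "Il felino mangia il roditore",
        "Il gatto mangia il topo",
    )
    print(added)    # ['gatto', 'topo']
    print(removed)  # ['felino', 'roditore']
    print(edit_distance("gatto", "gatti"))  # 1
```
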
## Installation
```bash
pip install italian-ats-evaluator
```
## Usage
```python
from italian_ats_evaluator import TextAnalyzer

# Analyze a single Italian text using the given spaCy model.
result = TextAnalyzer(
    text="Il gatto mangia il topo",
    spacy_model_name="it_core_news_lg"
)
```
```python
from italian_ats_evaluator import SimplificationAnalyzer

# Compare a reference text with its simplified version.
result = SimplificationAnalyzer(
    reference_text="Il felino mangia il roditore",
    simplified_text="Il gatto mangia il topo",
    spacy_model_name="it_core_news_lg",
    sentence_transformers_model_name="intfloat/multilingual-e5-base"
)
```
## Development
Create and activate a virtual environment:
```bash
python3 -m venv venv
source venv/bin/activate
```
Install the package in editable mode:
```bash
pip install -e .
```
## Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
## Acknowledgements
This contribution is a result of the research conducted within the framework of the PRIN 2020 (Progetti di Rilevante Interesse Nazionale) “VerbACxSS: on analytic verbs, complexity, synthetic verbs, and simplification. For accessibility” (Prot. 2020BJKB9M), funded by the Italian Ministero dell’Università e della Ricerca.
## License
[MIT](https://choosealicense.com/licenses/mit/)