# semsim
Compare texts easily with the `semsim` Python package!
## Features
- Dozens of **parameters** that you can tune for better performance
- **Default values** for all parameters, validated on datasets for the paraphrase detection task
- Six different **algorithms** for efficient syntax tree comparison
- A small set of **standard "built-in" models** that can be downloaded through the `semsim` package itself
- A flexible **class taxonomy** that you can extend by inheriting from one of the model base classes
- A Python library, `semsim`, with a **command-line interface** (powered by `click`)
## Dependencies
- attrs
- click
- networkx
- numpy
- pymorphy2
- scipy
- simple_elmo
- tensorflow
- tensorrt
- textract
- torch
- torch-geometric
- torch-scatter
- torch-sparse
- torchwordemb
- tqdm
- ufal.udpipe
## Quick start
To install `semsim`, simply run:
`$ pip install semsim`
---
> **NOTE**: If you encounter problems when installing the `semsim` package,
> consider installing some of its prerequisites first:
> `$ pip install torch tensorflow tensorrt`
> Then proceed to install `semsim`.
---
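Before retrying the installation, it can help to see which of the prerequisites from the note are already available. The following is a minimal sketch, not part of `semsim`: the helper name `missing_modules` is hypothetical, and it assumes that each package in the `pip` command above is imported under the same name.

```python
from importlib.util import find_spec

# Module names taken from the pip command in the note above
# (assumed to match the import names of those packages).
PREREQUISITES = ("torch", "tensorflow", "tensorrt")


def missing_modules(names=PREREQUISITES):
    """Return the names that cannot be located for import (hypothetical helper)."""
    # find_spec only locates a top-level module; it does not import it.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Install first: pip install " + " ".join(missing))
    else:
        print("All prerequisites found.")
```

Because `find_spec` does not import the modules, the check stays fast even for heavy packages such as `tensorflow`.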
Now you can use the `semsim` CLI tool as follows:
`$ semsim first_src.txt second_src.txt -o output.txt`
You might want to download the standard "built-in" (more precisely, "add-on") models for better performance.
To do so, run
`$ semsim download cbow`
to fetch the pretrained CBOW embeddings, or
`$ semsim download -a`
to download **all** the add-ons at once, in parallel.
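If the download step is part of a setup script, the two forms shown above can be wrapped in a small helper. This is a sketch under assumptions: the functions `download_command` and `download` are hypothetical and only assemble the exact command lines quoted above; they do not use any Python API of `semsim`.

```python
import subprocess


def download_command(model=None, all_addons=False):
    """Build the argument list for `semsim download` (hypothetical helper)."""
    if all_addons:
        # Mirrors `semsim download -a`.
        return ["semsim", "download", "-a"]
    if model is None:
        raise ValueError("pass a model name or set all_addons=True")
    # Mirrors `semsim download cbow` for model="cbow".
    return ["semsim", "download", model]


def download(model=None, all_addons=False):
    """Run the download; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(download_command(model, all_addons), check=True)
```

Calling `download("cbow")` would then run the first command, and `download(all_addons=True)` the second, provided the `semsim` executable is on the `PATH`.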
More information is available in the [documentation](https://pysemsim.readthedocs.io).
## Code style linters and test frameworks
This library has been fully checked and tested with the following tools:
- flake8
- mypy
- pydocstyle
- pytest
## Interface
The CLI is described in the [examples](https://pysemsim.readthedocs.io/examples)
section of the [documentation](https://pysemsim.readthedocs.io).
This is how you can use the `semsim` CLI tool:
`$ semsim compare first_src.txt second_src.txt -e cbow -k neural -o output.txt --max-out-pairs 200 -v`
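To call the same comparison from a Python script, one option is to invoke the CLI through `subprocess`. The sketch below is an assumption-laden illustration rather than the library's own API: `compare_command` and `run_compare` are hypothetical helpers, and the options are passed through unchanged because their meaning is documented in the examples section, not here.

```python
import shlex
import subprocess


def compare_command(first, second, *options):
    """Build the argument list for `semsim compare` (hypothetical helper)."""
    return ["semsim", "compare", str(first), str(second), *options]


def run_compare(first, second, *options):
    """Run the comparison; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(compare_command(first, second, *options), check=True)


# The same invocation as the command line above, as an argument list.
EXAMPLE = compare_command(
    "first_src.txt", "second_src.txt",
    "-e", "cbow", "-k", "neural",
    "-o", "output.txt", "--max-out-pairs", "200", "-v",
)

if __name__ == "__main__":
    # Show the shell form without running anything.
    print(shlex.join(EXAMPLE))
```

Passing an argument list instead of a single shell string avoids quoting problems when file names contain spaces.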
## Authors
- [Mathematician2000](https://gitlab.com/Mathematician2000)
Raw data
{
"_id": null,
"home_page": "https://gitlab.com/Mathematician2000/semsim",
"name": "semsim",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": "",
"keywords": "NLP,dependency parsing,CoNLL-U,sentence similarity",
"author": "David Avagyan",
"author_email": "david_avagyan@list.ru",
"download_url": "https://files.pythonhosted.org/packages/84/b4/a9dff7c56a7fef02cd95e4f23f373f4f07bd97f0d495c1f2882810a7f944/semsim-1.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "BSD 3-Clause License",
"summary": "A free tool for sentence similarity evaluation",
"version": "1.1.1",
"project_urls": {
"Documentation": "https://pysemsim.readthedocs.io/",
"Homepage": "https://gitlab.com/Mathematician2000/semsim"
},
"split_keywords": [
"nlp",
"dependency parsing",
"conll-u",
"sentence similarity"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "6bd778426c1345af25c759ee6cab4c3f52692d36f528b8f019814a3e2f8e77c4",
"md5": "c486f2bfa273108fd468f7b511b03c59",
"sha256": "e37c55e38f72696fd6c4549bfc20baad1cf035d372023d0209270f3dbcebbe27"
},
"downloads": -1,
"filename": "semsim-1.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "c486f2bfa273108fd468f7b511b03c59",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 34526,
"upload_time": "2023-05-08T19:19:19",
"upload_time_iso_8601": "2023-05-08T19:19:19.990530Z",
"url": "https://files.pythonhosted.org/packages/6b/d7/78426c1345af25c759ee6cab4c3f52692d36f528b8f019814a3e2f8e77c4/semsim-1.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "84b4a9dff7c56a7fef02cd95e4f23f373f4f07bd97f0d495c1f2882810a7f944",
"md5": "b929054f95c3d3cdc5b9f6724b79ab37",
"sha256": "7f54318463115d6fcef6e9434cad3f48ef995b68c5969b852e3710c31b92ea6f"
},
"downloads": -1,
"filename": "semsim-1.1.1.tar.gz",
"has_sig": false,
"md5_digest": "b929054f95c3d3cdc5b9f6724b79ab37",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 29359,
"upload_time": "2023-05-08T19:19:21",
"upload_time_iso_8601": "2023-05-08T19:19:21.592510Z",
"url": "https://files.pythonhosted.org/packages/84/b4/a9dff7c56a7fef02cd95e4f23f373f4f07bd97f0d495c1f2882810a7f944/semsim-1.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-05-08 19:19:21",
"github": false,
"gitlab": true,
"bitbucket": false,
"codeberg": false,
"gitlab_user": "Mathematician2000",
"gitlab_project": "semsim",
"lcname": "semsim"
}