# Plagiarism Checker
![img](https://img.shields.io/pypi/v/sentence-plagiarism.svg)
![](https://img.shields.io/pypi/pyversions/sentence-plagiarism.svg)
![](https://img.shields.io/pypi/dm/sentence-plagiarism.svg)
This is a command-line tool for checking the similarity between a given text and a set of reference documents. The tool uses the Jaccard similarity algorithm to compare the input text with the reference documents.
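
As an illustration of the measure named above, here is a minimal sketch of Jaccard similarity over word sets. The function name `jaccard_similarity` and the lowercase, whitespace-based tokenization are assumptions made for this example; the tool's actual tokenization may differ.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Return |A ∩ B| / |A ∪ B| for the word sets of two sentences."""
    # Assumption: tokens are lowercase, whitespace-separated words.
    set_a = set(a.lower().split())
    set_b = set(b.lower().split())
    if not set_a and not set_b:
        # Two empty sentences are treated as identical (a convention chosen here).
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


# 4 shared words out of 6 distinct words overall
print(round(jaccard_similarity("the cat sat on the mat", "the cat lay on the mat"), 4))  # 0.6667
```

A score of 1 means the two word sets are identical, and 0 means they share no words.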
## Installation
Install in an isolated environment using pipx (or normal pip):
```
pipx install sentence-plagiarism
```
## CLI Usage
To run the plagiarism checker, use the following command:
```sh
sentence-plagiarism <path-to-input-file> <path-to-reference-file-1> <path-to-reference-file-2> ... [--threshold <threshold-value>] [--output-file <path-to-output-file>] [--quiet]
```
- `<path-to-input-file>`: Path to the input file to be checked for plagiarism.
- `<path-to-reference-file-1> ...`: Paths to the reference files to compare against.
- `--threshold` (optional): The minimum similarity score required for a sentence to be reported as plagiarized. The value must be between 0 and 1.
- `--output-file` (optional): Path to the output file to save the results in JSON format.
- `--quiet` (optional): Flag to suppress the display of similar sentences in the console.
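
To make the role of the threshold concrete, the sketch below compares every input sentence with every reference sentence and keeps only the pairs that reach the threshold. It is an illustration, not the tool's implementation: the helper names (`split_sentences`, `jaccard`, `find_similar`), the naive sentence splitter, the whitespace tokenization, and the `>=` comparison are all assumptions.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def jaccard(a: str, b: str) -> float:
    # Jaccard similarity over lowercase, whitespace-separated word sets.
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def find_similar(input_text: str, references: dict[str, str], threshold: float = 0.8):
    """Return (input sentence, reference sentence, document name, score) tuples."""
    hits = []
    for sentence in split_sentences(input_text):
        for doc_name, ref_text in references.items():
            for ref_sentence in split_sentences(ref_text):
                score = jaccard(sentence, ref_sentence)
                if score >= threshold:  # assumption: the threshold is inclusive
                    hits.append((sentence, ref_sentence, doc_name, score))
    return hits


# Hypothetical in-memory documents standing in for the input and reference files.
hits = find_similar(
    "Alpha beta gamma delta. Something else entirely.",
    {"ref1.txt": "Alpha beta gamma delta. Unrelated words here."},
    threshold=0.8,
)
for sentence, ref_sentence, doc_name, score in hits:
    print(f"{doc_name}: {score:.4f} | {sentence}")
```

Lowering the threshold reports more, weaker matches; raising it toward 1 keeps only near-identical sentences.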
## Example
The following command:
```sh
sentence-plagiarism input.txt --reference-files ref1.txt ref2.txt --similarity-threshold 0.8 --output-file results.json
```
can produce the following output on stdout:
```
Input Sentence: The retriever and seq2seq modules commence their operations as pretrained models, and through a joint fine-tuning process, they adapt collaboratively, thus enhancing both retrieval and generation for specific downstream tasks.
Reference Sentence: foobar The retriever and seq2seq modules commence their operations as pretrained models, and through a joint fine-tuning process, they adapt collaboratively, thus enhancing both retrieval and generation for specific downstream tasks.
Reference Document: ref1.txt
Similarity Score: 0.9667
Input Sentence: Closing thoughts For a comprehensive understanding of the RAG technique, we offer an in-depth exploration, commencing with a simplified overview and progressively delving into more intricate technical facets.
Reference Sentence: barfoo For a comprehensive understanding of the RAG technique, we offer an in-depth exploration, commencing with a simplified overview and progressively delving into more intricate technical facets.
Reference Document: ref2.txt
Similarity Score: 0.8966
Results saved to results.json
```
and save results to `results.json`.
## Programmatic Usage
```python
from sentence_plagiarism import check

# Check one file against two reference files.
check(
    examined_file="txt/txt1.txt",
    reference_files=["txt/txt2.txt", "txt/txt3.txt"],
    similarity_threshold=0.8,
    output_file=None,
    quiet=False,
)
```
## License
Distributed under the MIT License. See `LICENSE` for more information.
## Contact
Krystian Safjan - ksafjan@gmail.com
Project Link: [https://github.com/izikeros/sentence-plagiarism](https://github.com/izikeros/sentence-plagiarism)