sentence-plagiarism


- Name: sentence-plagiarism
- Version: 0.3.0
- Home page: https://github.com/izikeros/sentence-plagiarism
- Summary: Compare sentences from input document with all sentences from reference documents - find very similar ones.
- Upload time: 2023-08-29 13:49:40
- Author: Krystian Safjan
- Requires Python: >=3.9,<4.0
- License: MIT
- Keywords: plagiarism, plagiarism-detection, text-similarity, sentence_similarity, sentence-plagiarism

# Plagiarism Checker

![img](https://img.shields.io/pypi/v/sentence-plagiarism.svg)
![](https://img.shields.io/pypi/pyversions/sentence-plagiarism.svg)
![](https://img.shields.io/pypi/dm/sentence-plagiarism.svg)

This command-line tool checks how similar a given text is to a set of reference documents. It compares each sentence of the input text with every sentence of the reference documents using Jaccard similarity and reports the pairs that are very similar.
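
As a rough illustration of the metric, the Jaccard similarity of two sentences is the number of words they share divided by the number of distinct words across both. The sketch below assumes lowercased, whitespace-split words; the `jaccard` helper is hypothetical, and the package's own tokenization may differ.

```python
def jaccard(sentence_a: str, sentence_b: str) -> float:
    """Jaccard similarity of two sentences treated as sets of words."""
    # Assumption: words are lowercased and split on whitespace.
    words_a = set(sentence_a.lower().split())
    words_b = set(sentence_b.lower().split())
    union = words_a | words_b
    if not union:
        return 0.0  # two empty sentences: define the similarity as 0
    return len(words_a & words_b) / len(union)


# 4 shared words (the, cat, on, mat) out of 6 distinct words overall.
print(jaccard("the cat sat on the mat", "the cat lay on the mat"))
```

Because the sentences are compared as sets, repeated words and word order do not affect the score.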

## Installation
Install in an isolated environment with pipx (or with plain pip):
```sh
pipx install sentence-plagiarism
```

## CLI Usage

To run the plagiarism checker, use the following command:

```sh
sentence-plagiarism <path-to-input-file> --reference-files <path-to-reference-file-1> <path-to-reference-file-2> ... [--similarity-threshold <threshold-value>] [--output-file <path-to-output-file>] [--quiet]
```

- `<path-to-input-file>`: Path to the input file to be checked for plagiarism.
- `--reference-files <path-to-reference-file-1> ...`: Paths to the reference files to compare against.
- `--similarity-threshold` (optional): The minimum similarity score, between 0 and 1, required to consider a sentence plagiarized.
- `--output-file` (optional): Path to the output file where the results are saved in JSON format.
- `--quiet` (optional): Flag to suppress the display of similar sentences in the console.
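
The role of the threshold can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's implementation: `split_sentences`, `word_set`, and `find_similar` are hypothetical helpers, sentences are split naively after `.`, `!`, or `?`, words are lowercased runs of word characters, and a pair is reported when its score is at least the threshold.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def word_set(sentence: str) -> set[str]:
    # Assumption: words are lowercased runs of letters, digits or underscores.
    return set(re.findall(r"\w+", sentence.lower()))


def find_similar(input_text: str, references: dict[str, str], threshold: float = 0.8):
    """Return (input sentence, reference sentence, document, score) tuples
    for every sentence pair whose Jaccard score reaches the threshold."""
    matches = []
    for in_sent in split_sentences(input_text):
        in_words = word_set(in_sent)
        for doc_name, ref_text in references.items():
            for ref_sent in split_sentences(ref_text):
                ref_words = word_set(ref_sent)
                union = in_words | ref_words
                score = len(in_words & ref_words) / len(union) if union else 0.0
                if score >= threshold:
                    matches.append((in_sent, ref_sent, doc_name, round(score, 4)))
    return matches


refs = {"ref1.txt": "Cats sleep most of the day. Dogs bark at night."}
for match in find_similar("Cats sleep most of the day! Birds sing.", refs, threshold=0.8):
    print(match)
```

Every input sentence is compared with every reference sentence, so the work grows with the product of the two sentence counts; a lower threshold reports more, and weaker, matches.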

## Example

The following command:
```sh
sentence-plagiarism input.txt --reference-files ref1.txt ref2.txt --similarity-threshold 0.8 --output-file results.json
```

can produce the following output on stdout:
```
Input Sentence:     The retriever and seq2seq modules commence their operations as pretrained models, and through a joint fine-tuning process, they adapt collaboratively, thus enhancing both retrieval and generation for specific downstream tasks.
Reference Sentence:  foobar  The retriever and seq2seq modules commence their operations as pretrained models, and through a joint fine-tuning process, they adapt collaboratively, thus enhancing both retrieval and generation for specific downstream tasks.
Reference Document: ref1.txt
Similarity Score: 0.9667

Input Sentence:      Closing thoughts  For a comprehensive understanding of the RAG technique, we offer an in-depth exploration, commencing with a simplified overview and progressively delving into more intricate technical facets.
Reference Sentence:  barfoo  For a comprehensive understanding of the RAG technique, we offer an in-depth exploration, commencing with a simplified overview and progressively delving into more intricate technical facets.
Reference Document: ref2.txt
Similarity Score: 0.8966

Results saved to results.json
```
and save results to `results.json`.
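
The first score in the output can be checked by hand. Assuming words are lowercased runs of word characters (so `fine-tuning` counts as two words; this tokenization is an assumption that happens to reproduce the reported score, not something taken from the package source), the input sentence has 29 distinct words and the reference sentence adds only `foobar`, which gives 29/30.

```python
import re

input_sentence = (
    "The retriever and seq2seq modules commence their operations as pretrained "
    "models, and through a joint fine-tuning process, they adapt collaboratively, "
    "thus enhancing both retrieval and generation for specific downstream tasks."
)
reference_sentence = "foobar " + input_sentence

# Assumed tokenization: lowercased runs of letters, digits or underscores.
input_words = set(re.findall(r"\w+", input_sentence.lower()))
reference_words = set(re.findall(r"\w+", reference_sentence.lower()))

shared = input_words & reference_words
distinct = input_words | reference_words
score = len(shared) / len(distinct)

print(f"{len(shared)} shared words out of {len(distinct)} distinct words")
print(f"Similarity Score: {score:.4f}")  # 29 / 30 rounds to 0.9667
```

The same assumption gives 26/29, or 0.8966, for the second pair, where `Closing`, `thoughts`, and `barfoo` are the only words not shared.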

## Programmatic Usage

```python
from sentence_plagiarism import check

check(
    examined_file="txt/txt1.txt",
    reference_files=["txt/txt2.txt", "txt/txt3.txt"],
    similarity_threshold=0.8,
    output_file=None,
    quiet=False,
)
```

## License

Distributed under the MIT License. See `LICENSE` for more information.

## Contact

Krystian Safjan - ksafjan@gmail.com

Project Link: [https://github.com/izikeros/sentence-plagiarism](https://github.com/izikeros/sentence-plagiarism)

            
