copy-spotter


| Field | Value |
| --- | --- |
| Name | copy-spotter |
| Version | 0.1.16 |
| home_page | https://github.com/Wazzabeee/copy-spotter |
| Summary | Make plagiarism detection easier. This package will find similar sentences between given files and highlight them in a side by side comparison. |
| upload_time | 2024-05-09 18:17:48 |
| maintainer | None |
| docs_url | None |
| author | Clément Delteil |
| requires_python | >=3.10 |
| license | None |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# Copy Spotter

![PyPI - Version](https://img.shields.io/pypi/v/copy-spotter) ![PyPI - License](https://img.shields.io/pypi/l/copy-spotter)
![Python](https://img.shields.io/badge/python-3.11-blue)


![GIF demo](https://raw.githubusercontent.com/Wazzabeee/copy-spotter/main/data/img/example.gif)

## About
This program processes the pdf, txt, docx, and odt files found in the given input directory, finds similar sentences, calculates similarity percentages, and displays a similarity table with links to side-by-side comparisons in which similar sentences are highlighted.
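
To make the idea concrete, here is a minimal sketch of how shared word runs between two texts could be found and turned into a similarity percentage. It relies on Python's standard `difflib` module and is not Copy Spotter's actual implementation; the function names `common_blocks` and `similarity_percentage`, and the choice to divide by the length of the shorter text, are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def common_blocks(text_a: str, text_b: str, block_size: int = 2) -> list[list[str]]:
    """Return runs of at least `block_size` consecutive words shared by both texts."""
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    # autojunk=False keeps frequent words from being ignored in long texts.
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    return [
        words_a[match.a : match.a + match.size]
        for match in matcher.get_matching_blocks()
        if match.size >= block_size
    ]


def similarity_percentage(text_a: str, text_b: str, block_size: int = 2) -> float:
    """Share of the shorter text's words covered by common runs (illustrative metric)."""
    shorter = min(len(text_a.split()), len(text_b.split()))
    if shorter == 0:
        return 0.0
    shared = sum(len(block) for block in common_blocks(text_a, text_b, block_size))
    return 100.0 * shared / shorter
```

Raising `block_size` in this sketch discards short coincidental matches, which is the same trade-off the `-s` option exposes.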

**Usage**
---

```bash
$ pip install copy-spotter
$ copy-spotter [-s] [-o] [-h] input_directory
```
***Positional Arguments:***
* `input_directory`: A single directory containing all the files to compare (pdf, txt, docx, odt); see `data/pdf/plagiarism` for an example.

```
input_directory/
│
├── file_1.docx
├── file_2.pdf
└── file_3.pdf
```
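
As an illustration of the layout above, the following sketch collects the files with a supported extension from a directory. It is not taken from the package: the name `list_supported_files` is hypothetical, and it assumes that only the top level of the directory is scanned and that extensions are matched case-insensitively.

```python
from pathlib import Path

# Extensions listed in this README as supported input formats.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".docx", ".odt"}


def list_supported_files(input_directory: str) -> list[Path]:
    """Return the supported files directly inside `input_directory`, sorted by path."""
    directory = Path(input_directory)
    if not directory.is_dir():
        raise NotADirectoryError(f"{input_directory} is not a directory")
    return sorted(
        path
        for path in directory.iterdir()
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```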

***Optional Arguments:***
* `-s`, `--block-size`: Minimum number of consecutive similar words required for a match to be detected. (Default is 2)
* `-o`, `--out_dir`: Output directory for the html files. (By default, a new directory called `results` is created)
* `-h`, `--help`: Show this message and exit.
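
For readers who want to see how such an interface fits together, here is a hedged `argparse` sketch that mirrors the arguments documented above. It is an illustration only, not the package's source: `build_parser` is a hypothetical name, and representing the default output directory as `None` is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented command line (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="copy-spotter",
        description="Find similar sentences between the files of a directory.",
    )
    parser.add_argument(
        "input_directory",
        help="Directory containing the pdf, txt, docx, and odt files to compare",
    )
    parser.add_argument(
        "-s", "--block-size", type=int, default=2,
        help="Minimum number of consecutive similar words",
    )
    parser.add_argument(
        "-o", "--out_dir", default=None,
        help="Output directory for the html files",
    )
    # argparse adds -h/--help automatically.
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.input_directory, args.block_size, args.out_dir)
```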

**Examples**
---
```bash
# Analyze documents in 'data/pdf/plagiarism', with default settings
$ copy-spotter data/pdf/plagiarism

# Analyze with custom block size and specify output directory
$ copy-spotter data/pdf/plagiarism -s 5 -o results/output
```

**Development Setup:**
---

```bash
# Clone this repository
$ git clone https://github.com/Wazzabeee/copy-spotter

# Go into the repository
$ cd copy-spotter

# Install requirements
$ pip install -r requirements.txt
$ pip install -r requirements_lint.txt

# Install precommit
$ pip install pre-commit
$ pre-commit install

# Run tests
$ pip install pytest
$ pytest tests/

# Run package locally
$ python -m scripts.main [-s] [-o] [-h] input_directory
```

**Recommendations**
---
- Please make sure that all text files are closed before running the program.
- For best results, provide text files written in the same language.
- PDF files made from scanned images won't be processed correctly.
- Ensure you have write access when using the package.
- If a specific file is not processed correctly, feel free to [contact me](mailto:clement45.delteil45@gmail.com) so that I can address the issue.
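
The write-access recommendation can be checked before a run. The sketch below is one possible pre-flight check, not part of the package: `can_write_results` is a hypothetical helper, and it assumes the output directory may not exist yet, in which case its nearest existing parent must be writable.

```python
import os
from pathlib import Path


def can_write_results(out_dir: str) -> bool:
    """Return True if `out_dir`, or its nearest existing parent, is a writable directory."""
    path = Path(out_dir).resolve()
    # Walk up until we reach something that exists (the filesystem root always does).
    while not path.exists():
        path = path.parent
    return path.is_dir() and os.access(path, os.W_OK)
```

For example, `can_write_results("results/output")` could be called before passing `-o results/output` to the program.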

**TODO**
---
- Add more tests on existing functions
- Implement OCR with tesseract for scanned documents
- Add custom naming option for pdf files

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/Wazzabeee/copy-spotter",
    "name": "copy-spotter",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": null,
    "author": "Cl\u00e9ment Delteil",
    "author_email": "clement45.delteil45@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/d8/ad/4d9ebc88ccd251829d1187cc5248aab12fa9192e24553905caa0b90dd9e9/copy-spotter-0.1.16.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Make plagiarism detection easier. This package will find similar sentences between given files and highlight them in a side by side comparison.",
    "version": "0.1.16",
    "project_urls": {
        "Homepage": "https://github.com/Wazzabeee/copy-spotter"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "59e4ce25adf112a925b259ecdfde91f19c36c7e7b5d897af7e0751677dfb64cf",
                "md5": "ce5559d63b024ab397e3c2b83a9d7005",
                "sha256": "de87eef4e86489ab4c19c08280479fd5cb66b610c3bd548ada2a0c770751e3b3"
            },
            "downloads": -1,
            "filename": "copy_spotter-0.1.16-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ce5559d63b024ab397e3c2b83a9d7005",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 15222,
            "upload_time": "2024-05-09T18:17:46",
            "upload_time_iso_8601": "2024-05-09T18:17:46.542134Z",
            "url": "https://files.pythonhosted.org/packages/59/e4/ce25adf112a925b259ecdfde91f19c36c7e7b5d897af7e0751677dfb64cf/copy_spotter-0.1.16-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d8ad4d9ebc88ccd251829d1187cc5248aab12fa9192e24553905caa0b90dd9e9",
                "md5": "76d85bb96e65450a3aaca18a2bd12708",
                "sha256": "d3051ebb7d5f3384b7c2279d92520f1a6f7d2e9f0a3ed8333ec3136011dacd13"
            },
            "downloads": -1,
            "filename": "copy-spotter-0.1.16.tar.gz",
            "has_sig": false,
            "md5_digest": "76d85bb96e65450a3aaca18a2bd12708",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 12419,
            "upload_time": "2024-05-09T18:17:48",
            "upload_time_iso_8601": "2024-05-09T18:17:48.280970Z",
            "url": "https://files.pythonhosted.org/packages/d8/ad/4d9ebc88ccd251829d1187cc5248aab12fa9192e24553905caa0b90dd9e9/copy-spotter-0.1.16.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-05-09 18:17:48",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Wazzabeee",
    "github_project": "copy-spotter",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "copy-spotter"
}
        