Name | copy-spotter |
Version | 0.1.16 |
home_page | https://github.com/Wazzabeee/copy-spotter |
Summary | Make plagiarism detection easier. This package will find similar sentences between given files and highlight them in a side by side comparison. |
upload_time | 2024-05-09 18:17:48 |
maintainer | None |
docs_url | None |
author | Clément Delteil |
requires_python | >=3.10 |
license | None |
keywords | None |
VCS | GitHub |
bugtrack_url | None |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# Copy Spotter
![PyPI - Version](https://img.shields.io/pypi/v/copy-spotter) ![PyPI - License](https://img.shields.io/pypi/l/copy-spotter)
![Python](https://img.shields.io/badge/python-3.11-blue)
![GIF demo](https://raw.githubusercontent.com/Wazzabeee/copy-spotter/main/data/img/example.gif)
## About
This program processes the pdf, txt, docx, and odt files found in the given input directory, finds similar sentences, calculates a similarity percentage, and displays a similarity table with links to side-by-side comparisons in which similar sentences are highlighted.
**Usage**
---
```bash
$ pip install copy-spotter
$ copy-spotter [-s] [-o] [-h] input_directory
```
***Positional Arguments:***
* `input_directory`: A single directory that contains all the files to compare (pdf, txt, docx, odt); see `data/pdf/plagiarism` for an example.
```
input_directory/
│
├── file_1.docx
├── file_2.pdf
└── file_3.pdf
```
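As a quick pre-check before running the tool, the sketch below lists the files in a directory whose extensions match the formats named above. `list_supported_files` is a hypothetical helper, not part of copy-spotter, and it assumes files sit directly inside the directory (whether the tool itself descends into subdirectories is not stated here).

```python
from pathlib import Path
from typing import List

# Extensions the README lists as supported input formats.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".docx", ".odt"}


def list_supported_files(input_directory: str) -> List[Path]:
    """Return files directly inside input_directory with a supported extension."""
    root = Path(input_directory)
    return sorted(
        path
        for path in root.iterdir()
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )


if __name__ == "__main__":
    # Print the candidate files found in the current directory.
    for path in list_supported_files("."):
        print(path.name)
```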
***Optional Arguments:***
* `-s`, `--block-size`: Set the minimum number of consecutive similar words required for a match. (Default: 2)
* `-o`, `--out_dir`: Set the output directory for the HTML files. (Default: a new directory called `results` is created)
* `-h`, `--help`: Show the help message and exit.
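To give a feel for what a block size means, here is a minimal sketch that finds runs of consecutive words shared by two texts and keeps only those at least `block_size` words long. It uses `difflib.SequenceMatcher` from the standard library as a stand-in; it is not taken from copy-spotter's source, and `matching_word_blocks` is a hypothetical name.

```python
from difflib import SequenceMatcher
from typing import List


def matching_word_blocks(text_a: str, text_b: str, block_size: int = 2) -> List[str]:
    """Return runs of at least block_size consecutive words shared by both texts."""
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    # autojunk=False keeps frequent words from being ignored in long texts.
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    return [
        " ".join(words_a[match.a:match.a + match.size])
        for match in matcher.get_matching_blocks()
        if match.size >= block_size
    ]


if __name__ == "__main__":
    first = "the quick brown fox jumps over the lazy dog"
    second = "a quick brown fox sleeps near the lazy cat"
    print(matching_word_blocks(first, second, block_size=2))
    print(matching_word_blocks(first, second, block_size=3))
```

With these two sentences, a block size of 2 keeps both "quick brown fox" and "the lazy", while a block size of 3 keeps only "quick brown fox". A larger value filters out short, incidental overlaps.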
**Examples**
---
```bash
# Analyze documents in 'data/pdf/plagiarism', with default settings
$ copy-spotter data/pdf/plagiarism
# Analyze with custom block size and specify output directory
$ copy-spotter data/pdf/plagiarism -s 5 -o results/output
```
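The same invocations can be scripted from Python. The sketch below only assembles the documented `-s` and `-o` flags into an argument list and hands it to `subprocess.run`. `build_command` and `run_copy_spotter` are hypothetical helpers, and the sketch assumes `copy-spotter` is installed and on `PATH`.

```python
import subprocess
from typing import List, Optional


def build_command(
    input_directory: str,
    block_size: Optional[int] = None,
    out_dir: Optional[str] = None,
) -> List[str]:
    """Assemble the copy-spotter argument list from the documented flags."""
    command = ["copy-spotter", input_directory]
    if block_size is not None:
        command += ["-s", str(block_size)]
    if out_dir is not None:
        command += ["-o", out_dir]
    return command


def run_copy_spotter(
    input_directory: str,
    block_size: Optional[int] = None,
    out_dir: Optional[str] = None,
) -> int:
    """Run the CLI and return its exit code (assumes copy-spotter is on PATH)."""
    return subprocess.run(build_command(input_directory, block_size, out_dir)).returncode
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when directory names contain spaces.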
**Development Setup:**
---
```bash
# Clone this repository
$ git clone https://github.com/Wazzabeee/copy-spotter
# Go into the repository
$ cd copy-spotter
# Install requirements
$ pip install -r requirements.txt
$ pip install -r requirements_lint.txt
# Install precommit
$ pip install pre-commit
$ pre-commit install
# Run tests
$ pip install pytest
$ pytest tests/
# Run package locally
$ python -m scripts.main [-s] [-o] [-h] input_directory
```
**Recommendations**
---
- Please make sure that all text files are closed before running the program.
- For best results, provide text files written in the same language.
- PDF files made from scanned images won't be processed correctly.
- Ensure you have write access when using the package.
- If a specific file is not processed correctly, feel free to [contact me](mailto:clement45.delteil45@gmail.com) so that I can address the issue.
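The write-access recommendation can be checked up front. The sketch below assumes the point of concern is the output directory passed with `-o` (the text above does not say which location needs to be writable), and `can_write_results` is a hypothetical helper rather than part of the package.

```python
import os
from pathlib import Path


def can_write_results(out_dir: str) -> bool:
    """Return True if out_dir, or its nearest existing ancestor, is a writable directory."""
    path = Path(out_dir).resolve()
    # Walk up until an existing path is found; the filesystem root always exists.
    while not path.exists():
        path = path.parent
    return path.is_dir() and os.access(path, os.W_OK)


if __name__ == "__main__":
    print(can_write_results("results"))
```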
**TODO**
---
- Add more tests on existing functions
- Implement OCR with tesseract for scanned documents
- Add custom naming option for pdf files
Raw data
{
"_id": null,
"home_page": "https://github.com/Wazzabeee/copy-spotter",
"name": "copy-spotter",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": null,
"author": "Cl\u00e9ment Delteil",
"author_email": "clement45.delteil45@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/d8/ad/4d9ebc88ccd251829d1187cc5248aab12fa9192e24553905caa0b90dd9e9/copy-spotter-0.1.16.tar.gz",
"platform": null,
"description": "# Copy Spotter\n\n![PyPI - Version](https://img.shields.io/pypi/v/copy-spotter) ![PyPI - License](https://img.shields.io/pypi/l/copy-spotter)\n![Python](https://img.shields.io/badge/python-3.11-blue)\n\n\n![GIF demo](https://raw.githubusercontent.com/Wazzabeee/copy-spotter/main/data/img/example.gif)\n\n## About\nThis program will process pdf, txt, docx, and odt files that can be found in the given input directory, find similar sentences, calculate similarity percentage, display a similarity table with links to side by side comparison where similar sentences are highlighted.\n\n**Usage**\n---\n\n```bash\n$ pip install copy-spotter\n$ copy-spotter [-s] [-o] [-h] input_directory\n```\n***Positional Arguments:***\n* `input_directory`: One directory that contains all files (pdf, txt, docx, odt) (see `data/pdf/plagiarism` for example)\n\n```\ninput_directory/\n\u2502\n\u251c\u2500\u2500 file_1.docx\n\u251c\u2500\u2500 file_2.pdf\n\u2514\u2500\u2500 file_3.pdf\n```\n\n***Optional Arguments:***\n* `-s`, `--block-size`: Set minimum number of consecutive and similar words detected. (Default is 2)\n* `-o`, `--out_dir`: Set the output directory for html files. (Default is creating a new directory called results)\n* `-h`, `--help`: Show this message and exit.\n\n**Examples**\n---\n```bash\n# Analyze documents in 'data/pdf/plagiarism', with default settings\n$ copy-spotter data/pdf/plagiarism\n\n# Analyze with custom block size and specify output directory\n$ copy-spotter data/pdf/plagiarism -s 5 -o results/output\n```\n\n**Development Setup:**\n---\n\n```bash\n# Clone this repository\n$ git clone https://github.com/Wazzabeee/copy_spotter\n\n# Go into the repository\n$ cd copy_spotter\n\n# Install requirements\n$ pip install -r requirements.txt\n$ pip install -r requirements_lint.txt\n\n# Install precommit\n$ pip install pre-commit\n$ pre-commit install\n\n# Run tests\n$ pip install pytest\n$ pytest tests/\n\n# Run package locally\n$ python -m scripts.main [-s] [-o] [-h] input_directory\n```\n\n**Recommandations**\n---\n- Please make sure that all text files are closed before running the program.\n- In order to get the best results please provide text files of the same languages.\n- Pdf files that are made from scanned images won't be processed correctly.\n- Ensure you have writing access when using the package \n- If a specific file is not processed correctly feel free to [contact me](mailto:<clement45.delteil45@gmail.com>) so that I can address the issue.\n\n**TODO**\n---\n- Add more tests on existing functions\n- Implement OCR with tesseract for scanned documents\n- Add custom naming option for pdf files\n",
"bugtrack_url": null,
"license": null,
"summary": "Make plagiarism detection easier. This package will find similar sentences between given files and highlight them in a side by side comparison.",
"version": "0.1.16",
"project_urls": {
"Homepage": "https://github.com/Wazzabeee/copy-spotter"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "59e4ce25adf112a925b259ecdfde91f19c36c7e7b5d897af7e0751677dfb64cf",
"md5": "ce5559d63b024ab397e3c2b83a9d7005",
"sha256": "de87eef4e86489ab4c19c08280479fd5cb66b610c3bd548ada2a0c770751e3b3"
},
"downloads": -1,
"filename": "copy_spotter-0.1.16-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ce5559d63b024ab397e3c2b83a9d7005",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 15222,
"upload_time": "2024-05-09T18:17:46",
"upload_time_iso_8601": "2024-05-09T18:17:46.542134Z",
"url": "https://files.pythonhosted.org/packages/59/e4/ce25adf112a925b259ecdfde91f19c36c7e7b5d897af7e0751677dfb64cf/copy_spotter-0.1.16-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "d8ad4d9ebc88ccd251829d1187cc5248aab12fa9192e24553905caa0b90dd9e9",
"md5": "76d85bb96e65450a3aaca18a2bd12708",
"sha256": "d3051ebb7d5f3384b7c2279d92520f1a6f7d2e9f0a3ed8333ec3136011dacd13"
},
"downloads": -1,
"filename": "copy-spotter-0.1.16.tar.gz",
"has_sig": false,
"md5_digest": "76d85bb96e65450a3aaca18a2bd12708",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 12419,
"upload_time": "2024-05-09T18:17:48",
"upload_time_iso_8601": "2024-05-09T18:17:48.280970Z",
"url": "https://files.pythonhosted.org/packages/d8/ad/4d9ebc88ccd251829d1187cc5248aab12fa9192e24553905caa0b90dd9e9/copy-spotter-0.1.16.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-05-09 18:17:48",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Wazzabeee",
"github_project": "copy-spotter",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "copy-spotter"
}
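The `sha256` values recorded under `digests` can be used to check a downloaded wheel or sdist. The following is a minimal sketch using only `hashlib`; `sha256_of_file` and `matches_recorded_digest` are hypothetical helper names, and the file path in the usage line is assumed to point at a local copy of the archive.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Compute the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_digest(path: str, expected_sha256: str) -> bool:
    """Compare a file's SHA-256 digest with the value recorded in the metadata."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    # sha256 recorded above for copy-spotter-0.1.16.tar.gz; the local path is an assumption.
    recorded = "d3051ebb7d5f3384b7c2279d92520f1a6f7d2e9f0a3ed8333ec3136011dacd13"
    print(matches_recorded_digest("copy-spotter-0.1.16.tar.gz", recorded))
```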