# Table of Contents
- [Table of Contents](#table-of-contents)
- [Project Description](#project-description)
- [Installation](#installation)
  - [Adding Scripts to Path - Windows](#adding-scripts-to-path---windows)
  - [Adding Scripts to Path - Linux / Unix / MacOS](#adding-scripts-to-path---linux--unix--macos)
- [Usage](#usage)
  - [Overall Workflow](#overall-workflow)
  - [Learning How It Works](#learning-how-it-works)
    - [Making Test Files](#making-test-files)
    - [Scanning for Duplicates](#scanning-for-duplicates)
      - [Using stdin](#using-stdin)
        - [Linux/MacOS/Unix Exmample:](#linuxmacosunix-exmample)
        - [Windows Example](#windows-example)
      - [Using path](#using-path)
    - [Reviewing Results](#reviewing-results)
    - [Deleting Duplicates](#deleting-duplicates)
- [Help](#help)
  - [duple scan --help](#duple-scan---help)
  - [duple rm --help](#duple-rm---help)
  - [duple make-test-files --help](#duple-make-test-files---help)
  - [duple hash-stats --help](#duple-hash-stats---help)
  - [duple version --help](#duple-version---help)
  - [duple wherelog --help](#duple-wherelog---help)
  - [duple followlog --help](#duple-followlog---help)
- [Version History](#version-history)
  - [2.1.5 Modified Output, Fixed Typos](#215-modified-output-fixed-typos)
  - [2.1.3 Fixed Bug](#213-fixed-bug)
  - [2.1.0 Adding Logging, Fixed Bugs](#210-adding-logging-fixed-bugs)
  - [2.0.0 Refactored to Add Features](#200-refactored-to-add-features)
  - [1.1.0 Improved Documentation](#110-improved-documentation)
  - [1.0.0 Refactored and Improved Output and Reporting](#100-refactored-and-improved-output-and-reporting)
  - [0.5.0 Improve Data Outputs](#050-improve-data-outputs)
  - [0.4.0 Performance Improvements](#040-performance-improvements)
  - [0.3.0 Added Capability](#030-added-capability)
  - [0.2.0 Added license](#020-added-license)
  - [0.1.1 Misc. Fixes](#011-misc-fixes)
  - [0.1.0 Initial Release](#010-initial-release)
# Project Description
Duple is a small package that finds and removes duplicate files.

Duple iterates through all the files and directories it is given and finds duplicate files (files are compared on their contents, using a hash of each file's bytes). Duple then outputs a file: duple.delete. The user should review duple.delete and make edits if needed (instructions are in duple.delete). Once the review is complete and any edits are made, another duple command reads duple.delete and deletes the appropriate files.
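To make the idea of finding duplicates by content concrete, here is a minimal sketch of the general technique. It is not duple's actual implementation: the function names `file_hash` and `find_duplicate_groups` are hypothetical, and the size pre-filter is only an assumption based on the 0.4.0 note in the version history about skipping files with unique sizes.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_hash(path: Path, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Hash a file's contents in chunks so large files do not fill memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_groups(root: Path) -> list[list[Path]]:
    """Return groups of files under root whose contents hash identically."""
    # Step 1 (assumed pre-filter): bucket files by size.
    by_size: dict[int, list[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    # Step 2: hash only files that share a size with at least one other file.
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a duplicate
        for path in same_size:
            by_hash[file_hash(path)].append(path)

    # Step 3: any hash shared by more than one file is a duplicate group.
    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Calling `find_duplicate_groups(Path("."))` returns one list per set of identical files; deciding which member of each group is the original is a separate step.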
# Installation
It is strongly recommended to use the latest version of duple.

    pip install duple

or if you need to upgrade:

    pip install duple --upgrade

You may need to add the Python Scripts folder on your computer to the PATH.
## Adding Scripts to Path - Windows
Open PowerShell (Start > [search for powershell]) and copy/paste the following text to the command line:

    python3 -c "from duple.info import get_user_scripts_path;get_user_scripts_path()"

Go to Start > [search for 'edit environment variables for your account'] > User variables for [user name] > select Path in the top list box > click Edit...

Once the window pops up, add the result of the PowerShell command above to the bottom of the list.
## Adding Scripts to Path - Linux / Unix / MacOS
Open a terminal and copy/paste the following text to the command line:

    python3 -c "from duple.info import USER_SCRIPTS_PATH;print(USER_SCRIPTS_PATH)"

Add the printed directory to your PATH, for example in your shell's startup file.
# Usage
## Overall Workflow
First, open the terminal and navigate to the directory you want to analyze for duplicates. Then, run 'duple scan', which will make one output file: duple.delete. Review duple.delete to validate how duple determined which files were originals and which were duplicates. Then, run 'duple rm' to remove the files marked as duplicates in 'duple.delete'.
## Learning How It Works
The following sections walk through an example from start to finish using only the built-in commands of Duple.
### Making Test Files
First, we'll make some test files to have something to scan for duplicates. Navigate to a convenient directory and make a test directory:

    cd path_to_convenient_directory
    duple make-test-files -pt
    making directories: 100%|██████████████████████████████████████████████████████| 3/3 [00:00<00:00, 1579.78it/s]

    ├── UOotpgGLv
    │   ├── bWFMyHRrutcAxRibDV.txt
    │   └── QYlFjmULSV
    │       ├── QofPgusOsvlmKWVluYkWQgevDBE.txt
    │       └── HAKhBlspwMkTYtVzTkLoENg.txt
    ├── .DS_Store
    ├── PRynXIdIXkAeaIPAQdCoFQSeuzhrK
    │   ├── BppxnezMcePwzdJLfAEF.txt
    │   └── FrADugjjVuGUvsN
    │       ├── OstZgGsAuyRefYrMWybCOMpSEb.txt
    │       └── GhvqDiXptHJvfmDxP.txt
    └── BKniVYZvtcaiXncTCFAXdwZ
        ├── ewrZSzxOnrkA.txt
        └── KXbShU
            ├── TSRNhUlhRSCM.txt
            └── VfHPJNExNzTadfoHAWfpFVEtXlDZ.txt
### Scanning for Duplicates
#### Using stdin
For the example below, we used the option flags:<br>
-d means use the depth of the path to determine the original, -d means shallowest, -D means deepest<br>
-c means use created time to determine the original, -c means use oldest created time, -C means use newest created time<br>

When using stdin, pipe only file paths (not directories) into duple scan. The most common way to use stdin is with the find command on Linux/MacOS/Unix and the Get-ChildItem command in Windows PowerShell.
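If neither `find` nor `Get-ChildItem` is convenient, a file list for stdin can also be produced with a few lines of Python. The sketch below is an illustration only: the helper name `iter_files` and the script name `list_files.py` are hypothetical and not part of duple.

```python
import os
import sys
from pathlib import Path
from typing import Iterator


def iter_files(root: str) -> Iterator[str]:
    """Yield the absolute path of every regular file below root, never a directory."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath, name)
            if full.is_file():  # skips broken symlinks and other non-files
                yield str(full.resolve())


if __name__ == "__main__":
    # Use the first command-line argument as the start directory, else the current one.
    start = sys.argv[1] if len(sys.argv) > 1 else "."
    for file_path in iter_files(start):
        print(file_path)
```

Saved as `list_files.py`, it could be used as `python3 list_files.py . | duple scan -d -c`, mirroring the examples that follow.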
##### Linux/MacOS/Unix Exmample:
    > find . -type f | duple scan -d -c
    traversing file tree: 7it [00:00, 11810.19it/s]
    hashing files: 100%|███████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 77.75it/s]
    Total Files.............................................................................10
    Ignored Files............................................................................0
    Duplicate Files..........................................................................6
    Duplicate Groups.........................................................................4
    Total Size (duplicates).............................................................2.1 kB
    Total Size (all files)..............................................................9.7 kB
    Hash Algorithm......................................................................sha256
    File System Traversal Time (seconds)...............................................0.00757
    Pre-Processing Files Time (seconds)................................................0.00025
    Hashing Time (seconds).............................................................0.14561
    Total Time (seconds)...............................................................0.15347
    Duple Version........................................................................2.0.0
    Results Written To............................/Users/shout/Desktop/duple_test/duple.delete

    Open the `output summary results` file listed above with a text editor for review
    Once review and changes are complete, run `duple rm`
##### Windows Example
    PS > Get-ChildItem . -File -Recurse | %{$_.FullName} | duple scan -d -c
    traversing file tree: 7it [00:00, 11810.19it/s]
    hashing files: 100%|███████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 77.75it/s]
    Total Files.............................................................................10
    Ignored Files............................................................................0
    Duplicate Files..........................................................................6
    Duplicate Groups.........................................................................4
    Total Size (duplicates).............................................................2.1 kB
    Total Size (all files)..............................................................9.7 kB
    Hash Algorithm......................................................................sha256
    File System Traversal Time (seconds)...............................................0.00757
    Pre-Processing Files Time (seconds)................................................0.00025
    Hashing Time (seconds).............................................................0.14561
    Total Time (seconds)...............................................................0.15347
    Duple Version........................................................................2.0.0
    Results Written To............................/Users/shout/Desktop/duple_test/duple.delete

    Open the `output summary results` file listed above with a text editor for review
    Once review and changes are complete, run `duple rm`
#### Using path
For the example below, we used the option flags:<br>
-p for path (. = current directory)<br>
-d means use the depth of the path to determine the original, -d means shallowest, -D means deepest<br>
-n means use name length to determine the original, -n means use shortest name, -N means use longest name<br>
You can use multiple flags to determine the original; all of them will be applied. So, in the case below, we use the shallowest depth and the shortest name to determine the original vs the duplicate(s).

    duple scan -p . -d -n
    traversing file tree: 7it [00:00, 11810.19it/s]
    hashing files: 100%|███████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 77.75it/s]
    Total Files.............................................................................10
    Ignored Files............................................................................0
    Duplicate Files..........................................................................6
    Duplicate Groups.........................................................................4
    Total Size (duplicates).............................................................2.1 kB
    Total Size (all files)..............................................................9.7 kB
    Hash Algorithm......................................................................sha256
    File System Traversal Time (seconds)...............................................0.00757
    Pre-Processing Files Time (seconds)................................................0.00025
    Hashing Time (seconds).............................................................0.14561
    Total Time (seconds)...............................................................0.15347
    Duple Version........................................................................2.0.0
    Results Written To............................/Users/shout/Desktop/duple_test/duple.delete

    Open the `output summary results` file listed above with a text editor for review
    Once review and changes are complete, run `duple rm`
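One way to picture how several flags can combine when choosing the original is a compound sort key. The sketch below is only an illustration: this README does not state how duple orders or breaks ties between flags, so comparing depth first and name length second is an assumption, and `pick_original` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def pick_original(paths: list[str]) -> str:
    """Pick the shallowest path, breaking ties by the shortest file name."""

    def sort_key(path: str) -> tuple[int, int, str]:
        parts = PurePosixPath(path).parts
        # (depth, file name length, path); the path itself makes the choice deterministic
        return (len(parts), len(parts[-1]), path)

    return min(paths, key=sort_key)


group = [
    "a/b/report_final.txt",
    "a/report.txt",
    "c/report_copy.txt",
]
original = pick_original(group)
duplicates = [p for p in group if p != original]
print(original)    # a/report.txt
print(duplicates)  # ['a/b/report_final.txt', 'c/report_copy.txt']
```

Here two files share the shallowest depth, so the shorter name decides which one is kept as the original.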
### Reviewing Results
<span style="color:red">**ONLY FILES LISTED IN THE 'Duplicate Results' SECTION OF DUPLE.DELETE WILL BE DELETED**<br>
**THE 'Ignored Files in Scan' SECTION IS IGNORED**</span><br>

Open 'duple.delete' to review and edit the results. The user can change the leftmost column in duple.delete. The file on the following line would be deleted:

    DUPLICATE | 962 Bytes | /Users/shout/Desktop/duple_test/UOotpgGLv/QYlFjmULSV/QofPgusOsvlmKWVluYkWQgevDBE.txt

If the user changes 'DUPLICATE' to 'ORIGINAL', as shown below, the file on that line will not be deleted.

    ORIGINAL | 962 Bytes | /Users/shout/Desktop/duple_test/UOotpgGLv/QYlFjmULSV/QofPgusOsvlmKWVluYkWQgevDBE.txt

A sample duple.delete file is below:

    Duple Report Generated on 2024-10-02T20:43:06.590382-04:00, commanded by user: shout
    ------------------------------------------------------------------------------------------
    Summary Statistics:
    Total Files.............................................................................10
    Ignored Files............................................................................2
    Duplicate Files..........................................................................6
    Duplicate Groups.........................................................................2
    Total Size (duplicates).............................................................2.7 kB
    Total Size (all files).............................................................32.6 kB
    Hash Algorithm......................................................................sha256
    File System Traversal Time (seconds)...............................................0.00645
    Pre-Processing Files Time (seconds)................................................0.00050
    Hashing Time (seconds).............................................................0.15670
    Total Time (seconds)...............................................................0.16371
    Duple Version........................................................................2.1.6
    Results Written To............................/Users/shout/Desktop/duple_test/duple.delete

    ------------------------------------------------------------------------------------------
    Inputs (True = minimum, False = Maximum):
    depth = True
    namelength = True

    ------------------------------------------------------------------------------------------
    Outputs:
    /Users/shout/Desktop/duple_test/duple.delete

    ------------------------------------------------------------------------------------------
    Instructions to User:
    The sections below describe what action duple will take when 'duple rm' is commanded. The first column is the flag that tells duple what to do:
    ORIGINAL : means duple will take no action for this file, listed only as a reference to the user
    DUPLICATE : means duple will send this file to the trash can or recycling bin, if able

    ------------------------------------------------------------------------------------------
    Duplicate Results:
    DUPLICATE | 565 Bytes | /Users/shout/Desktop/duple_test/CXnVbEhTJHJmDoSR/JABUvTKiElLxxNeNjZh.txt
    DUPLICATE | 565 Bytes | /Users/shout/Desktop/duple_test/PvNEOcwjFlmMGOUQFnqfDsJVzkOLi/eBTYszALyJXoealOjGj.txt
    DUPLICATE | 565 Bytes | /Users/shout/Desktop/duple_test/PvNEOcwjFlmMGOUQFnqfDsJVzkOLi/MsxAwYDKeBkmUWLRHAsRRJLOA.txt
    DUPLICATE | 565 Bytes | /Users/shout/Desktop/duple_test/PvNEOcwjFlmMGOUQFnqfDsJVzkOLi/mOgjQxE/kTGerbYckSIpJmeXTYUlmnLdQ.txt
    ORIGINAL | 565 Bytes | /Users/shout/Desktop/duple_test/bLeKJEGLsdYxMNeEmUC/ERYCCCDsbfdYGiIFfh.txt

    ORIGINAL | 231 Bytes | /Users/shout/Desktop/duple_test/CXnVbEhTJHJmDoSR/uWlvcHM.txt
    DUPLICATE | 231 Bytes | /Users/shout/Desktop/duple_test/CXnVbEhTJHJmDoSR/rbevPzGoLmvGXJwsOuKuWXhbDq/FxUfdtjxeRGN.txt
    DUPLICATE | 231 Bytes | /Users/shout/Desktop/duple_test/bLeKJEGLsdYxMNeEmUC/WWwniGsaAkLr.txt

    ------------------------------------------------------------------------------------------
    Ignored Files in Scan:
    IGNORED | 28.7 kB | UNIQUE_FILE_SIZE | /Users/shout/Desktop/duple_test/.DS_Store
    IGNORED | 375 Bytes | UNIQUE_FILE_SIZE | /Users/shout/Desktop/duple_test/bLeKJEGLsdYxMNeEmUC/JsmDv/dioMDVyMZTHeaCJPdCSniu.txt
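Before running `duple rm`, it can help to list exactly which paths are still flagged for removal after your edits. The sketch below parses text in the layout of the sample report above; it is not part of duple, the function name `paths_marked_duplicate` is hypothetical, and it assumes the `FLAG | size | path` layout and the dashed separators shown in the sample.

```python
def paths_marked_duplicate(report_text: str) -> list[str]:
    """Return paths flagged DUPLICATE inside the 'Duplicate Results' section."""
    in_results = False
    marked: list[str] = []
    for raw_line in report_text.splitlines():
        line = raw_line.strip()
        if line.startswith("Duplicate Results"):
            in_results = True
            continue
        if in_results and line.startswith("---"):
            break  # a separator line ends the section
        if not in_results or not line:
            continue
        # Split into at most three fields so a '|' inside the path is preserved.
        fields = [field.strip() for field in line.split("|", 2)]
        if len(fields) == 3 and fields[0] == "DUPLICATE":
            marked.append(fields[2])
    return marked


sample = """\
Duplicate Results:
DUPLICATE | 565 Bytes | /tmp/demo/a/copy.txt
ORIGINAL | 565 Bytes | /tmp/demo/original.txt

------------------------------------------
Ignored Files in Scan:
IGNORED | 375 Bytes | UNIQUE_FILE_SIZE | /tmp/demo/other.txt
"""
print(paths_marked_duplicate(sample))  # ['/tmp/demo/a/copy.txt']
```

In practice the text would come from reading the duple.delete file itself, for example with `Path("duple.delete").read_text()`.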
### Deleting Duplicates
After reviewing and editing the 'duple.delete' file, you can run the duple rm command. This command <span style="color:red">**will delete files**</span> specified in duple.delete as 'DUPLICATE'.

It is recommended to first do a dry run to review the output; the dry run will **not** delete any files.

    > duple rm -dr
    [ 0.0%] will delete 4.0 kB duple.delete
    [ 9.1%] will delete 484 Bytes UOotpgGLv/bWFMyHRrutcAxRibDV.txt
    [ 18.2%] will keep 6.1 kB .DS_Store
    [ 27.3%] will delete 962 Bytes UOotpgGLv/QYlFjmULSV/QofPgusOsvlmKWVluYkWQgevDBE.txt
    [ 36.4%] will keep 962 Bytes PRynXIdIXkAeaIPAQdCoFQSeuzhrK/FrADugjjVuGUvsN/GhvqDiXptHJvfmDxP.txt
    [ 45.5%] will delete 109 Bytes UOotpgGLv/QYlFjmULSV/HAKhBlspwMkTYtVzTkLoENg.txt
    [ 54.5%] will keep 109 Bytes PRynXIdIXkAeaIPAQdCoFQSeuzhrK/BppxnezMcePwzdJLfAEF.txt
    [ 63.6%] will delete 109 Bytes PRynXIdIXkAeaIPAQdCoFQSeuzhrK/FrADugjjVuGUvsN/OstZgGsAuyRefYrMWybCOMpSEb.txt
    [ 72.7%] will delete 109 Bytes BKniVYZvtcaiXncTCFAXdwZ/ewrZSzxOnrkA.txt
    [ 81.8%] will delete 352 Bytes BKniVYZvtcaiXncTCFAXdwZ/KXbShU/TSRNhUlhRSCM.txt
    [ 90.9%] will keep 352 Bytes BKniVYZvtcaiXncTCFAXdwZ/KXbShU/VfHPJNExNzTadfoHAWfpFVEtXlDZ.txt

If this looks good, proceed with the actual removal. For verbose output, where each file is listed as it is deleted:

    > duple rm -v

If we don't want to see every file, but just a progress bar:

    > duple rm

# Help
The top level help:

    duple
    Usage: duple [OPTIONS] COMMAND [ARGS]...

    Options:
      --help  Show this message and exit.

    Commands:
      followlog        follow the log until user interupts (ctrl-c), (tail -f)
      hash-stats       report hashing times for each available hashing...
      make-test-files  make test files to learn or test with duple
      rm               rm sends all 'duplicate' files specified in...
      scan             Scan recursively computes a hash of each file and puts...
      version          display the current version of duple
      wherelog         print the path to the logs
## duple scan --help
    duple scan --help
    Usage: duple scan [OPTIONS]

      Scan recursively computes a hash of each file and puts the hash into a
      dictionary. The keys are the hashes of the files, and the values are the
      file paths and metadata. If an entry has more than 1 file associated, they
      are duplicates. The original is determined by the flags or options (ex:
      -d). The duplicates are added to a file called duple.delete.

    Options:
      -p, --path TEXT                 path to look in for duplicates, if this
                                      option is present, paths is ignored
      -in, --paths_file_stdin FILENAME
                                      either a file containing a list of paths to
                                      evaluate or stdin
      -h, --hash TEXT                 the hashalgorithm to use, default = sha256,
                                      allowed alogorithsm: ['blake2b', 'blake2s',
                                      'md5', 'md5-sha1', 'ripemd160', 'sha1',
                                      'sha224', 'sha256', 'sha384', 'sha3_224',
                                      'sha3_256', 'sha3_384', 'sha3_512',
                                      'sha512', 'sha512_224', 'sha512_256', 'sm3']
      -d, --depth_min                 keep the file with the lowest pathway depth
      -D, --depth_max                 keep the file with the highest pathway depth
      -n, --name_min                  keep the file with the shortest name
      -N, --name_max                  keep the file with the longest name
      -c, --created_min               keep the file with the oldest creation date
      -C, --created_max               keep the file with the newest creation date
      -m, --modified_min              keep the file with the oldest modified date
      -M, --modified_max              keep the file with the newest modified, date
      -a, --accessed_min              keep the file with the oldest accessed, date
      -A, --accessed_max              keep the file with the newest accessed, date
      -ncpu, --number_of_cpus INTEGER
                                      maximum number of cpu cores to use
      -ch, --chunksize INTEGER        chunksize to give to workers, minimum of 2
      --help                          Show this message and exit.
## duple rm --help
    duple rm --help
    Usage: duple rm [OPTIONS]

      rm sends all 'duplicate' files specified in duple.delete to the trash folder

    Options:
      -v, --verbose             be more verbose during execution
      -dr, --dry_run            Perform dry run, do everything except deleting
                                files
      -led, --leave_empty_dirs  Do not delete empty directories/folders
      --help                    Show this message and exit.
## duple make-test-files --help
    duple make-test-files --help
    Usage: duple make-test-files [OPTIONS]

      make test files to learn or test with duple

    Options:
      -tp, --test_path PATH           path where the test directories will be
                                      created
      -nd, --number_of_directories INTEGER
                                      number of directories to make for the test
      -nf, --number_of_files INTEGER  number of files to make in each top level
                                      directory, spread across the directories
      -fs, --max_file_size INTEGER    file size to create in bytes
      -pt, --print_tree               print tree with results
      --help                          Show this message and exit.
## duple hash-stats --help
    duple hash-stats --help
    Usage: duple hash-stats [OPTIONS] PATH

      report hashing times for each available hashing algorithm on the specified
      file

      Args: path (str): path to file to hash

    Options:
      --help  Show this message and exit.
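To get a feel for what comparing hashing times involves, the following sketch times a few `hashlib` algorithms on one file. It is an independent illustration rather than the code behind `duple hash-stats`; the helper name `time_hashes` and the choice of algorithms are assumptions.

```python
import hashlib
import time
from pathlib import Path


def time_hashes(path: Path, algorithms: list[str]) -> dict[str, float]:
    """Return the seconds taken to hash the file once with each algorithm."""
    data = path.read_bytes()  # read once so disk access does not skew the timings
    timings: dict[str, float] = {}
    for name in algorithms:
        start = time.perf_counter()
        hashlib.new(name, data).hexdigest()
        timings[name] = time.perf_counter() - start
    return timings


def print_timings(path: Path) -> None:
    """Print each algorithm and its hashing time, fastest first."""
    results = time_hashes(path, ["sha1", "sha256", "sha512", "blake2b"])
    for name, seconds in sorted(results.items(), key=lambda item: item[1]):
        print(f"{name:<10}{seconds:.5f} s")
```

Calling `print_timings(Path("some_large_file"))` on a file of realistic size gives a rough ranking; single-run timings on small files are noisy, so treat the numbers as indicative only.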
## duple version --help
    duple version --help
    Usage: duple version [OPTIONS]

      display the current version of duple

    Options:
      --help  Show this message and exit.
## duple wherelog --help
    duple wherelog --help
    Usage: duple wherelog [OPTIONS]

      print the path to the logs

    Options:
      --help  Show this message and exit.
## duple followlog --help
    duple followlog --help
    Usage: duple followlog [OPTIONS]

      follow the log until user interupts (ctrl-c), (tail -f)

    Options:
      --help  Show this message and exit.
# Version History
## 2.1.5 Modified Output, Fixed Typos
- [x] changed output duple.delete All Files section to just be the Ignored files and a disposition code for why the file was ignored
- [x] fixed typo in output duple.delete instructions section
- [x] added duple followlog
## 2.1.3 Fixed Bug
- [x] fixed bug where IGNORED files were added to the duplicate results, amended test to catch this bug in the future
## 2.1.0 Adding Logging, Fixed Bugs
- [x] added logging
- [x] fixed bug where unicode characters in file names would cause error
- [x] fixed performance issue during duple scan
## 2.0.0 Refactored to Add Features
- [x] added support for multiple filters (ex: -d -n)<br>
- [x] added support for accepting stdin for files to search<br>
- [x] added tree view support to make-test-files<br>
## 1.1.0 Improved Documentation
- [x] Improved README for better installation and setup instructions
## 1.0.0 Refactored and Improved Output and Reporting
- [x] refactored code to be easier to follow and more modular<br>
- [x] improved reporting of results to duple.delete and duple.json<br>
- [x] improved duple.json output, adding additional data<br>
- [x] added dry run and verbose flags to duple rm
- [x] added hash-stats to calculate performance times for each available hash<br>
- [x] added make-test-files to make test files for the user to learn how duple works on test data<br>
## 0.5.0 Improve Data Outputs
- [x] added dictionary to duple.json for file stats, now each entry has a key to describe the number<br>
- [x] fixed progress bar for pre-processing directories<br>
- [x] added output file duple.all_files.json with file statistics on all files within the specified path for 'duple scan'<br>
- [x] Improved summary statistics output for 'duple scan'
## 0.4.0 Performance Improvements
- [x] adding multiprocessing, taking advantage of multiple cores<br>
- [x] eliminated files with unique sizes from analysis - files with unique size are not duplicates of another file
## 0.3.0 Added Capability
- [x] added mv function that will move 'duple.delete' paths instead of deleting them
## 0.2.0 Added license
- [x] Added license
## 0.1.1 Misc. Fixes
- [x] Fixed typos in help strings<br>
- [x] Added support for sending duplicates to trash ('duple rm')
## 0.1.0 Initial Release
- [x] This is the initial release of duple
Raw data
{
"_id": null,
"home_page": "https://github.com/dbruce-ae05/duple",
"name": "duple",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.11",
"maintainer_email": null,
"keywords": "duplicate, lint",
"author": "David Bruce",
"author_email": "duple.python@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/f0/c2/2f6c80d4f8d18ccb18c29b6ef71e14c9735142de991038ef97cef5ccdec4/duple-2.1.7.tar.gz",
"platform": null,
"description": "# Table of Contents\n- [Table of Contents](#table-of-contents)\n- [Project Description](#project-description)\n- [Installation](#installation)\n - [Adding Scripts to Path - Windows](#adding-scripts-to-path---windows)\n - [Adding Scripts to Path - Linux / Unix / MacOS](#adding-scripts-to-path---linux--unix--macos)\n- [Usage](#usage)\n - [Overall Workflow](#overall-workflow)\n - [Learning How It Works](#learning-how-it-works)\n - [Making Test Files](#making-test-files)\n - [Scanning for Duplicates](#scanning-for-duplicates)\n - [Using stdin](#using-stdin)\n - [Linux/MacOS/Unix Exmample:](#linuxmacosunix-exmample)\n - [Windows Example](#windows-example)\n - [Using path](#using-path)\n - [Reviewing Results](#reviewing-results)\n - [Deleting Duplicates](#deleting-duplicates)\n- [Help](#help)\n - [duple scan --help](#duple-scan---help)\n - [duple rm --help](#duple-rm---help)\n - [duple make-test-files --help](#duple-make-test-files---help)\n - [duple hash-stats --help](#duple-hash-stats---help)\n - [duple version --help](#duple-version---help)\n - [duple wherelog --help](#duple-wherelog---help)\n - [duple followlog --help](#duple-followlog---help)\n- [Version History](#version-history)\n - [2.1.5 Modified Output, Fixed Typos](#215-modified-output-fixed-typos)\n - [2.1.3 Fixed Bug](#213-fixed-bug)\n - [2.1.0 Adding Logging, Fixed Bugs](#210-adding-logging-fixed-bugs)\n - [2.0.0 Refactored to Add Features](#200-refactored-to-add-features)\n - [1.1.0 Improved Documentation](#110-improved-documentation)\n - [1.0.0 Refactored and Improved Output and Reporting](#100-refactored-and-improved-output-and-reporting)\n - [0.5.0 Improve Data Outputs](#050-improve-data-outputs)\n - [0.4.0 Performance Improvements](#040-performance-improvements)\n - [0.3.0 Added Capability](#030-added-capability)\n - [0.2.0 Added license](#020-added-license)\n - [0.1.1 Misc. 
Fixes](#011-misc-fixes)\n - [0.1.0 Initial Release](#010-initial-release)\n# Project Description\nDuple is a small package that will find and remove duplicate files.\n\nDuple will iterate through all files and directories that is given and find duplicate files (files are compared on their contents, byte by byte). Duple then outputs a file: duple.delete. The user should review duple.delete and make edits if needed (instructions are in duple.delete). Once the review is complete and edits made, another duple command will review duple.delete and delete the apporpriate files.\n# Installation\nIt is strongly recommended to use the latest version of duple.\n\n pip install duple\n\nor if you need to upgrade:\n\n pip install duple --upgrade\n\n\nYou may need to add the Python Scripts folder on your computer to the PATH.\n\n## Adding Scripts to Path - Windows\nOpen PowerShell (Start > [search for powershell]) and copy/paste the following text to the command line:\n\n python3 -c \"from duple.info import get_user_scripts_path;get_user_scripts_path()\"\n\nGo to Start > [search for 'edit environment variables for your account'] > Users Variables for [user name] > Select Path in top list box > Click Edit...\n\nOnce the window pops up, add to the bottom of the list the result from the PowerShell command above\n## Adding Scripts to Path - Linux / Unix / MacOS\nOpen terminal and copy/paste teh following text to the command line:\n\n python3 -c \"from duple.info import USER_SCRIPTS_PATH;print(USER_SCRIPTS_PATH)\"\n# Usage\n## Overall Workflow\nFirst, open the terminal and navigate to the directory you want to analyze for duplicates. Then, run 'duple scan', which will make two output files: duple.delete. Review duple.delete to validate how duple determined which files were original and which were duplicates. 
Then, run 'duple rm' to remove the files specified in 'duple.delete'.\n## Learning How It Works\nThe following sections walk through an example from start to finish using only built in functions of Duple.\n### Making Test Files\nFirst, we'll make some test files to have something to scan for duplicates. Navigate to a convenient directory and make a test directory:\n \n cd path_to_convenient_directory\n duple make-test-files -pt\n making directories: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 3/3 [00:00<00:00, 1579.78it/s]\n\n \u251c\u2500\u2500 UOotpgGLv\n \u2502 \u251c\u2500\u2500 bWFMyHRrutcAxRibDV.txt\n \u2502 \u2514\u2500\u2500 QYlFjmULSV\n \u2502 \u251c\u2500\u2500 QofPgusOsvlmKWVluYkWQgevDBE.txt\n \u2502 \u2514\u2500\u2500 HAKhBlspwMkTYtVzTkLoENg.txt\n \u251c\u2500\u2500 .DS_Store\n \u251c\u2500\u2500 PRynXIdIXkAeaIPAQdCoFQSeuzhrK\n \u2502 \u251c\u2500\u2500 BppxnezMcePwzdJLfAEF.txt\n \u2502 \u2514\u2500\u2500 FrADugjjVuGUvsN\n \u2502 \u251c\u2500\u2500 OstZgGsAuyRefYrMWybCOMpSEb.txt\n \u2502 \u2514\u2500\u2500 GhvqDiXptHJvfmDxP.txt\n \u2514\u2500\u2500 BKniVYZvtcaiXncTCFAXdwZ\n \u251c\u2500\u2500 ewrZSzxOnrkA.txt\n \u2514\u2500\u2500 KXbShU\n \u251c\u2500\u2500 TSRNhUlhRSCM.txt\n \u2514\u2500\u2500 VfHPJNExNzTadfoHAWfpFVEtXlDZ.txt\n### Scanning for Duplicates\n#### Using stdin\nFor the example below, we used the option flags:<br>\n -d means use the depth of the path to determine the original, -d means shallowest, -D means deepest<br>\n -c means use accessed time to determine the original, -c means use oldest created time, -C means use newest created time<br>\n\nWhen using stdin, the user must only pipe files into the duple scan. 
The most common way to use stdin would be to use the find command on Linux/MacOS/Unix and the Get-ChildItem command on Windows PowerShell.\n\n##### Linux/MacOS/Unix Exmample:\n \n > find . -type f | duple scan -d -c\n traversing file tree: 7it [00:00, 11810.19it/s]\n hashing files: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 8/8 [00:00<00:00, 77.75it/s]\n Total Files.............................................................................10\n Ignored Files............................................................................0\n Duplicate Files..........................................................................6\n Duplicate Groups.........................................................................4\n Total Size (duplicates).............................................................2.1 kB\n Total Size (all files)..............................................................9.7 kB\n Hash Algorithm......................................................................sha256\n File System Traversal Time (seconds)...............................................0.00757\n Pre-Processing Files Time (seconds)................................................0.00025\n Hashing Time (seconds).............................................................0.14561\n Total Time (seconds)...............................................................0.15347\n Duple Version........................................................................2.0.0\n Results Written To............................/Users/shout/Desktop/duple_test/duple.delete\n\n Open the `output summary results` file listed above with a text editor for review\n Once review and changes are 
complete, run `duple rm`\n\n##### Windows Example\n\n PS > Get-ChildItem . -File -Recurse | %{$_.FullName} | duple scan -d -c\n traversing file tree: 7it [00:00, 11810.19it/s]\n hashing files: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 8/8 [00:00<00:00, 77.75it/s]\n Total Files.............................................................................10\n Ignored Files............................................................................0\n Duplicate Files..........................................................................6\n Duplicate Groups.........................................................................4\n Total Size (duplicates).............................................................2.1 kB\n Total Size (all files)..............................................................9.7 kB\n Hash Algorithm......................................................................sha256\n File System Traversal Time (seconds)...............................................0.00757\n Pre-Processing Files Time (seconds)................................................0.00025\n Hashing Time (seconds).............................................................0.14561\n Total Time (seconds)...............................................................0.15347\n Duple Version........................................................................2.0.0\n Results Written To............................/Users/shout/Desktop/duple_test/duple.delete\n\n Open the `output summary results` file listed above with a text editor for review\n Once review and changes are complete, run `duple rm`\n#### Using path\nFor the example below, we used the option 
flags:<br>\n -p for path (. = current directory)<br>\n -d means use the depth of the path to determine the original, -d means shallowest, -D means deepest<br>\n -n means use name length to determine the original, -n means use shortest name, -N means use longest name<br>\n \nYou can use multiple flags to determine the original, both will be applied. So, in the case below, we use the shallowest depth and the shortest name to determine the original vs the duplicate(s).\n\n duple scan -p . -d -n\n traversing file tree: 7it [00:00, 11810.19it/s]\n hashing files: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 8/8 [00:00<00:00, 77.75it/s]\n Total Files.............................................................................10\n Ignored Files............................................................................0\n Duplicate Files..........................................................................6\n Duplicate Groups.........................................................................4\n Total Size (duplicates).............................................................2.1 kB\n Total Size (all files)..............................................................9.7 kB\n Hash Algorithm......................................................................sha256\n File System Traversal Time (seconds)...............................................0.00757\n Pre-Processing Files Time (seconds)................................................0.00025\n Hashing Time (seconds).............................................................0.14561\n Total Time (seconds)...............................................................0.15347\n Duple 
Version........................................................................2.0.0\n Results Written To............................/Users/shout/Desktop/duple_test/duple.delete\n\n Open the `output summary results` file listed above with a text editor for review\n Once review and changes are complete, run `duple rm`\n### Reviewing Results\n<span style=\"color:red\">**ONLY FILES LISTED IN THE 'Duplicate Results' SECTION OF DUPLE.DELETE WILL BE DELETED**<br>\n**THE 'Ignored Files in Scan' SECTION IS IGNORED**</span><br>\n\nOpen 'duple.delete' to review and edit the results. The user can change the leftmost column in duple.delete. The following line would be deleted:\n\n DUPLICATE | 962 Bytes | /Users/shout/Desktop/duple_test/UOotpgGLv/QYlFjmULSV/QofPgusOsvlmKWVluYkWQgevDBE.txt\n\nIf the user changes 'DUPLICATE' to 'ORIGINAL', as shown below, the file on that line will not be deleted.\n\n ORIGINAL | 962 Bytes | /Users/shout/Desktop/duple_test/UOotpgGLv/QYlFjmULSV/QofPgusOsvlmKWVluYkWQgevDBE.txt\n\nA sample duple.delete file is below:\n\n Duple Report Generated on 2024-10-02T20:43:06.590382-04:00, commanded by user: shout\n ------------------------------------------------------------------------------------------\n Summary Statistics:\n Total Files.............................................................................10\n Ignored Files............................................................................2\n Duplicate Files..........................................................................6\n Duplicate Groups.........................................................................2\n Total Size (duplicates).............................................................2.7 kB\n Total Size (all files).............................................................32.6 kB\n Hash Algorithm......................................................................sha256\n File System Traversal Time (seconds)...............................................0.00645\n 
Pre-Processing Files Time (seconds)................................................0.00050\n Hashing Time (seconds).............................................................0.15670\n Total Time (seconds)...............................................................0.16371\n Duple Version........................................................................2.1.6\n Results Written To............................/Users/shout/Desktop/duple_test/duple.delete\n\n ------------------------------------------------------------------------------------------\n Inputs (True = minimum, False = Maximum): \n depth = True\n namelength = True\n\n ------------------------------------------------------------------------------------------\n Outputs:\n /Users/shout/Desktop/duple_test/duple.delete\n\n ------------------------------------------------------------------------------------------\n Instructions to User:\n The sections below describe what action duple will take when 'duple rm' is commanded. The first column is the flag that tells duple what to do:\n ORIGINAL : means duple will take no action for this file, listed only as a reference to the user\n DUPLICATE : means duple will send this file to the trash can or recycling bin, if able\n\n ------------------------------------------------------------------------------------------\n Duplicate Results:\n DUPLICATE | 565 Bytes | /Users/shout/Desktop/duple_test/CXnVbEhTJHJmDoSR/JABUvTKiElLxxNeNjZh.txt\n DUPLICATE | 565 Bytes | /Users/shout/Desktop/duple_test/PvNEOcwjFlmMGOUQFnqfDsJVzkOLi/eBTYszALyJXoealOjGj.txt\n DUPLICATE | 565 Bytes | /Users/shout/Desktop/duple_test/PvNEOcwjFlmMGOUQFnqfDsJVzkOLi/MsxAwYDKeBkmUWLRHAsRRJLOA.txt\n DUPLICATE | 565 Bytes | /Users/shout/Desktop/duple_test/PvNEOcwjFlmMGOUQFnqfDsJVzkOLi/mOgjQxE/kTGerbYckSIpJmeXTYUlmnLdQ.txt\n ORIGINAL | 565 Bytes | /Users/shout/Desktop/duple_test/bLeKJEGLsdYxMNeEmUC/ERYCCCDsbfdYGiIFfh.txt\n\n ORIGINAL | 231 Bytes | 
/Users/shout/Desktop/duple_test/CXnVbEhTJHJmDoSR/uWlvcHM.txt\n DUPLICATE | 231 Bytes | /Users/shout/Desktop/duple_test/CXnVbEhTJHJmDoSR/rbevPzGoLmvGXJwsOuKuWXhbDq/FxUfdtjxeRGN.txt\n DUPLICATE | 231 Bytes | /Users/shout/Desktop/duple_test/bLeKJEGLsdYxMNeEmUC/WWwniGsaAkLr.txt\n\n\n ------------------------------------------------------------------------------------------\n Ignored Files in Scan:\n IGNORED | 28.7 kB | UNIQUE_FILE_SIZE | /Users/shout/Desktop/duple_test/.DS_Store\n IGNORED | 375 Bytes | UNIQUE_FILE_SIZE | /Users/shout/Desktop/duple_test/bLeKJEGLsdYxMNeEmUC/JsmDv/dioMDVyMZTHeaCJPdCSniu.txt\n### Deleting Duplicates\nAfter reviewing and editing the 'duple.delete' file, you can run the `duple rm` command. This command <span style=\"color:red\">**will delete files**</span> specified in duple.delete as 'DUPLICATE'.\n\nIt is recommended to do a dry run first to review the output; the dry run will **not** delete any files.\n\n > duple rm -dr\n [ 0.0%] will delete 4.0 kB duple.delete\n [ 9.1%] will delete 484 Bytes UOotpgGLv/bWFMyHRrutcAxRibDV.txt\n [ 18.2%] will keep 6.1 kB .DS_Store\n [ 27.3%] will delete 962 Bytes UOotpgGLv/QYlFjmULSV/QofPgusOsvlmKWVluYkWQgevDBE.txt\n [ 36.4%] will keep 962 Bytes PRynXIdIXkAeaIPAQdCoFQSeuzhrK/FrADugjjVuGUvsN/GhvqDiXptHJvfmDxP.txt\n [ 45.5%] will delete 109 Bytes UOotpgGLv/QYlFjmULSV/HAKhBlspwMkTYtVzTkLoENg.txt\n [ 54.5%] will keep 109 Bytes PRynXIdIXkAeaIPAQdCoFQSeuzhrK/BppxnezMcePwzdJLfAEF.txt\n [ 63.6%] will delete 109 Bytes PRynXIdIXkAeaIPAQdCoFQSeuzhrK/FrADugjjVuGUvsN/OstZgGsAuyRefYrMWybCOMpSEb.txt\n [ 72.7%] will delete 109 Bytes BKniVYZvtcaiXncTCFAXdwZ/ewrZSzxOnrkA.txt\n [ 81.8%] will delete 352 Bytes BKniVYZvtcaiXncTCFAXdwZ/KXbShU/TSRNhUlhRSCM.txt\n [ 90.9%] will keep 352 Bytes BKniVYZvtcaiXncTCFAXdwZ/KXbShU/VfHPJNExNzTadfoHAWfpFVEtXlDZ.txt\n\nIf this looks good, proceed with the actual deletion.\n\nFor verbose output, each file is listed as it is deleted:\n\n > duple rm -v\n\nIf we don't want to see every 
file, but just a progress bar:\n\n > duple rm\n# Help\nThe top level help:\n duple\n Usage: duple [OPTIONS] COMMAND [ARGS]...\n\n Options:\n --help Show this message and exit.\n\n Commands:\n followlog follow the log until user interupts (ctrl-c), (tail -f)\n hash-stats report hashing times for each available hashing...\n make-test-files make test files to learn or test with duple\n rm rm sends all 'duplicate' files specified in...\n scan Scan recursively computes a hash of each file and puts...\n version display the current version of duple\n wherelog print the path to the logs\n## duple scan --help\n duple scan --help\n Usage: duple scan [OPTIONS]\n\n Scan recursively computes a hash of each file and puts the hash into a\n dictionary. The keys are the hashes of the files, and the values are the\n file paths and metadata. If an entry has more than 1 file associated, they\n are duplicates. The original is determined by the flags or options (ex:\n -d). The duplicates are added to a file called duple.delete.\n\n Options:\n -p, --path TEXT path to look in for duplicates, if this\n option is present, paths is ignored\n -in, --paths_file_stdin FILENAME\n either a file containing a list of paths to\n evaluate or stdin\n -h, --hash TEXT the hashalgorithm to use, default = sha256,\n allowed alogorithsm: ['blake2b', 'blake2s',\n 'md5', 'md5-sha1', 'ripemd160', 'sha1',\n 'sha224', 'sha256', 'sha384', 'sha3_224',\n 'sha3_256', 'sha3_384', 'sha3_512',\n 'sha512', 'sha512_224', 'sha512_256', 'sm3']\n -d, --depth_min keep the file with the lowest pathway depth\n -D, --depth_max keep the file with the highest pathway depth\n -n, --name_min keep the file with the shortest name\n -N, --name_max keep the file with the longest name\n -c, --created_min keep the file with the oldest creation date\n -C, --created_max keep the file with the newest creation date\n -m, --modified_min keep the file with the oldest modified date\n -M, --modified_max keep the file with the newest modified, 
date\n -a, --accessed_min keep the file with the oldest accessed, date\n -A, --accessed_max keep the file with the newest accessed, date\n -ncpu, --number_of_cpus INTEGER\n maximum number of cpu cores to use\n -ch, --chunksize INTEGER chunksize to give to workers, minimum of 2\n --help Show this message and exit.\n## duple rm --help\n duple rm --help\n Usage: duple rm [OPTIONS]\n\n rm sends all 'duplicate' files specified in duple.delete to the trash folder\n\n Options:\n -v, --verbose be more verbose during execution\n -dr, --dry_run Perform dry run, do everything except deleting\n files\n -led, --leave_empty_dirs Do not delete empty directories/folders\n --help Show this message and exit.\n## duple make-test-files --help\n duple make-test-files --help\n Usage: duple make-test-files [OPTIONS]\n\n make test files to learn or test with duple\n\n Options:\n -tp, --test_path PATH path where the test directories will be\n created\n -nd, --number_of_directories INTEGER\n number of directories to make for the test\n -nf, --number_of_files INTEGER number of files to make in each top level\n directory, spread across the directories\n -fs, --max_file_size INTEGER file size to create in bytes\n -pt, --print_tree print tree with results\n --help Show this message and exit.\n## duple hash-stats --help\n duple hash-stats --help\n Usage: duple hash-stats [OPTIONS] PATH\n\n report hashing times for each available hashing algorithm on the specified\n file\n\n Args: path (str): path to file to hash\n\n Options:\n --help Show this message and exit.\n## duple version --help\n duple version --help\n Usage: duple version [OPTIONS]\n\n display the current version of duple\n\n Options:\n --help Show this message and exit.\n## duple wherelog --help\n duple wherelog --help\n Usage: duple wherelog [OPTIONS]\n\n print the path to the logs\n\n Options:\n --help Show this message and exit.\n## duple followlog --help\n duple followlog --help\n Usage: duple followlog [OPTIONS]\n\n follow the log 
until user interupts (ctrl-c), (tail -f)\n\n Options:\n --help Show this message and exit.\n# Version History\n## 2.1.5 Modified Output, Fixed Typos\n- [x] changed output duple.delete All Files section to just be the Ignored files and a disposition code for why the file was ignored\n- [x] fixed typo in output duple.delete instructions section\n- [x] added duple followlog \n## 2.1.3 Fixed Bug\n- [x] fixed bug where IGNORED files were added to the duplicate results, amended test to catch this bug in the future\n## 2.1.0 Adding Logging, Fixed Bugs\n- [x] added logging\n- [x] fixed bug where unicode characters in file names would cause error\n- [x] fixed performance issue during duple scan\n## 2.0.0 Refactored to Add Features\n- [x] added support for multiple filters (ex: -d -n)<br>\n- [x] added support for accepting stdin for files to search<br>\n- [x] added tree view support to make-test-files<br>\n## 1.1.0 Improved Documentation\n- [x] Improved README for better installation and setup instructions\n## 1.0.0 Refactored and Improved Output and Reporting\n- [x] refactored code to be easier to follow and more modular<br>\n- [x] improved reporting of results to duple.delete and duple.json<br>\n- [x] improved duple.json output, adding additional data<br>\n- [x] added dry run and verbose flags to duple rm\n- [x] added hash-stats to calculate performance times for each available hash<br>\n- [x] added make-test-files to make test files for the user to learn how duple works on test data<br>\n## 0.5.0 Improve Data Outputs\n- [x] added dictionary to duple.json for file stats, now each entry has a key to describe the number<br>\n- [x] fixed progress bar for pre-processing directories<br>\n- [x] added output file duple.all_files.json with file statistics on all files within the specified path for 'duple scan'<br>\n- [x] Improved summary statistics output for 'duple scan'\n## 0.4.0 Performance Improvements\n- [x] adding multiprocessing, taking advantage of multiple cores<br>\n- 
[x] eliminated files with unique sizes from analysis - files with unique size are not duplicates of another file\n## 0.3.0 Added Capability\n- [x] added mv function that will move 'duple.delete' paths instead of deleting them\n## 0.2.0 Added license\n- [x] Added license\n## 0.1.1 Misc. Fixes\n- [x] Fixed typos in help strings<br>\n- [x] Added support for sending duplicates to trash ('duple rm')\n## 0.1.0 Initial Release\n- [x] This is the initial release of duple",
"bugtrack_url": null,
"license": "GPL-3.0-or-later",
"summary": "Duple is a CLI that finds and removes duplicate files.",
"version": "2.1.7",
"project_urls": {
"Homepage": "https://github.com/dbruce-ae05/duple",
"Repository": "https://github.com/dbruce-ae05/duple"
},
"split_keywords": [
"duplicate",
" lint"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "d6d4f29f91766cb33a6b2cbcf564e2fedef603785cb742547747f6556dc6379c",
"md5": "a78135368d8323e1354fc6c3ff0f93ac",
"sha256": "c90ab03f947ff08c8a567f332eb210c43f6901961a2791e3d4c9da384915db2a"
},
"downloads": -1,
"filename": "duple-2.1.7-py3-none-any.whl",
"has_sig": false,
"md5_digest": "a78135368d8323e1354fc6c3ff0f93ac",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.11",
"size": 49680,
"upload_time": "2024-11-03T04:06:52",
"upload_time_iso_8601": "2024-11-03T04:06:52.510056Z",
"url": "https://files.pythonhosted.org/packages/d6/d4/f29f91766cb33a6b2cbcf564e2fedef603785cb742547747f6556dc6379c/duple-2.1.7-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f0c22f6c80d4f8d18ccb18c29b6ef71e14c9735142de991038ef97cef5ccdec4",
"md5": "47846c214926891f0e247186f42ec5fb",
"sha256": "5b7bc041413e375d0b0816e1a5f65a8094a5b69671bc84a1d9e3ef5c16fb215f"
},
"downloads": -1,
"filename": "duple-2.1.7.tar.gz",
"has_sig": false,
"md5_digest": "47846c214926891f0e247186f42ec5fb",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.11",
"size": 51183,
"upload_time": "2024-11-03T04:06:53",
"upload_time_iso_8601": "2024-11-03T04:06:53.999345Z",
"url": "https://files.pythonhosted.org/packages/f0/c2/2f6c80d4f8d18ccb18c29b6ef71e14c9735142de991038ef97cef5ccdec4/duple-2.1.7.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-03 04:06:53",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "dbruce-ae05",
"github_project": "duple",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "duple"
}