# Search Duplicates (searchdups)
This is a simple application that searches for duplicate files in a set of folders. To decide whether two files are identical, it hashes them with the `md5` or `sha256` algorithm. To improve performance, the application computes a _smart hash_: it first calculates a partial hash and finishes the full calculation only if needed.
Additionally, the application includes a pseudo _hash_ that only compares file names. With this _"hash algorithm"_, two files that share the same name are considered duplicates even if their contents differ.
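The _smart hash_ idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not the code used by `searchdups`: the names `file_hash` and `find_duplicates`, and the 4096-byte prefix size, are assumptions made for this example.

```python
import hashlib
from collections import defaultdict

# Assumed prefix size for the partial hash; the real tool may use another value.
PARTIAL_BYTES = 4096


def file_hash(path, limit=None, algorithm="md5"):
    """Hash the first `limit` bytes of a file, or the whole file if limit is None."""
    digest = hashlib.new(algorithm)
    remaining = limit
    with open(path, "rb") as handle:
        while remaining is None or remaining > 0:
            size = 65536 if remaining is None else min(65536, remaining)
            chunk = handle.read(size)
            if not chunk:
                break
            digest.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return digest.hexdigest()


def find_duplicates(paths, algorithm="md5"):
    """Group files by full hash, hashing fully only when partial hashes collide."""
    by_partial = defaultdict(list)
    for path in paths:
        by_partial[file_hash(path, PARTIAL_BYTES, algorithm)].append(path)

    by_full = defaultdict(list)
    for candidates in by_partial.values():
        if len(candidates) < 2:
            continue  # unique prefix: the full hash is never computed
        for path in candidates:
            by_full[file_hash(path, None, algorithm)].append(path)

    # Keep only the groups that contain real duplicates.
    return {h: sorted(group) for h, group in by_full.items() if len(group) > 1}
```

Files whose first bytes already differ are discarded after reading only a small prefix, so the whole file is read only for the candidates that still look identical.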
The basic usage is:
```bash
$ searchdups -r .
> 8f8db820d89c39029a0629094e0f18c9*
/Users/calfonso/Programacion/norepo/searchdups/a1.jpg
/Users/calfonso/Programacion/norepo/searchdups/a11.jpg
```
Other features include:
- Selecting the hash algorithm (using parameter `-H`).
- Searching in subfolders (using flag `-r`).
- Including hidden folders and files (using flag `-a`).
- Showing a progress bar during the process (using flag `-p`).
- Selecting which files are processed (using the `-f` parameter for _sh-like_ filters, or the `-e` parameter for regular expressions).
- Excluding files from processing (using the `-F` parameter for _sh-like_ filters, or the `-E` parameter for regular expressions).
- Summarizing the number of files and folders considered (using flag `-s`).
- Writing the result to a file (using parameter `-o`).
Please check the CLI help for up-to-date information about the usage of this tool.
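To illustrate the difference between _sh-like_ filters and regular expressions, here is a small Python sketch built on the standard `fnmatch` and `re` modules. It is not taken from `searchdups`: the `select` helper is hypothetical, and matching against the bare file name with `re.search` is an assumption about how such filters might behave.

```python
import fnmatch
import re


def select(names, glob=None, regex=None):
    """Return the names that pass the sh-like pattern and/or the regular expression."""
    selected = []
    for name in names:
        # sh-like filter: "*" matches any run of characters, "?" a single one.
        if glob is not None and not fnmatch.fnmatchcase(name, glob):
            continue
        # Regular-expression filter: matches anywhere in the name unless anchored.
        if regex is not None and re.search(regex, name) is None:
            continue
        selected.append(name)
    return selected


names = ["a1.jpg", "a11.jpg", "notes.txt"]
jpgs = select(names, glob="*.jpg")
single_digit = select(names, regex=r"^a\d\.jpg$")
```

Here the sh-like pattern `*.jpg` keeps both images, while the anchored regular expression keeps only `a1.jpg`.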
## Installation
To install the tool, clone the repository and run the following command inside the cloned folder:
```shell
$ pip install .
```
or install it from PyPI:
```shell
$ pip install searchdups
```
Raw data
{
"_id": null,
"home_page": "https://github.com/dealfonso/searchdups",
"name": "searchdups",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "command line,cli,files,sysadmin",
"author": "Carlos A.",
"author_email": "caralla@upv.es",
"download_url": "https://files.pythonhosted.org/packages/91/15/76c875a49a7c0ae21fce16b3f2d7b5f8e40ae5916cb80322ca175ac4c3a8/searchdups-1.0.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache 2.0",
"summary": "Searches for duplicate files in folders (recursively, if needed)",
"version": "1.0.0",
"split_keywords": [
"command line",
"cli",
"files",
"sysadmin"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "911576c875a49a7c0ae21fce16b3f2d7b5f8e40ae5916cb80322ca175ac4c3a8",
"md5": "6f80b9c4b23731db1904a4b0df052d63",
"sha256": "927f08cd832f801ff3e3f3b1f4f34f6639b4217f53127287c1a2ac41b224d1d5"
},
"downloads": -1,
"filename": "searchdups-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "6f80b9c4b23731db1904a4b0df052d63",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 9086,
"upload_time": "2023-02-02T13:46:28",
"upload_time_iso_8601": "2023-02-02T13:46:28.176773Z",
"url": "https://files.pythonhosted.org/packages/91/15/76c875a49a7c0ae21fce16b3f2d7b5f8e40ae5916cb80322ca175ac4c3a8/searchdups-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-02-02 13:46:28",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "dealfonso",
"github_project": "searchdups",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "searchdups"
}