:Name: bitrot
:Version: 1.0.1
:Uploaded: 2023-08-02 11:06:25
:Requires Python: >=3.8
:License: MIT
:Keywords: file, checksum, database

======
bitrot
======

Detects bit rotten files on the hard drive to save your precious photo
and music collection from slow decay.

Usage
-----

Go to the desired directory and simply invoke::

  $ bitrot

This will start digging through your directory structure recursively,
indexing all files found. The index is stored in a ``.bitrot.db`` file,
which is a SQLite 3 database.
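
The indexing pass can be pictured roughly as follows. This is a minimal
sketch and not bitrot's actual code: the ``files`` table layout, the
``sha1_of`` and ``index_tree`` names, and the chunk size are all
assumptions made for illustration.

```python
import hashlib
import os
import sqlite3

CHUNK_SIZE = 16384  # bytes per read; an arbitrary choice for this sketch


def sha1_of(path):
    """Return the hex SHA1 digest of a file, read in binary chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_tree(root, db_name=".bitrot.db"):
    """Walk ``root`` recursively and record (path, mtime, hash) per file."""
    conn = sqlite3.connect(os.path.join(root, db_name))
    # Hypothetical schema: one row per file, keyed by its relative path.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(path TEXT PRIMARY KEY, mtime INTEGER, hash TEXT)"
    )
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            # Skip the index itself and anything that is not a regular file.
            if rel == db_name or not os.path.isfile(full):
                continue
            conn.execute(
                "INSERT OR REPLACE INTO files VALUES (?, ?, ?)",
                (rel, int(os.stat(full).st_mtime), sha1_of(full)),
            )
    conn.commit()
    conn.close()
```

Storing the path relative to ``root`` is what makes the index portable
when the whole folder is moved.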

Next time you run ``bitrot`` it will add new files and update the index
for files with a changed modification date. Most importantly, however, it
will report all errors, e.g. files that changed on the hard drive but
still have the same modification date.
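
The decision described above boils down to comparing two pairs of
values. The sketch below shows one way to express it; the ``classify``
function and its return labels are hypothetical and only illustrate the
rule, not how bitrot itself is structured.

```python
def classify(stored, current):
    """Compare a stored (mtime, hash) pair with the current one.

    ``stored`` is None for paths that are not in the index yet.
    """
    if stored is None:
        return "new"
    stored_mtime, stored_hash = stored
    current_mtime, current_hash = current
    if stored_hash == current_hash:
        return "ok"
    if stored_mtime != current_mtime:
        # The content changed and so did the modification date:
        # treat it as a legitimate edit and refresh the index entry.
        return "updated"
    # Same modification date but different content: likely bit rot.
    return "error"
```

Only the last case is reported as an error, because a change of content
without a change of modification date is not something a normal edit
produces.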

All paths stored in ``.bitrot.db`` are relative so it's safe to rescan
a folder after moving it to another drive. Just remember to move it in
a way that doesn't touch modification dates. Otherwise the checksum
database is useless.
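
If you script the move in Python, the standard library can already
preserve modification dates. The helper below is a sketch under that
assumption; ``move_preserving_mtime`` is a made-up name, and copying
followed by deletion is just one possible approach.

```python
import os
import shutil


def move_preserving_mtime(src, dst):
    """Copy the tree at ``src`` to ``dst`` keeping timestamps, then drop ``src``.

    ``dst`` must not exist yet.
    """
    # copytree() copies each file with shutil.copy2() by default, which
    # attempts to preserve metadata such as the modification time.
    shutil.copytree(src, dst)
    shutil.rmtree(src)
```

Hidden files are copied as well, so ``.bitrot.db`` travels with the
folder.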

Performance
-----------

Performance obviously depends on how fast the underlying drive is.
Historically, the script was single-threaded because back in 2013 checksum
calculations on a single core still outran typical drives, including
the mobile SSDs of the day.  In 2020 this is no longer the case, so the
script now uses a process pool to calculate SHA1 hashes and perform
``stat()`` calls.
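
A process pool of this kind can be sketched with
``concurrent.futures``. The function names below (``hash_and_stat``,
``scan``, ``main``) are illustrative assumptions rather than bitrot's
real internals.

```python
import hashlib
import os
from concurrent.futures import ProcessPoolExecutor


def hash_and_stat(path):
    """Worker: return (path, mtime, SHA1 hex digest) for one file."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(16384), b""):
            digest.update(chunk)
    return path, int(os.stat(path).st_mtime), digest.hexdigest()


def scan(paths, executor):
    """Fan the per-file work out over ``executor`` and collect the results."""
    # map() preserves input order, so results line up with ``paths``.
    return list(executor.map(hash_and_stat, paths))


def main(paths, workers=None):
    """Hash ``paths`` in a process pool and print one line per file."""
    # workers=None lets the pool pick a default based on the CPU count.
    # Call main() from under ``if __name__ == "__main__":`` so that
    # platforms which spawn worker processes can re-import this module.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for path, mtime, sha1 in scan(paths, pool):
            print(sha1, mtime, path)
```

The worker is a module-level function because a process pool has to
pickle it to send it to the child processes.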

No rigorous performance tests have been done.  Scanning a ~1000 file
directory totalling ~5 GB takes 2.2s on a 2018 MacBook Pro 15" with
an AP0512M SSD.  Back in 2013, that same feat on a 2015 MacBook Air with
an SM0256G SSD took over 20 seconds.

On that same 2018 MacBook Pro 15", scanning a 60+ GB music library takes
24 seconds.  Back in 2013, with a typical 5400 RPM laptop hard drive
it took around 15 minutes.  How times have changed!

Tests
-----

There's a simple but comprehensive test scenario using
`pytest <https://pypi.org/p/pytest>`_ and
`pytest-order <https://pypi.org/p/pytest-order>`_.

Install::

  $ python3 -m venv .venv
  $ . .venv/bin/activate
  (.venv)$ pip install -e .[test]

Run::

  (.venv)$ pytest -x
  ==================== test session starts ====================
  platform darwin -- Python 3.10.12, pytest-7.4.0, pluggy-1.2.0
  rootdir: /Users/ambv/Documents/Python/bitrot
  plugins: order-1.1.0
  collected 12 items

  tests/test_bitrot.py ............                      [100%]

  ==================== 12 passed in 15.05s ====================

Change Log
----------

1.0.1
~~~~~

* officially remove Python 2 support, which had been broken since 1.0.0
  anyway; the package now requires Python 3.8+ because it uses a few
  newer features

1.0.0
~~~~~

* significantly sped up execution on solid state drives by using
  a process pool executor to calculate SHA1 hashes and perform ``stat()``
  calls; use ``-w1`` if your runs on slow magnetic drives were
  negatively affected by this change

* sped up execution by pre-loading all SQLite-stored hashes to memory
  and doing comparisons using Python sets

* all UTF-8 filenames are now normalized to NFKD in the database to
  enable cross-operating system checks

* the SQLite database is now vacuumed to minimize its size

* bugfix: additional Python 3 fixes when Unicode names were encountered
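
The NFKD normalization mentioned in this release can be illustrated with
the standard ``unicodedata`` module. The ``normalize_path`` helper is a
hypothetical name; the point is only that two visually identical file
names may be stored as different code point sequences by different
systems, and normalizing both makes them compare equal.

```python
import unicodedata


def normalize_path(path):
    """Return ``path`` in NFKD form so equivalent spellings compare equal."""
    return unicodedata.normalize("NFKD", path)


composed = "caf\u00e9.jpg"     # "é" as a single precomposed code point
decomposed = "cafe\u0301.jpg"  # "e" followed by a combining acute accent

assert composed != decomposed
assert normalize_path(composed) == normalize_path(decomposed)
```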

0.9.2
~~~~~

* bugfix: one place in the code incorrectly hardcoded UTF-8 as the
  filesystem encoding

0.9.1
~~~~~

* bugfix: print the path that failed to decode with FSENCODING

* bugfix: when using -q, don't hide warnings about files that can't be
  statted or read

* bugfix: -s is no longer broken on Python 3

0.9.0
~~~~~

* bugfix: bitrot.db checksum checking messages now obey --quiet

* Python 3 compatibility

0.8.0
~~~~~

* bitrot now keeps track of its own database's bitrot by storing
  a checksum of .bitrot.db in .bitrot.sha512

* bugfix: now properly uses the filesystem encoding to decode file names
  for use with the .bitrot.db database. Report and original patch by
  pallinger.
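
Guarding the database with a separate checksum file can be sketched as
below. The on-disk format (a hex SHA-512 digest stored as text) and the
function names are assumptions for illustration; the real
``.bitrot.sha512`` file may be laid out differently.

```python
import hashlib


def file_sha512(path):
    """Return the hex SHA-512 digest of the file at ``path``."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(16384), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_db_checksum(db_path, sum_path):
    """Store the digest of the database next to it (assumed text format)."""
    with open(sum_path, "w") as f:
        f.write(file_sha512(db_path))


def db_is_intact(db_path, sum_path):
    """Return True if the database still matches its stored digest."""
    with open(sum_path) as f:
        return f.read().strip() == file_sha512(db_path)
```

Under these assumptions, a run would call ``db_is_intact`` before
trusting the index and ``write_db_checksum`` after updating it.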

0.7.1
~~~~~

* bugfix: SHA1 computation now works correctly on Windows; previously
  files were opened in text mode. This fix will change hashes of files
  containing some specific bytes like 0x1A.

0.7.0
~~~~~

* when a file changes or is renamed, the timestamp of the last check is
  updated, too

* bugfix: files that disappeared during the run are now properly ignored

* bugfix: files that are locked or with otherwise denied access are
  skipped. If they were read before, they will be considered "missing"
  in the report.

* bugfix: if there are multiple files with the same content in the
  scanned directory tree, renames are now handled properly for them

* refactored some horrible code to be a little less horrible

0.6.0
~~~~~

* more control over performance with ``--commit-interval`` and
  ``--chunk-size`` command-line arguments

* bugfix: symbolic links are now properly skipped (or can be followed if
  ``--follow-links`` is passed)

* bugfix: files that cannot be opened are now gracefully skipped

* bugfix: fixed a rare division by zero when run in an empty directory

0.5.1
~~~~~

* bugfix: warn about test mode only in test mode

0.5.0
~~~~~

* ``--test`` command-line argument for testing the state without
  updating the database on disk (works for testing databases you don't
  have write access to)

* size of the data read is reported upon finish

* minor performance updates

0.4.0
~~~~~

* renames are now reported as such

* all non-regular files (e.g. symbolic links, pipes, sockets) are now
  skipped

* progress presented in percentage

0.3.0
~~~~~

* ``--sum`` command-line argument for easy comparison of multiple
  databases

0.2.1
~~~~~

* fixed regression from 0.2.0 where new files caused a ``KeyError``
  exception

0.2.0
~~~~~

* ``--verbose`` and ``--quiet`` command-line arguments

* if a file is no longer there, its entry is removed from the database

0.1.0
~~~~~

* First published version.

Authors
-------

Glued together by `Łukasz Langa <mailto:lukasz@langa.pl>`_. Multiple
improvements by
`Ben Shepherd <mailto:bjashepherd@gmail.com>`_,
`Jean-Louis Fuchs <mailto:ganwell@fangorn.ch>`_,
`Marcus Linderoth <mailto:marcus@thingsquare.com>`_,
`p1r473 <mailto:subwayjared@gmail.com>`_,
`Peter Hofmann <mailto:scm@uninformativ.de>`_,
`Phil Lundrigan <mailto:philipbl@cs.utah.edu>`_,
`Reid Williams <mailto:rwilliams@ideo.com>`_,
`Stan Senotrusov <mailto:senotrusov@gmail.com>`_,
`Yang Zhang <mailto:yaaang@gmail.com>`_, and
`Zhuoyun Wei <mailto:wzyboy@wzyboy.org>`_.

            
