bagit-python
============

:Name: bagit
:Version: 1.8.1
:Author: Ed Summers
:Home page: https://libraryofcongress.github.io/bagit-python/
:Summary: Create and validate BagIt packages
:Upload time: 2021-02-08 18:53:22

|Build Status| |Coverage Status|

bagit is a Python library and command line utility for working with
`BagIt <http://purl.org/net/bagit>`__ style packages.

Installation
------------

bagit.py is a single-file Python module that you can drop into your
project as needed, or you can install it globally with:

::

    pip install bagit

Python v2.7+ is required.

Command Line Usage
------------------

When you install bagit you should get a command-line program called
bagit.py which you can use to turn an existing directory into a bag:

::

    bagit.py --contact-name 'John Kunze' /directory/to/bag

Finding Bagit on your system
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``bagit.py`` program should be available in your normal command-line
window (Terminal on macOS, Command Prompt or PowerShell on Windows,
etc.). If you are unsure where it was installed, you can instead ask
Python to locate ``bagit`` as a module: simply replace
``bagit.py`` with ``python -m bagit``:

::

    python -m bagit --help

On some systems Python may have been installed as ``python3``, ``py``,
etc. – simply use the same name you use to start an interactive Python
shell:

::

    py -m bagit --help
    python3 -m bagit --help

Configuring BagIt
~~~~~~~~~~~~~~~~~

You can pass key/value metadata for the bag using options like
``--contact-name`` above, which are persisted to ``bag-info.txt``. For a
complete list of ``bag-info.txt`` properties you can use as command-line
arguments, see ``--help``.

Since calculating checksums can take a while when creating a bag, you
may want to calculate them in parallel if you are on a multicore
machine. You can do that with the ``--processes`` option:

::

    bagit.py --processes 4 /directory/to/bag
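
The idea behind ``--processes`` can be sketched with the standard
library alone. The snippet below is an illustration under stated
assumptions, not bagit's implementation: the helper names ``sha256_of``
and ``hash_payload`` are hypothetical, and it uses a thread pool
(``hashlib`` releases the GIL while hashing large buffers) rather than
separate worker processes, purely to keep the example short.

.. code:: python

    import hashlib
    import os
    from concurrent.futures import ThreadPoolExecutor


    def sha256_of(path, chunk_size=1024 * 1024):
        """Return the SHA-256 hex digest of one file, read in chunks."""
        digest = hashlib.sha256()
        with open(path, "rb") as handle:
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()


    def hash_payload(directory, workers=4):
        """Hash every file under ``directory`` using ``workers`` threads."""
        paths = sorted(
            os.path.join(root, name)
            for root, _dirs, files in os.walk(directory)
            for name in files
        )
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # pool.map preserves input order, so digests line up with paths.
            digests = list(pool.map(sha256_of, paths))
        return dict(zip(paths, digests))

Each file is hashed independently, which is why the work spreads well
across cores when a bag holds many files.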

To specify which checksum algorithm(s) to use when generating the
manifest, use the ``--md5``, ``--sha1``, ``--sha256`` and/or
``--sha512`` flags (MD5 is generated by default).

::

    bagit.py --sha1 /path/to/bag
    bagit.py --sha256 /path/to/bag
    bagit.py --sha512 /path/to/bag
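
Each selected algorithm results in a ``manifest-<algorithm>.txt`` file
that pairs a digest with a payload path. As a rough sketch of what such
a file contains (the helper ``manifest_lines`` is hypothetical and is
not part of bagit), the lines could be produced with ``hashlib`` like
this:

.. code:: python

    import hashlib
    import os


    def manifest_lines(bag_dir, algorithm="sha256"):
        """Return "<digest>  <path>" lines for every file under <bag_dir>/data."""
        entries = []
        for root, _dirs, files in os.walk(os.path.join(bag_dir, "data")):
            for name in files:
                full_path = os.path.join(root, name)
                digest = hashlib.new(algorithm)
                with open(full_path, "rb") as handle:
                    # Read in chunks so large payload files are not loaded whole.
                    for chunk in iter(lambda: handle.read(65536), b""):
                        digest.update(chunk)
                # Manifest paths are relative to the bag root and use "/".
                relative = os.path.relpath(full_path, bag_dir).replace(os.sep, "/")
                entries.append((relative, digest.hexdigest()))
        return ["%s  %s" % (digest, path) for path, digest in sorted(entries)]

Calling ``manifest_lines(bag_dir, "sha512")`` would give the kind of
lines one would expect to find in ``manifest-sha512.txt``.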

If you would like to validate a bag, you can use the ``--validate`` flag.

::

    bagit.py --validate /path/to/bag

For a quick check that only examines the structure of the bag and
compares its Payload-Oxum (byte count and number of files) against the
payload, without recomputing checksums, add the ``--fast`` flag.

::

    bagit.py --validate --fast /path/to/bag
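
The Payload-Oxum is written as ``<byte count>.<file count>``. A minimal
sketch of how that value can be derived from the ``data`` directory is
shown below; ``payload_oxum`` is a hypothetical helper used only for
illustration, not a bagit function.

.. code:: python

    import os


    def payload_oxum(bag_dir):
        """Return the Payload-Oxum string "<total bytes>.<file count>"."""
        total_bytes = 0
        file_count = 0
        for root, _dirs, files in os.walk(os.path.join(bag_dir, "data")):
            for name in files:
                # Only file sizes are read; file contents are never opened.
                total_bytes += os.path.getsize(os.path.join(root, name))
                file_count += 1
        return "%d.%d" % (total_bytes, file_count)

Because no file contents are read, this comparison is much cheaper than
recomputing checksums, but it cannot detect a file whose content changed
while its size stayed the same.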

Finally, to parallelize validation across multiple CPUs, combine
``--validate`` with ``--processes``:

::

    bagit.py --validate --processes 4 /path/to/bag

Using BagIt in your programs
----------------------------

You can also use BagIt programmatically in your own Python programs by
importing the ``bagit`` module.

Create
~~~~~~

To create a bag, call ``make_bag``:

.. code:: python

    import bagit
    bag = bagit.make_bag('mydir', {'Contact-Name': 'John Kunze'})

``make_bag`` returns a Bag instance. If you have a bag already on disk
and would like to create a Bag instance for it, simply call the
constructor directly:

.. code:: python

    bag = bagit.Bag('/path/to/bag')

Update Bag Metadata
~~~~~~~~~~~~~~~~~~~

You can change the metadata persisted to ``bag-info.txt`` through the
``info`` property of a ``Bag``.

.. code:: python

    import bagit

    # load the bag
    bag = bagit.Bag('/path/to/bag')

    # update bag info metadata
    bag.info['Internal-Sender-Description'] = 'Updated on 2014-06-28.'
    bag.info['Authors'] = ['John Kunze', 'Andy Boyko']
    bag.save()

Update Bag Manifests
~~~~~~~~~~~~~~~~~~~~

By default ``save`` will not update manifests. This guards against a
situation where a call to ``save`` to persist bag metadata accidentally
regenerates manifests for an invalid bag. If you have modified the
payload of a bag by adding, modifying or deleting files in the data
directory and wish to regenerate the manifests, set the ``manifests``
parameter to ``True`` when calling ``save``.

.. code:: python

    import os
    import shutil

    import bagit

    bag = bagit.Bag('/path/to/bag')

    # add a file to the payload
    shutil.copyfile('newfile', '/path/to/bag/data/newfile')

    # remove a file from the payload
    os.remove('/path/to/bag/data/file')

    # persist changes and regenerate the manifests
    bag.save(manifests=True)

The ``save`` method also takes an optional ``processes`` parameter, which
determines how many processes are used to regenerate the checksums. This
can be handy on multicore machines.

Validation
~~~~~~~~~~

If you would like to see if a bag is valid, use its ``is_valid`` method:

.. code:: python

    bag = bagit.Bag('/path/to/bag')
    if bag.is_valid():
        print("yay :)")
    else:
        print("boo :(")

If you'd like a detailed list of validation errors, call the
``validate`` method and catch the ``BagValidationError`` exception. If
the bag's manifest was invalid (and the problem wasn't already caught by
the payload oxum check), the exception's ``details`` property will
contain a list of ``ManifestError``\ s that you can inspect. Each
``ManifestError`` will be a ``ChecksumMismatch``, ``FileMissing``, or
``UnexpectedFile``.

So for example if you want to print out checksums that failed to
validate you can do this:

.. code:: python

    import bagit

    bag = bagit.Bag("/path/to/bag")

    try:
        bag.validate()
    except bagit.BagValidationError as e:
        # e.details holds the individual ManifestError instances
        for d in e.details:
            if isinstance(d, bagit.ChecksumMismatch):
                print("expected %s to have %s checksum of %s but found %s" %
                      (d.path, d.algorithm, d.expected, d.found))
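
To make the three error categories concrete, here is a simplified,
standard-library-only sketch of the comparison they describe. It is an
illustration rather than bagit's own validation logic: the function
``classify_payload`` and its ``expected`` argument (a mapping of
bag-relative path to hex digest) are assumptions made for this example.

.. code:: python

    import hashlib
    import os


    def classify_payload(bag_dir, expected, algorithm="sha256"):
        """Compare payload files on disk with ``expected`` ({path: digest}).

        Returns three sorted lists of bag-relative paths: listed but
        missing, present but unlisted, and present with a different digest.
        """
        found = {}
        for root, _dirs, files in os.walk(os.path.join(bag_dir, "data")):
            for name in files:
                full_path = os.path.join(root, name)
                with open(full_path, "rb") as handle:
                    digest = hashlib.new(algorithm, handle.read()).hexdigest()
                relative = os.path.relpath(full_path, bag_dir).replace(os.sep, "/")
                found[relative] = digest

        missing = sorted(set(expected) - set(found))      # cf. FileMissing
        unexpected = sorted(set(found) - set(expected))   # cf. UnexpectedFile
        mismatched = sorted(                              # cf. ChecksumMismatch
            path for path in set(expected) & set(found)
            if expected[path] != found[path]
        )
        return missing, unexpected, mismatched
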

To iterate through a bag's manifest and retrieve checksums for the
payload files, use the bag's ``entries`` dictionary:

.. code:: python

    import bagit

    bag = bagit.Bag("/path/to/bag")

    # assumes the bag has an MD5 manifest
    for path, fixity in bag.entries.items():
        print("path:%s md5:%s" % (path, fixity["md5"]))
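
The shape of that dictionary (path mapped to a dictionary of algorithm
and digest) can be illustrated by parsing manifest text directly. The
sketch below is a simplified stand-in, not bagit's parser;
``parse_manifest`` is a hypothetical helper that assumes each line is a
digest, whitespace, and then a path.

.. code:: python

    def parse_manifest(text, algorithm, entries=None):
        """Merge "<digest> <path>" lines into {path: {algorithm: digest}}."""
        entries = {} if entries is None else entries
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            # Split once so that paths containing spaces stay intact.
            digest, path = line.split(None, 1)
            entries.setdefault(path, {})[algorithm] = digest
        return entries

Passing the same ``entries`` dictionary to several calls, one per
manifest file, collects every algorithm's digest under each path.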

Contributing to bagit-python development
----------------------------------------

::

    % git clone https://github.com/LibraryOfCongress/bagit-python.git
    % cd bagit-python
    # MAKE CHANGES
    % python test.py

Running the tests
~~~~~~~~~~~~~~~~~

You can quickly run the tests by having setuptools install dependencies:

::

    python setup.py test

Once your code is working, you can use
`Tox <https://tox.readthedocs.io/>`__ to run the tests with every
supported version of Python which you have installed on the local
system:

::

    tox

If you have Docker installed, you can run the tests under Linux inside a
container:

::

    % docker build -t bagit:latest . && docker run -it bagit:latest

Benchmarks
----------

If you'd like to see how increasing parallelization of bag creation on
your system affects the time to create a bag, try using the included
bench utility:

::

    % ./bench.py

License
-------

|cc0|

Note: By contributing to this project, you agree to license your work
under the same terms as those that govern this project's distribution.

.. |Build Status| image:: https://travis-ci.org/LibraryOfCongress/bagit-python.svg?branch=master
   :target: http://travis-ci.org/LibraryOfCongress/bagit-python
.. |Coverage Status| image:: https://coveralls.io/repos/github/LibraryOfCongress/bagit-python/badge.svg?branch=master
   :target: https://coveralls.io/github/LibraryOfCongress/bagit-python?branch=master
.. |cc0| image:: http://i.creativecommons.org/p/zero/1.0/88x31.png
   :target: http://creativecommons.org/publicdomain/zero/1.0/



            
