bagit-python
============

|Build Status| |Coverage Status|

bagit is a Python library and command line utility for working with
`BagIt <http://purl.org/net/bagit>`__ style packages.

Installation
------------

bagit.py is a single-file Python module that you can drop into your
project as needed, or you can install it globally with:

::

    pip install bagit

Python v2.7+ is required.

Command Line Usage
------------------

When you install bagit, you should get a command-line program called
``bagit.py``, which you can use to turn an existing directory into a bag:

::

    bagit.py --contact-name 'John Kunze' /directory/to/bag

Finding BagIt on your system
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``bagit.py`` program should be available in your normal command-line
window (Terminal on OS X, Command Prompt or PowerShell on Windows,
etc.). If you are unsure where it was installed, you can also ask
Python to search for ``bagit`` as a Python module: simply replace
``bagit.py`` with ``python -m bagit``:

::

    python -m bagit --help

On some systems Python may have been installed as ``python3``, ``py``,
etc.; simply use the same name you use to start an interactive Python
shell:

::

    py -m bagit --help
    python3 -m bagit --help
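
If neither command works, you can also check from inside Python itself.
The sketch below is a hypothetical helper (``where_is`` is not part of
bagit) that uses the standard library's ``importlib.util.find_spec`` to
locate an importable module and ``shutil.which`` to look for a script on
``PATH``; it returns ``None`` for whichever one cannot be found.

.. code:: python

    import importlib.util
    import shutil


    def where_is(module_name="bagit", script_name="bagit.py"):
        """Hypothetical helper: report where a module and a script live.

        Returns a (module_path, script_path) tuple; either element is None
        when it cannot be found.
        """
        # find_spec returns None when the top-level module is not importable
        spec = importlib.util.find_spec(module_name)
        module_path = spec.origin if spec is not None else None
        # shutil.which searches the directories listed in PATH
        script_path = shutil.which(script_name)
        return module_path, script_path


    if __name__ == "__main__":
        module_path, script_path = where_is()
        print("bagit module:", module_path or "not importable")
        print("bagit.py script:", script_path or "not on PATH")

A missing script combined with an importable module usually means the
script directory is not on your ``PATH``, in which case ``python -m bagit``
is the simplest workaround.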

Configuring BagIt
~~~~~~~~~~~~~~~~~

You can pass in key/value metadata for the bag using options like
``--contact-name`` above, which are persisted to ``bag-info.txt``. For a
complete list of ``bag-info.txt`` properties you can use as command-line
arguments, see ``--help``.

Since calculating checksums can take a while when creating a bag, you
may want to calculate them in parallel if you are on a multicore
machine. You can do that with the ``--processes`` option:

::

    bagit.py --processes 4 /directory/to/bag
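
To illustrate why parallelism helps, here is a minimal sketch of hashing
several files concurrently with the standard library. It is not bagit's
implementation: ``hash_file`` and ``hash_files`` are hypothetical names,
and the sketch uses a thread pool rather than the separate processes that
``--processes`` starts, relying on CPython's ``hashlib`` releasing the GIL
while it hashes large buffers.

.. code:: python

    import hashlib
    from concurrent.futures import ThreadPoolExecutor


    def hash_file(path, algorithm="sha256", chunk_size=1024 * 1024):
        """Return the hex digest of one file, reading it in fixed-size chunks."""
        digest = hashlib.new(algorithm)
        with open(path, "rb") as f:
            # iter() with a b"" sentinel stops at end of file
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()


    def hash_files(paths, workers=4, algorithm="sha256"):
        """Hypothetical helper: hash many files concurrently -> {path: digest}.

        Uses a thread pool (not separate processes, as --processes does);
        CPython's hashlib releases the GIL while hashing large buffers.
        """
        paths = list(paths)  # consumed twice below, so materialize it
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # map() yields results in input order, so they line up with paths
            digests = list(pool.map(lambda p: hash_file(p, algorithm), paths))
        return dict(zip(paths, digests))

Reading in chunks keeps memory use flat regardless of file size; the
speed-up from more workers depends on how many cores and how much disk
throughput the machine has.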

To specify which checksum algorithm(s) to use when generating the
manifest, use the ``--md5``, ``--sha1``, ``--sha256`` and/or ``--sha512``
flags (MD5 is generated by default):

::

    bagit.py --sha1 /path/to/bag
    bagit.py --sha256 /path/to/bag
    bagit.py --sha512 /path/to/bag
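
Each selected algorithm produces its own manifest. The sketch below, which
is illustrative and not bagit's code, computes several digests in a single
read of each payload file and formats them as BagIt manifest lines
(``<checksum> <path>``), keyed by the ``manifest-<algorithm>.txt`` file
they would go in. The helper name ``build_manifests`` is hypothetical, and
the two spaces between checksum and path are an assumed formatting choice;
the BagIt specification only requires whitespace.

.. code:: python

    import hashlib
    import os


    def build_manifests(bag_dir, algorithms, chunk_size=1024 * 1024):
        """Hypothetical helper: {"manifest-<alg>.txt": [line, ...]} for data/."""
        lines = {alg: [] for alg in algorithms}
        data_dir = os.path.join(bag_dir, "data")
        for root, dirs, files in os.walk(data_dir):
            dirs.sort()  # visit subdirectories in a stable order
            for name in sorted(files):
                full_path = os.path.join(root, name)
                # manifest paths are relative to the bag root and use "/"
                rel_path = os.path.relpath(full_path, bag_dir).replace(os.sep, "/")
                digests = {alg: hashlib.new(alg) for alg in algorithms}
                with open(full_path, "rb") as f:
                    # one read pass feeds every hash object
                    for chunk in iter(lambda: f.read(chunk_size), b""):
                        for d in digests.values():
                            d.update(chunk)
                for alg, d in digests.items():
                    lines[alg].append("%s  %s" % (d.hexdigest(), rel_path))
        return {"manifest-%s.txt" % alg: entries for alg, entries in lines.items()}

Reading each file once and updating every hash object from the same
chunk avoids re-reading the payload for each additional algorithm.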

If you would like to validate a bag, use the ``--validate`` flag:

::

    bagit.py --validate /path/to/bag

If you would like to take a quick look at the bag to see whether it seems
valid, by examining only its structure and comparing its Payload-Oxum
(byte count and number of files), use the ``--fast`` flag:

::

    bagit.py --validate --fast /path/to/bag
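
The Payload-Oxum is recorded in ``bag-info.txt`` as
``<octet count>.<stream count>``: the total number of bytes in the
payload followed by the number of payload files. The hypothetical helpers
below compute and compare that value for a ``data/`` directory using the
standard library, which shows why ``--fast`` is quick: it needs only file
sizes, not file contents.

.. code:: python

    import os


    def payload_oxum(data_dir):
        """Hypothetical helper: return "<octet count>.<stream count>"."""
        total_bytes = 0
        file_count = 0
        for root, _dirs, files in os.walk(data_dir):
            for name in files:
                # only the size is needed, so no file is opened or hashed
                total_bytes += os.path.getsize(os.path.join(root, name))
                file_count += 1
        return "%d.%d" % (total_bytes, file_count)


    def oxum_matches(data_dir, recorded):
        """Quick check in the spirit of --fast: compare with the recorded value."""
        return payload_oxum(data_dir) == recorded.strip()

Note that a matching Payload-Oxum does not prove the contents are intact;
a file with the same size but different bytes passes this check, which is
why full validation still compares checksums.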

And finally, if you'd like to parallelize validation to take advantage
of multiple CPUs, you can:

::

    bagit.py --validate --processes 4 /path/to/bag

Using BagIt in your programs
----------------------------

You can also use BagIt programmatically in your own Python programs by
importing the ``bagit`` module.

Create
~~~~~~

To create a bag, you would do this:

.. code:: python

    import bagit
    bag = bagit.make_bag('mydir', {'Contact-Name': 'John Kunze'})

``make_bag`` returns a ``Bag`` instance. If you already have a bag on disk
and would like to create a ``Bag`` instance for it, simply call the
constructor directly:

.. code:: python

    import bagit
    bag = bagit.Bag('/path/to/bag')
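
Roughly speaking, ``make_bag`` turns a directory into a bag in place: the
original contents move into a ``data/`` subdirectory and tag files are
written alongside it. The toy function below sketches that layout with
the standard library so the steps are visible. It is a simplification,
not a substitute for ``bagit.make_bag``: ``toy_make_bag`` is a
hypothetical name, it writes a single SHA-256 manifest, declares BagIt
version 1.0 (RFC 8493) by assumption, and skips details such as tag
manifests and Payload-Oxum.

.. code:: python

    import hashlib
    import os
    import shutil
    import tempfile


    def toy_make_bag(bag_dir, bag_info=None):
        """Hypothetical, simplified sketch of turning a directory into a bag."""
        bag_info = bag_info or {}
        data_dir = os.path.join(bag_dir, "data")
        # 1. move the existing contents into data/, staging them in a fresh
        #    temporary directory so an existing entry named "data" moves too
        staging = tempfile.mkdtemp(dir=bag_dir)
        for name in os.listdir(bag_dir):
            if name != os.path.basename(staging):
                shutil.move(os.path.join(bag_dir, name), os.path.join(staging, name))
        os.rename(staging, data_dir)
        # 2. checksum every payload file into a manifest
        manifest = []
        for root, dirs, files in os.walk(data_dir):
            dirs.sort()
            for name in sorted(files):
                path = os.path.join(root, name)
                with open(path, "rb") as f:
                    # whole-file read keeps the toy short; real code streams
                    digest = hashlib.sha256(f.read()).hexdigest()
                rel = os.path.relpath(path, bag_dir).replace(os.sep, "/")
                manifest.append("%s  %s\n" % (digest, rel))
        with open(os.path.join(bag_dir, "manifest-sha256.txt"), "w", encoding="utf-8") as f:
            f.writelines(manifest)
        # 3. write the bag declaration and the metadata
        with open(os.path.join(bag_dir, "bagit.txt"), "w", encoding="utf-8") as f:
            f.write("BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n")
        with open(os.path.join(bag_dir, "bag-info.txt"), "w", encoding="utf-8") as f:
            for key, value in sorted(bag_info.items()):
                f.write("%s: %s\n" % (key, value))

Staging into a temporary directory first means the move never tries to
put ``data/`` inside itself, even when the source directory already
contains an entry called ``data``.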

Update Bag Metadata
~~~~~~~~~~~~~~~~~~~

You can change the metadata persisted to ``bag-info.txt`` by using the
``info`` property on a ``Bag``:

.. code:: python

    import bagit

    # load the bag
    bag = bagit.Bag('/path/to/bag')

    # update bag info metadata
    bag.info['Internal-Sender-Description'] = 'Updated on 2014-06-28.'
    bag.info['Authors'] = ['John Kunze', 'Andy Boyko']
    bag.save()
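
Setting a key to a list, as with ``Authors`` above, stores several values
under one label. The BagIt specification allows a label to repeat, so one
plausible serialization writes one ``Label: value`` line per value. The
sketch below shows that idea with hypothetical ``format_bag_info`` and
``parse_bag_info`` helpers; it assumes simple single-line values and is
not the code bagit itself uses to read or write ``bag-info.txt``.

.. code:: python

    def format_bag_info(info):
        """Hypothetical: serialize {label: value or [values]} to text."""
        lines = []
        for label in sorted(info):
            value = info[label]
            # a list becomes one line per value, repeating the label
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                lines.append("%s: %s" % (label, v))
        return "\n".join(lines) + "\n"


    def parse_bag_info(text):
        """Hypothetical: parse the text back; repeated labels become a list."""
        info = {}
        for line in text.splitlines():
            if not line.strip():
                continue
            # split on the first colon only, so values may contain colons
            label, _, value = line.partition(":")
            label, value = label.strip(), value.strip()
            if label in info:
                existing = info[label]
                if not isinstance(existing, list):
                    existing = [existing]
                info[label] = existing + [value]
            else:
                info[label] = value
        return info

Round-tripping ``{'Authors': ['John Kunze', 'Andy Boyko']}`` through these
two helpers yields two ``Authors:`` lines and then the original list.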

Update Bag Manifests
~~~~~~~~~~~~~~~~~~~~

By default, ``save`` will not update manifests. This guards against a
situation where a call to ``save`` to persist bag metadata accidentally
regenerates manifests for an invalid bag. If you have modified the
payload of a bag by adding, modifying or deleting files in the data
directory and wish to regenerate the manifests, set the ``manifests``
parameter to ``True`` when calling ``save``:

.. code:: python

    import os
    import shutil

    import bagit

    bag = bagit.Bag('/path/to/bag')

    # add a file
    shutil.copyfile('newfile', '/path/to/bag/data/newfile')

    # remove a file
    os.remove('/path/to/bag/data/file')

    # persist changes, regenerating the manifests
    bag.save(manifests=True)

The ``save`` method takes an optional ``processes`` parameter, which
determines how many processes are used to regenerate the checksums. This
can be handy on multicore machines.

Validation
~~~~~~~~~~

If you would like to see whether a bag is valid, use its ``is_valid``
method:

.. code:: python

    import bagit

    bag = bagit.Bag('/path/to/bag')
    if bag.is_valid():
        print("yay :)")
    else:
        print("boo :(")

If you'd like to get a detailed list of validation errors, call the
``validate`` method and catch the ``BagValidationError`` exception. If
the bag's manifest was invalid (and this wasn't caught by the
Payload-Oxum check), the exception's ``details`` property will contain a
list of ``ManifestError``\ s that you can introspect. Each
``ManifestError`` will be a ``ChecksumMismatch``, ``FileMissing``, or
``UnexpectedFile``.

For example, if you want to print out checksums that failed to
validate, you can do this:

.. code:: python

    import bagit

    bag = bagit.Bag("/path/to/bag")

    try:
        bag.validate()
    except bagit.BagValidationError as e:
        for d in e.details:
            if isinstance(d, bagit.ChecksumMismatch):
                print("expected %s to have %s checksum of %s but found %s" %
                      (d.path, d.algorithm, d.expected, d.found))
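
To make the three error kinds concrete, the sketch below compares one
manifest against the files actually present under ``data/`` using only
the standard library. It is hypothetical (``compare_manifest`` is not a
bagit function) and reports plain tuples rather than bagit's
``ManifestError`` classes, but it sorts problems into the same three
buckets: a checksum that differs, a listed file that is missing, and a
file on disk that the manifest does not list.

.. code:: python

    import hashlib
    import os


    def compare_manifest(bag_dir, manifest_name="manifest-sha256.txt", algorithm="sha256"):
        """Hypothetical: return (kind, path, expected, found) problem tuples."""
        expected = {}
        with open(os.path.join(bag_dir, manifest_name), encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    # "<checksum> <path>": split once so paths may contain spaces
                    checksum, path = line.strip().split(None, 1)
                    expected[path] = checksum.lower()

        # collect payload paths relative to the bag root, with "/" separators
        on_disk = set()
        for root, _dirs, files in os.walk(os.path.join(bag_dir, "data")):
            for name in files:
                rel = os.path.relpath(os.path.join(root, name), bag_dir)
                on_disk.add(rel.replace(os.sep, "/"))

        problems = []
        for path, checksum in sorted(expected.items()):
            if path not in on_disk:
                problems.append(("missing", path, checksum, None))
                continue
            with open(os.path.join(bag_dir, path), "rb") as f:
                found = hashlib.new(algorithm, f.read()).hexdigest()
            if found != checksum:
                problems.append(("checksum mismatch", path, checksum, found))
        for path in sorted(on_disk - set(expected)):
            problems.append(("unexpected", path, None, None))
        return problems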

To iterate through a bag's manifest and retrieve checksums for the
payload files, use the bag's ``entries`` dictionary:

.. code:: python

    import bagit

    bag = bagit.Bag("/path/to/bag")

    for path, fixity in bag.entries.items():
        print("path:%s md5:%s" % (path, fixity["md5"]))
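
If you want a similar path-to-checksums mapping without loading a
``Bag``, for example in a quick script, it can be assembled by reading
every ``manifest-<algorithm>.txt`` file in the bag root. The sketch below
is hypothetical and assumes that ``entries`` maps each payload path to a
dictionary keyed by algorithm name, which is how the loop above uses it;
it skips tag manifests and does not decode percent-encoded file names.

.. code:: python

    import glob
    import os
    import re


    def read_manifest_entries(bag_dir):
        """Hypothetical: {payload path: {algorithm: checksum}} from manifests."""
        entries = {}
        pattern = os.path.join(bag_dir, "manifest-*.txt")
        for manifest_path in sorted(glob.glob(pattern)):
            # the algorithm is embedded in the file name, e.g. manifest-md5.txt
            match = re.match(r"manifest-(\w+)\.txt$", os.path.basename(manifest_path))
            if match is None:
                continue
            algorithm = match.group(1)
            with open(manifest_path, encoding="utf-8") as f:
                for line in f:
                    if not line.strip():
                        continue
                    # split once so paths containing spaces stay intact
                    checksum, path = line.strip().split(None, 1)
                    entries.setdefault(path, {})[algorithm] = checksum
        return entries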

Contributing to bagit-python development
----------------------------------------

::

    % git clone https://github.com/LibraryOfCongress/bagit-python.git
    % cd bagit-python
    # MAKE CHANGES
    % python test.py

Running the tests
~~~~~~~~~~~~~~~~~

You can quickly run the tests by having setuptools install dependencies:

::

    python setup.py test

Once your code is working, you can use
`Tox <https://tox.readthedocs.io/>`__ to run the tests with every
supported version of Python that you have installed on the local
system:

::

    tox

If you have Docker installed, you can run the tests under Linux inside a
container:

::

    % docker build -t bagit:latest . && docker run -it bagit:latest

Benchmarks
----------

If you'd like to see how increasing the parallelization of bag creation
affects the time needed to create a bag on your system, try the included
bench utility:

::

    % ./bench.py

License
-------

|cc0|

Note: By contributing to this project, you agree to license your work
under the same terms as those that govern this project's distribution.

.. |Coverage Status| image:: https://coveralls.io/repos/github/LibraryOfCongress/bagit-python/badge.svg?branch=master
   :target: https://coveralls.io/github/LibraryOfCongress/bagit-python?branch=master
.. |cc0| image:: http://i.creativecommons.org/p/zero/1.0/88x31.png
   :target: http://creativecommons.org/publicdomain/zero/1.0/