bdrc-volume-manifest-builder


Namebdrc-volume-manifest-builder JSON
Version 1.3.2 PyPI version JSON
download
home_pagehttps://github.com/buda-base/volume-manifest-builder/
SummaryCreates manifests for syncd works.
upload_time2024-09-06 16:26:19
maintainerNone
docs_urlNone
authorjimk
requires_python>=3.7
licenseNone
keywords
VCS
bugtrack_url
requirements boto3 Pillow botocore s3transfer setuptools boto aiofiles lxml requests s3pathlib
Travis-CI No Travis.
coveralls test coverage No coveralls.
            ``bdrc-volume-manifest-builder``
================================


New in Release 1.3
------------------

+--------+--------------------------------+
|Release | Comment                        |
+========+================================+
|1.3.2   |Adds a missing integration      |
+--------+--------------------------------+
|1.3.1   | Corrects the  ``setup.py``     |
+--------+--------------------------------+
|1.3.0   | adds the ability to specify    |
|        | named image groups on the      |
|        | command line                   |
+--------+--------------------------------+

ex: ``manifestforwork -w W23834 --image-group 3187,I123456 fs``

Notes:

-  the -i/–image-group argument is a comma-separated list of image
   groups (or one item) If it is not given, all the image groups in the
   work’s BUDA catalog will be processed.
-  the –image-group flag cannot be given with the –work-list-file
   argument.
-  The –image-group arguments **do** apply when the –work-rid argument is a file path.
- The ``image-group`` argument is available in both the ``fs`` and ``s3`` modes.

New in Release 1.1
------------------

-  Ability to use either file system or S3 for image repository

Intent
------

This project originated as a script to extract image dimensions from a
work, and:

-  write the dimensions to a json file
-  report on images which broke certain rules.

Implementation
--------------

Archival Operations determined that this would be most useful to BUDA to
implement as a service which could be injected into the current sync
process. To do this, the system needed to:

-  be more modular
-  be distributable onto an instance which could be cloned in AWS.

This branch expands the original tool by:

-  Adding the ability to use the eXist db as a source for the image
   dimensions.
-  Use a pre-built BOM Bill of Materials) to derive the files which
   should be included in the dimesnsions file
-  Read input from either S3 or local file system repositories
-  Create and save log files.
-  Manage input files.
-  Run as a service on a Linux platform

Standalone tool
~~~~~~~~~~~~~~~

Internal tool to create json manifests of image format data for volumes
present in S3 to support the BUDA IIIF presentation server.

Language
--------

Python 3.7 or newer. It is highly recommended to use ``pip`` to install,
to manage dependencies. If you **must** do it yourself, you can refer to
``setup.py`` for the dependency list.

Environment
-----------

1. Write access to ``/var/log/VolumeManifestBuilder`` which must exist.
2. ``systemctl`` service management, if you want to use the existing
   materials to install as a service.

Usage
-----

Command line usage
~~~~~~~~~~~~~~~~~~

The command line mode allows running one batch or one work at a time.
Arguments specify the parameters, options.

You also must choose a **repository mode** which determines if the
images are on a local file system (the ``fs`` mode), or on an AWS S3
system (the ``s3``) mode.

Common parameters
^^^^^^^^^^^^^^^^^

This section describes the parameters which are independent of the
repository mode.

.. code-block:: bash

     $ manifestforwork -h usage: manifestforwork [common
    options] { fs [fs options] \| s3 [s3 options]}

    Prepares an inventory of image dimensions

    optional arguments: -h, –help show this help message and exit -d
    {info,warning,error,debug,critical}, –debugLevel
    {info,warning,error,debug,critical} choice values are from python
    logging module -l LOG_PARENT, –logDir LOG_PARENT Path to log file
    directory -f WORK_LIST_FILE, –workListFile WORK_LIST_FILE File
    containing one RID per line. -w WORK_RID, –work-Rid WORK_RID name or
    partially qualified path to one work

    Repository Parser: Handles repository alternatives

    {s3,fs}

   Common usage Notes:

   `-f/--workListFile` is a file which contains a list of RIDS, **or a list of paths
   to work RIDs, in the `fs` mode (see below.)**
   `-w/--workRID` is a single work.

   - The `--workListFile` and `--workRid` arguments are mutually exclusive

   - The system logs its activity into a file named _yyyy-MM-DD_HH_MM_PID_.local_v_m_b.log`
     in the folder given in the `-l/--logDir` argument (default `/var/log`)
     mode.

     Before release 1.3.0, `manifestforwork` used an externally generated list of files (fileList.json) in the source directory to specify the population to process.
     After that, the entire directory is scanned (this was needed to be able to process arbitrary image groups.), and the file list is disregarded.

     The use of a file list was in response to many badly formed entries in `dimensions.json` due to random files being scanned. In Release 1.3.0, these files 
     are now explicitly tagged in the `dimensions.json`

   {
     "filename":"SomeFile.ext",
     "error":"UnidentifiedImageError"
   }



fs Mode Usage
^^^^^^^^^^^^^

.. code-block:: bash

    ❯ manifestforwork fs -h usage: manifestforwork [common
    options] { fs [fs options] \| s3 [s3 options]} fs [-h] [-c CONTAINER]
    [-i IMAGE_FOLDER_NAME]

    optional arguments: -h, –help show this help message and exit -c
    CONTAINER, –container CONTAINER container for all work_rid archives.
    Prefixes entries in –source_rid or –workList -i IMAGE_FOLDER_NAME,
    –image-folder-name IMAGE_FOLDER_NAME name of parent folder of image
    files

   Notes:

   + the `-c/--container` defines a path to the RIDS (or the RID subpaths) given.
     It is optional. It prepends its value to the WorkRID paths or individual workRIDs
     in the input file (`-f`) or to the individual work (`-w`)

   In the `-w` or `-f` options above. The system supports user expansion
   (`~[uid]/path...` in Linux) and environment variable expansion in both the `-c`
   and the `-f` options. That is, the file given in the `-f` option can contain

   - Environment variables
   - User alias pathnames (`~[user]/...`)
   - Fully qualified pathnames

   e.g.

   > pwd
   /data
   >ls
   Works
   >ls ~/tmp
   /home/me/tmp/Works
   > export THISWORK="Works/FromThom"
   > cat workList
   $WORKS/W12345
   ~/tmp/$WORKS/W12345
   /home/me/tmp/Works/W89012


using this list in

.. code-block:: bash

    manifestforwork -f worklist fs


will process files from

-  /data/Works/FromThom
-  /home/me/tmp/Works/FromThom
-  /home/me/tmp/Works/W89012 if the ``--container`` argument is not
   given. (``-c`` defaults to the current working directory)

s3 mode usage
^^^^^^^^^^^^^

.. code-block:: bash

    ❯ manifestforwork s3 –help usage: manifestforwork
    [common options] { fs [fs options] \| s3 [s3 options]} s3 [-h] [-b
    BUCKET]

    optional arguments: -h, –help show this help message and exit -b BUCKET,
    –bucket BUCKET Bucket - source and destination

   The S3 mode uses a bucket named with the optional `-b/--bucket` argument. The default bucket
   is closely held. note that the `--container` argument is not applicable in this mode, and
   that if a worklist is given, it must contain only RIDs, not paths.

   The bucket example takes the aws s3api form, e.g. `--bucket somewhere.over.the.rainbow`


Installation
------------

PIP
~~~

PyPI contains `bdrc-volume-manifest-builder`

Global installation
^^^^^^^^^^^^^^^^^^^^

Install is simply
`sudo python3 -m pip install --upgrade bdrc-volume-manifest-builder` to install system-wide (which is needed to run as a
service)

Local installation
^^^^^^^^^^^^^^^^^^

To install and run locally, `python3 -m pip install --upgrade bdrc-volume-manifest-builder` will do. Best to do this in
a virtual python environment, see [venv](https://docs.python.org/3/library/venv.html)

When you install `volume-manifest-builder` three entry points are defined in `/usr/local/bin` (or your local
environment):

- `manifestforlist` the command mode, which operates on a list of RIDs
- `manifestforwork` alternate command line mode, which works on one path

## Service

See [Service Readme](service/README.md) for details on installing manifestFromS3 as a service on `systemctl` supporting
platforms.

Development
-----------

`volume-manifest-builder` is hosted
on [BUDA Github volume-manifest-builder](https://github.com/buda-base/volume-manifest-builder/)

- Credentials: you must have the input credentials for a specific AWS user installed to deposit into the archives on s3.

Usage
-----

`volume-manifest-builder` has two use cases:

+ command line, which allows using a list of workRIDS on a local system
+ service, which continually polls a well-known location, `s3://manifest.bdrc.org/processing/todo/` for a file.

Building a distribution
-----------------------

Be sure to check PyPI for current release, and update accordingly.
Use `PEP440 <https://www.python.org/dev/peps/pep-0440/#post-releases>`__ for naming releases.

Prerequisites
~~~~~~~~~~~~~

.. code-block:: bash

    pip3 install setuptools
    pip3 install wheel
    pip3 install twine

Building
~~~~~~~~

.. code-block:: bash

    python3 setup.py sdist bdist_wheel
    twine upload dist/<thing you built>

Project changelog
=================

======= =================================================================================================================== ===============================
Release Commit                                                                                                              Changes
======= =================================================================================================================== ===============================
1.3.0   `30a3b2c3 <https://github.com/buda-base/volume-manifest-builder/commit/30a3b2c3d58e8a6f8ed3106888dcefa148ff695f>`__ use specific image groups
\       `82adb9f <https://github.com/buda-base/volume-manifest-builder/commit/a65c81f80e47542d92e63b50e9fcd0889a26484b>`__  Use only image files in search
1.2.10                                                                                                                      Clean up S3 error message
1.2.9                                                                                                                       Error diags in generateManifest
1.2.8                                                                                                                       Update changelog to readme
1.2.7                                                                                                                       Use bdrc-util logging
1.2.6                                                                                                                       Use BUDA only for resolution
\                                                                                                                           Use BUDA first for resolution
1.2.0                                                                                                                       Sort all output by filename
======= =================================================================================================================== ===============================



            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/buda-base/volume-manifest-builder/",
    "name": "bdrc-volume-manifest-builder",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": null,
    "author": "jimk",
    "author_email": "jimk@tbrc.org",
    "download_url": null,
    "platform": null,
    "description": "``bdrc-volume-manifest-builder``\n================================\n\n\nNew in Release 1.3\n------------------\n\n+--------+--------------------------------+\n|Release | Comment                        |\n+========+================================+\n|1.3.2   |Adds a missing integration      |\n+--------+--------------------------------+\n|1.3.1   | Corrects the  ``setup.py``     |\n+--------+--------------------------------+\n|1.3.0   | adds the ability to specify    |\n|        | named image groups on the      |\n|        | command line                   |\n+--------+--------------------------------+\n\nex: ``manifestforwork -w W23834 --image-group 3187,I123456 fs``\n\nNotes:\n\n-  the -i/\u2013image-group argument is a comma-separated list of image\n   groups (or one item) If it is not given, all the image groups in the\n   work\u2019s BUDA catalog will be processed.\n-  the \u2013image-group flag cannot be given with the \u2013work-list-file\n   argument.\n-  The \u2013image-group arguments **do** apply when the \u2013work-rid argument is a file path.\n- The ``image-group`` argument is available in both the ``fs`` and ``s3`` modes.\n\nNew in Release 1.1\n------------------\n\n-  Ability to use either file system or S3 for image repository\n\nIntent\n------\n\nThis project originated as a script to extract image dimensions from a\nwork, and:\n\n-  write the dimensions to a json file\n-  report on images which broke certain rules.\n\nImplementation\n--------------\n\nArchival Operations determined that this would be most useful to BUDA to\nimplement as a service which could be injected into the current sync\nprocess. To do this, the system needed to:\n\n-  be more modular\n-  be distributable onto an instance which could be cloned in AWS.\n\nThis branch expands the original tool by:\n\n-  Adding the ability to use the eXist db as a source for the image\n   dimensions.\n-  Use a pre-built BOM Bill of Materials) to derive the files which\n   should be included in the dimesnsions file\n-  Read input from either S3 or local file system repositories\n-  Create and save log files.\n-  Manage input files.\n-  Run as a service on a Linux platform\n\nStandalone tool\n~~~~~~~~~~~~~~~\n\nInternal tool to create json manifests of image format data for volumes\npresent in S3 to support the BUDA IIIF presentation server.\n\nLanguage\n--------\n\nPython 3.7 or newer. It is highly recommended to use ``pip`` to install,\nto manage dependencies. If you **must** do it yourself, you can refer to\n``setup.py`` for the dependency list.\n\nEnvironment\n-----------\n\n1. Write access to ``/var/log/VolumeManifestBuilder`` which must exist.\n2. ``systemctl`` service management, if you want to use the existing\n   materials to install as a service.\n\nUsage\n-----\n\nCommand line usage\n~~~~~~~~~~~~~~~~~~\n\nThe command line mode allows running one batch or one work at a time.\nArguments specify the parameters, options.\n\nYou also must choose a **repository mode** which determines if the\nimages are on a local file system (the ``fs`` mode), or on an AWS S3\nsystem (the ``s3``) mode.\n\nCommon parameters\n^^^^^^^^^^^^^^^^^\n\nThis section describes the parameters which are independent of the\nrepository mode.\n\n.. code-block:: bash\n\n     $ manifestforwork -h usage: manifestforwork [common\n    options] { fs [fs options] \\| s3 [s3 options]}\n\n    Prepares an inventory of image dimensions\n\n    optional arguments: -h, \u2013help show this help message and exit -d\n    {info,warning,error,debug,critical}, \u2013debugLevel\n    {info,warning,error,debug,critical} choice values are from python\n    logging module -l LOG_PARENT, \u2013logDir LOG_PARENT Path to log file\n    directory -f WORK_LIST_FILE, \u2013workListFile WORK_LIST_FILE File\n    containing one RID per line. -w WORK_RID, \u2013work-Rid WORK_RID name or\n    partially qualified path to one work\n\n    Repository Parser: Handles repository alternatives\n\n    {s3,fs}\n\n   Common usage Notes:\n\n   `-f/--workListFile` is a file which contains a list of RIDS, **or a list of paths\n   to work RIDs, in the `fs` mode (see below.)**\n   `-w/--workRID` is a single work.\n\n   - The `--workListFile` and `--workRid` arguments are mutually exclusive\n\n   - The system logs its activity into a file named _yyyy-MM-DD_HH_MM_PID_.local_v_m_b.log`\n     in the folder given in the `-l/--logDir` argument (default `/var/log`)\n     mode.\n\n     Before release 1.3.0, `manifestforwork` used an externally generated list of files (fileList.json) in the source directory to specify the population to process.\n     After that, the entire directory is scanned (this was needed to be able to process arbitrary image groups.), and the file list is disregarded.\n\n     The use of a file list was in response to many badly formed entries in `dimensions.json` due to random files being scanned. In Release 1.3.0, these files \n     are now explicitly tagged in the `dimensions.json`\n\n   {\n     \"filename\":\"SomeFile.ext\",\n     \"error\":\"UnidentifiedImageError\"\n   }\n\n\n\nfs Mode Usage\n^^^^^^^^^^^^^\n\n.. code-block:: bash\n\n    \u276f manifestforwork fs -h usage: manifestforwork [common\n    options] { fs [fs options] \\| s3 [s3 options]} fs [-h] [-c CONTAINER]\n    [-i IMAGE_FOLDER_NAME]\n\n    optional arguments: -h, \u2013help show this help message and exit -c\n    CONTAINER, \u2013container CONTAINER container for all work_rid archives.\n    Prefixes entries in \u2013source_rid or \u2013workList -i IMAGE_FOLDER_NAME,\n    \u2013image-folder-name IMAGE_FOLDER_NAME name of parent folder of image\n    files\n\n   Notes:\n\n   + the `-c/--container` defines a path to the RIDS (or the RID subpaths) given.\n     It is optional. It prepends its value to the WorkRID paths or individual workRIDs\n     in the input file (`-f`) or to the individual work (`-w`)\n\n   In the `-w` or `-f` options above. The system supports user expansion\n   (`~[uid]/path...` in Linux) and environment variable expansion in both the `-c`\n   and the `-f` options. That is, the file given in the `-f` option can contain\n\n   - Environment variables\n   - User alias pathnames (`~[user]/...`)\n   - Fully qualified pathnames\n\n   e.g.\n\n   > pwd\n   /data\n   >ls\n   Works\n   >ls ~/tmp\n   /home/me/tmp/Works\n   > export THISWORK=\"Works/FromThom\"\n   > cat workList\n   $WORKS/W12345\n   ~/tmp/$WORKS/W12345\n   /home/me/tmp/Works/W89012\n\n\nusing this list in\n\n.. code-block:: bash\n\n    manifestforwork -f worklist fs\n\n\nwill process files from\n\n-  /data/Works/FromThom\n-  /home/me/tmp/Works/FromThom\n-  /home/me/tmp/Works/W89012 if the ``--container`` argument is not\n   given. (``-c`` defaults to the current working directory)\n\ns3 mode usage\n^^^^^^^^^^^^^\n\n.. code-block:: bash\n\n    \u276f manifestforwork s3 \u2013help usage: manifestforwork\n    [common options] { fs [fs options] \\| s3 [s3 options]} s3 [-h] [-b\n    BUCKET]\n\n    optional arguments: -h, \u2013help show this help message and exit -b BUCKET,\n    \u2013bucket BUCKET Bucket - source and destination\n\n   The S3 mode uses a bucket named with the optional `-b/--bucket` argument. The default bucket\n   is closely held. note that the `--container` argument is not applicable in this mode, and\n   that if a worklist is given, it must contain only RIDs, not paths.\n\n   The bucket example takes the aws s3api form, e.g. `--bucket somewhere.over.the.rainbow`\n\n\nInstallation\n------------\n\nPIP\n~~~\n\nPyPI contains `bdrc-volume-manifest-builder`\n\nGlobal installation\n^^^^^^^^^^^^^^^^^^^^\n\nInstall is simply\n`sudo python3 -m pip install --upgrade bdrc-volume-manifest-builder` to install system-wide (which is needed to run as a\nservice)\n\nLocal installation\n^^^^^^^^^^^^^^^^^^\n\nTo install and run locally, `python3 -m pip install --upgrade bdrc-volume-manifest-builder` will do. Best to do this in\na virtual python environment, see [venv](https://docs.python.org/3/library/venv.html)\n\nWhen you install `volume-manifest-builder` three entry points are defined in `/usr/local/bin` (or your local\nenvironment):\n\n- `manifestforlist` the command mode, which operates on a list of RIDs\n- `manifestforwork` alternate command line mode, which works on one path\n\n## Service\n\nSee [Service Readme](service/README.md) for details on installing manifestFromS3 as a service on `systemctl` supporting\nplatforms.\n\nDevelopment\n-----------\n\n`volume-manifest-builder` is hosted\non [BUDA Github volume-manifest-builder](https://github.com/buda-base/volume-manifest-builder/)\n\n- Credentials: you must have the input credentials for a specific AWS user installed to deposit into the archives on s3.\n\nUsage\n-----\n\n`volume-manifest-builder` has two use cases:\n\n+ command line, which allows using a list of workRIDS on a local system\n+ service, which continually polls a well-known location, `s3://manifest.bdrc.org/processing/todo/` for a file.\n\nBuilding a distribution\n-----------------------\n\nBe sure to check PyPI for current release, and update accordingly.\nUse `PEP440 <https://www.python.org/dev/peps/pep-0440/#post-releases>`__ for naming releases.\n\nPrerequisites\n~~~~~~~~~~~~~\n\n.. code-block:: bash\n\n    pip3 install setuptools\n    pip3 install wheel\n    pip3 install twine\n\nBuilding\n~~~~~~~~\n\n.. code-block:: bash\n\n    python3 setup.py sdist bdist_wheel\n    twine upload dist/<thing you built>\n\nProject changelog\n=================\n\n======= =================================================================================================================== ===============================\nRelease Commit                                                                                                              Changes\n======= =================================================================================================================== ===============================\n1.3.0   `30a3b2c3 <https://github.com/buda-base/volume-manifest-builder/commit/30a3b2c3d58e8a6f8ed3106888dcefa148ff695f>`__ use specific image groups\n\\       `82adb9f <https://github.com/buda-base/volume-manifest-builder/commit/a65c81f80e47542d92e63b50e9fcd0889a26484b>`__  Use only image files in search\n1.2.10                                                                                                                      Clean up S3 error message\n1.2.9                                                                                                                       Error diags in generateManifest\n1.2.8                                                                                                                       Update changelog to readme\n1.2.7                                                                                                                       Use bdrc-util logging\n1.2.6                                                                                                                       Use BUDA only for resolution\n\\                                                                                                                           Use BUDA first for resolution\n1.2.0                                                                                                                       Sort all output by filename\n======= =================================================================================================================== ===============================\n\n\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "Creates manifests for syncd works.",
    "version": "1.3.2",
    "project_urls": {
        "Homepage": "https://github.com/buda-base/volume-manifest-builder/"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0cf8c02df9dcf69290d2bd392fe37a7a9ddd9131cbf4fb921ad4ebc8624f6b77",
                "md5": "a07e132cd1a9cbafc6f242fe9b2b8881",
                "sha256": "058a3eea72d88519f89063c2e384482db881db6e2721af767a7255855e1aa55f"
            },
            "downloads": -1,
            "filename": "bdrc_volume_manifest_builder-1.3.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "a07e132cd1a9cbafc6f242fe9b2b8881",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 42741,
            "upload_time": "2024-09-06T16:26:19",
            "upload_time_iso_8601": "2024-09-06T16:26:19.391647Z",
            "url": "https://files.pythonhosted.org/packages/0c/f8/c02df9dcf69290d2bd392fe37a7a9ddd9131cbf4fb921ad4ebc8624f6b77/bdrc_volume_manifest_builder-1.3.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-06 16:26:19",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "buda-base",
    "github_project": "volume-manifest-builder",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "boto3",
            "specs": []
        },
        {
            "name": "Pillow",
            "specs": []
        },
        {
            "name": "botocore",
            "specs": []
        },
        {
            "name": "s3transfer",
            "specs": []
        },
        {
            "name": "setuptools",
            "specs": []
        },
        {
            "name": "boto",
            "specs": []
        },
        {
            "name": "aiofiles",
            "specs": []
        },
        {
            "name": "lxml",
            "specs": []
        },
        {
            "name": "requests",
            "specs": []
        },
        {
            "name": "s3pathlib",
            "specs": []
        }
    ],
    "lcname": "bdrc-volume-manifest-builder"
}
        
Elapsed time: 0.28052s