# `bdrc-volume-manifest-builder`

- **Version:** 1.2.10
- **Summary:** Creates manifests for synced works.
- **Home page:** https://github.com/buda-base/volume-manifest-builder/
- **Author:** jimk (jimk@tbrc.org)
- **Requires Python:** >=3.7
- **Uploaded to PyPI:** 2023-07-28 23:03:35
- **Wheel:** `bdrc_volume_manifest_builder-1.2.10-py3-none-any.whl` (sha256 `03d0080e07d5b1ea3c9137ec76677d35da82b9bd269007aa06d15caff930c374`)

## New in Release 1.1

- Ability to use either file system or S3 for image repository

## Intent

This project originated as a script to extract image dimensions from a work, and:

+ write the dimensions to a json file
+ report on images which broke certain rules.

## Implementation

Archival Operations determined that this would be most useful to BUDA if implemented as a service which could be injected
into the current sync process. To do this, the system needed to:

- be more modular
- be distributable onto an instance which could be cloned in AWS.

This branch expands the original tool by:

- Adding the ability to use the eXist db as a source for the image dimensions
- Using a pre-built BOM (Bill of Materials) to derive the files which should be included in the dimensions file
- Reading input from either S3 or local file system repositories
- Creating and saving log files
- Managing input files
- Running as a service on a Linux platform
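As a rough illustration of the BOM step, the sketch below keeps only the images a BOM names and sorts them by filename (release 1.2.0 sorts all output by filename). The BOM format used here, plain text with one filename per line, is an assumption, and `files_in_bom` is a hypothetical helper, not part of the package.

```python
from pathlib import Path
from typing import Iterable, List


def files_in_bom(bom_text: str, image_files: Iterable[str]) -> List[str]:
    """Return the image files named in the BOM, sorted by filename.

    Assumption (hypothetical format): one filename per line; blank lines
    and lines starting with '#' are ignored.
    """
    wanted = {
        line.strip()
        for line in bom_text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    }
    # Keep only files the BOM lists, comparing on the bare filename,
    # then sort by that filename rather than by the full path.
    kept = [f for f in image_files if Path(f).name in wanted]
    return sorted(kept, key=lambda f: Path(f).name)
```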

### Standalone tool

Internal tool to create json manifests of image format data for volumes present in S3 to support the BUDA IIIF
presentation server.
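A minimal sketch of what a per-volume dimensions manifest could look like, assuming a JSON list of `filename`/`width`/`height` objects sorted by filename. The field names and the helpers `png_size` and `dimensions_json` are hypothetical; to stay within the Python standard library the sketch reads only PNG headers, which does not reflect how the package itself reads images.

```python
import json
import struct
from typing import Dict, List, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(header: bytes) -> Tuple[int, int]:
    """Read (width, height) from the first 24 bytes of a PNG file."""
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    # The IHDR data starts at byte 16: width, then height, as big-endian uint32.
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def dimensions_json(images: Dict[str, bytes]) -> str:
    """Build a JSON list of {filename, width, height}, sorted by filename.

    `images` maps a filename to (at least) the first 24 bytes of that file.
    """
    entries: List[dict] = []
    for name in sorted(images):
        width, height = png_size(images[name])
        entries.append({"filename": name, "width": width, "height": height})
    return json.dumps(entries)
```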

##### Language

Python 3.7 or newer. Installing with `pip` is highly recommended, because it manages dependencies. If you **must** install
it yourself, refer to `setup.py` for the dependency list.

##### Environment

1. Write access to `/var/log/VolumeManifestBuilder`, which must already exist.
2. `systemctl` service management, if you want to use the existing materials to install as a service.
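A pre-flight check for the first requirement could look like this. `log_dir_ready` is a hypothetical helper, not part of the package.

```python
import os

# Path from this README; the -l/--logDir option may point elsewhere.
DEFAULT_LOG_DIR = "/var/log/VolumeManifestBuilder"


def log_dir_ready(path: str = DEFAULT_LOG_DIR) -> bool:
    """Return True if path is an existing directory this process can write to."""
    # The README says this directory must already exist, so check both
    # existence and write permission before starting a run.
    return os.path.isdir(path) and os.access(path, os.W_OK)
```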

## Usage

### Command line usage

The command line mode allows running one batch or one work at a time. Arguments
specify the parameters and options.

You must also choose a **repository mode**, which determines whether the images
are on a local file system (the `fs` mode) or on AWS S3 (the `s3` mode).

#### Common parameters

This section describes the parameters which are independent of the repository mode.

```shell script
$ manifestforwork -h
usage: manifestforwork [common options] { fs [fs options] | s3 [s3 options]}

Prepares an inventory of image dimensions

optional arguments:
  -h, --help            show this help message and exit
  -d {info,warning,error,debug,critical}, --debugLevel {info,warning,error,debug,critical}
                        choice values are from python logging module
  -l LOG_PARENT, --logDir LOG_PARENT
                        Path to log file directory
  -f WORK_LIST_FILE, --workListFile WORK_LIST_FILE
                        File containing one RID per line.
  -w WORK_RID, --work-Rid WORK_RID
                        name or partially qualified path to one work
  -p POLL_INTERVAL, --poll-interval POLL_INTERVAL
                        Seconds between alerts for file.

Repository Parser:
  Handles repository alternatives


  {s3,fs}

```
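The common options and the `fs`/`s3` split can be modeled with `argparse` subparsers. This is a reconstruction from the help text in this README, not the package's source: the `dest` names, the `-d` default and the integer type of `-p` are assumptions. `-f` and `-w` go in a mutually exclusive group because the two cannot be combined.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="manifestforwork",
        description="Prepares an inventory of image dimensions")
    parser.add_argument("-d", "--debugLevel", dest="log_level", default="info",
                        choices=["info", "warning", "error", "debug", "critical"])
    parser.add_argument("-l", "--logDir", dest="log_parent", default="/var/log")
    # A single work or a list of works, never both.
    source = parser.add_mutually_exclusive_group()
    source.add_argument("-f", "--workListFile", dest="work_list_file")
    source.add_argument("-w", "--work-Rid", dest="work_rid")
    parser.add_argument("-p", "--poll-interval", dest="poll_interval", type=int)

    # The repository mode is a required sub-command: fs or s3.
    repos = parser.add_subparsers(dest="repo", required=True)
    fs = repos.add_parser("fs")
    fs.add_argument("-c", "--container", default=".")  # defaults to the cwd
    fs.add_argument("-i", "--image-folder-name", dest="image_folder_name")
    s3 = repos.add_parser("s3")
    s3.add_argument("-b", "--bucket")
    return parser
```

With this layout, common options must precede the repository keyword, as in `manifestforwork -w W12345 fs -c /data`.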

Common usage notes:

- `-f/--workListFile` is a file which contains a list of RIDs, **or, in the `fs` mode, a list of paths
  to work RIDs (see below).**
- `-w/--work-Rid` names a single work.
- The `--workListFile` and `--work-Rid` arguments are mutually exclusive.
- `-p` is disregarded in this mode. It is an argument to the `manifestFromS3` mode.
- The system logs its activity into a file named `yyyy-MM-DD_HH_MM_PID.local_v_m_b.log`
  in the folder given in the `-l/--logDir` argument (default `/var/log`).
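One way to produce that log file name, assuming the pattern is year-month-day, hour, minute and process ID joined as shown above. `log_file_path` is a hypothetical helper.

```python
import os
from datetime import datetime
from pathlib import Path
from typing import Optional


def log_file_path(log_dir: str = "/var/log", now: Optional[datetime] = None,
                  pid: Optional[int] = None) -> Path:
    """Build <log_dir>/yyyy-MM-DD_HH_MM_PID.local_v_m_b.log."""
    now = now or datetime.now()
    pid = os.getpid() if pid is None else pid
    return Path(log_dir) / f"{now:%Y-%m-%d_%H_%M}_{pid}.local_v_m_b.log"
```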

#### fs Mode Usage

```shell script
❯ manifestforwork fs -h
usage: manifestforwork [common options] { fs [fs options] | s3 [s3 options]} fs
       [-h] [-c CONTAINER] [-i IMAGE_FOLDER_NAME]

optional arguments:
  -h, --help            show this help message and exit
  -c CONTAINER, --container CONTAINER
                        container for all work_Rid archives. Prefixes entries
                        in --source_rid or --workList
  -i IMAGE_FOLDER_NAME, --image-folder-name IMAGE_FOLDER_NAME
                        name of parent folder of image files
```

Notes:

+ `-c/--container` defines a path to the RIDs (or the RID subpaths) given.
  It is optional. Its value is prepended to the work RID paths or individual work RIDs
  in the input file (`-f`), or to the individual work (`-w`).

The system supports user expansion (`~[uid]/path...` in Linux) and environment variable
expansion in both the `-c` and the `-f` options. That is, the file given in the `-f` option can contain:

- Environment variables
- User alias pathnames (`~[user]/...`)
- Fully qualified pathnames

For example:

```shell script
> pwd
/data
> ls
Works
> ls ~/tmp
Works
> export THISWORK="Works/FromThom"
> cat workList
$THISWORK
~/tmp/$THISWORK
/home/me/tmp/Works/W89012
```

Using this list in

```shell script
> manifestforwork -f workList fs
```

will process files from the following folders, if the `--container` argument is not given
(`-c` defaults to the current working directory):

- /data/Works/FromThom
- /home/me/tmp/Works/FromThom
- /home/me/tmp/Works/W89012
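The expansion and prefixing rules can be sketched with `os.path.expandvars`, `os.path.expanduser` and `os.path.join`. How the tool actually combines them is an assumption here; this combination is chosen because `os.path.join` discards the container when an entry is already absolute, which matches the third line of the example. `resolve_work_path` is hypothetical.

```python
import os


def resolve_work_path(entry: str, container: str = ".") -> str:
    """Turn one -w value or -f list entry into the folder to process."""
    # Expand $VARS first, then ~ / ~user, in both the container and the entry.
    container = os.path.expanduser(os.path.expandvars(container))
    entry = os.path.expanduser(os.path.expandvars(entry.strip()))
    # os.path.join drops the container when entry is absolute, so fully
    # qualified and ~-expanded paths are used as-is.
    return os.path.normpath(os.path.join(os.path.abspath(container), entry))
```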

#### s3 mode usage

```shell script
❯ manifestforwork s3 --help
usage: manifestforwork [common options] { fs [fs options] | s3 [s3 options]} s3
       [-h] [-b BUCKET]

optional arguments:
  -h, --help            show this help message and exit
  -b BUCKET, --bucket BUCKET
                        Bucket - source and destination
```

The S3 mode uses the bucket named in the optional `-b/--bucket` argument; the default bucket
is closely held. Note that the `--container` argument is not applicable in this mode, and
that if a work list is given, it must contain only RIDs, not paths.
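Because an `s3` work list must hold only RIDs, a caller might screen its entries before starting a run. The rule used here (reject anything containing `/`, `\`, `~` or `$`) is an assumption, and `split_worklist` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple


def split_worklist(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Separate bare RIDs from entries that look like paths."""
    rids: List[str] = []
    rejected: List[str] = []
    for raw in lines:
        entry = raw.strip()
        if not entry:
            continue  # skip blank lines
        # Path separators, ~ and $ indicate a path or an unexpanded variable.
        if any(ch in entry for ch in "/\\~$"):
            rejected.append(entry)
        else:
            rids.append(entry)
    return rids, rejected
```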

### manifestFromS3 input

`manifestFromS3` is a mode which waits for a list of RIDs or paths to appear in a well-known location
and then processes what it finds there as if it were given in the `--workListFile` argument.

All the other parameters are the same: `manifestFromS3` can work on local file system (`fs`)
or on `s3` targets.

- Upload an input list (the file name does not matter)
  to [s3://manifest.bdrc.org/processing/todo/](s3://manifest.bdrc.org/processing/todo/)
- Run `manifestFromS3 -p n [-d {info,debug,error}] {fs [fs options] | s3 [-b alternative.bucket]}`
  from the command line.

`manifestFromS3` does the following:

1. Moves the input list from `s3://manifest.bdrc.org/processing/todo` to `.../processing/inprocess` and renames it
   from `<input>` to `<input-timestamp-instance-id>`.
2. Runs the processing, uploading a `dimensions.json` file for each volume in each
   RID in the input list.
3. When complete, moves the file from `.../processing/inprocess` to `.../processing/done`.
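The claim, process and file lifecycle above can be sketched against a local directory tree standing in for the `todo`, `inprocess` and `done` prefixes. This is a local analogue, not the service's code: S3 has no rename, so a "move" there is a copy followed by a delete (for example boto3's `copy_object` then `delete_object`). The exact timestamp format and the `process_next` helper are assumptions.

```python
import shutil
import time
from pathlib import Path
from typing import Callable, Optional


def process_next(root: Path, handle: Callable[[Path], None],
                 instance_id: str = "local") -> Optional[Path]:
    """Claim one list from todo/, process it, then file it under done/."""
    todo, inprocess, done = root / "todo", root / "inprocess", root / "done"
    for folder in (todo, inprocess, done):
        folder.mkdir(parents=True, exist_ok=True)
    candidates = sorted(p for p in todo.iterdir() if p.is_file())
    if not candidates:
        return None  # nothing to do; a caller would sleep and poll again
    src = candidates[0]
    stamp = time.strftime("%Y%m%d%H%M%S")
    # 1. Claim the list: move it to inprocess/ as <input>-<timestamp>-<instance-id>.
    claimed = inprocess / f"{src.name}-{stamp}-{instance_id}"
    shutil.move(str(src), str(claimed))
    # 2. Process every RID in the list (supplied by the caller).
    handle(claimed)
    # 3. File the finished list under done/.
    finished = done / claimed.name
    shutil.move(str(claimed), str(finished))
    return finished
```

A long-running service would call `process_next` in a loop, sleeping between calls whenever it returns `None`.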

## Installation

### PIP

PyPI contains `bdrc-volume-manifest-builder`

#### Global installation

To install system-wide (which is required to run as a service), use
`sudo python3 -m pip install --upgrade bdrc-volume-manifest-builder`

#### Local installation

To install and run locally, `python3 -m pip install --upgrade bdrc-volume-manifest-builder` is enough. It is best to do
this in a Python virtual environment; see [venv](https://docs.python.org/3/library/venv.html).

Installing `bdrc-volume-manifest-builder` defines three entry points in `/usr/local/bin` (or in your local
environment):

- `manifestforlist`: the command-line mode, which operates on a list of RIDs
- `manifestforwork`: an alternate command-line mode, which works on one path
- `manifestFromS3`: the mode which runs continuously, polling an S3 resource for a file and processing all the files it
  finds. This is the mode which runs as a service.

## Service

See the [Service Readme](service/README.md) for details on installing `manifestFromS3` as a service on platforms that
support `systemctl`.

## Development

`volume-manifest-builder` is hosted
on [BUDA Github volume-manifest-builder](https://github.com/buda-base/volume-manifest-builder/)

- Credentials: you must have the credentials for a specific AWS user installed in order to deposit into the archives on S3.

## Use cases

`volume-manifest-builder` has two use cases:

+ command line, which allows using a list of work RIDs on a local system
+ service, which continually polls a well-known location, `s3://manifest.bdrc.org/processing/todo/` for a file.

## Building a distribution

Be sure to check PyPI for the current release, and update the version accordingly.
Use [PEP 440](https://www.python.org/dev/peps/pep-0440/#post-releases) for naming releases.
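Before building, a quick check that the new version sorts after the release currently on PyPI (1.2.10 on this page) can catch a forgotten bump. This sketch handles only a hypothetical subset of PEP 440 (`N.N...` with an optional `.postN`); for full PEP 440 parsing, the `packaging` library's `Version` class is the usual tool.

```python
import re
from typing import Tuple

# Subset of PEP 440: release segments plus an optional .postN suffix only.
_VERSION = re.compile(r"^(\d+(?:\.\d+)*)(?:\.post(\d+))?$")


def version_key(version: str) -> Tuple[Tuple[int, ...], int]:
    m = _VERSION.match(version)
    if not m:
        raise ValueError(f"unsupported version string: {version!r}")
    release = tuple(int(part) for part in m.group(1).split("."))
    # Drop trailing zeros so 1.2 and 1.2.0 compare equal, as PEP 440 specifies.
    while len(release) > 1 and release[-1] == 0:
        release = release[:-1]
    # A plain release sorts before any of its post-releases.
    post = int(m.group(2)) if m.group(2) is not None else -1
    return release, post


def is_newer(candidate: str, current: str) -> bool:
    """True if candidate sorts after the release currently on PyPI."""
    return version_key(candidate) > version_key(current)
```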

### Prerequisites

- `pip3 install wheel`
- `pip3 install twine`


Then build the wheel and upload it:

```bash
python3 setup.py bdist_wheel
twine upload dist/<thing you built>
```

# Project changelog

| Release | Changes                         |
|---------|---------------------------------|
| 1.2.9   | Error diags in generateManifest |
| 1.2.8   | Update changelog to readme      |
| 1.2.7   | Use bdrc-util logging           |
| 1.2.6   | Use BUDA only for resolution    |
|         | Use BUDA first for resolution   |
| 1.2.0   | Sort all output by filename     |



            
