# `bdrc-volume-manifest-builder`
## New in Release 1.1
- Ability to use either file system or S3 for image repository
## Intent
This project originated as a script to extract image dimensions from a work, and:
+ write the dimensions to a json file
+ report on images which broke certain rules.
## Implementation
Archival Operations determined that this would be most useful to BUDA to implement as a service which could be injected
into the current sync process. To do this, the system needed to:
- be more modular
- be distributable onto an instance which could be cloned in AWS.
This branch expands the original tool by:
- Adding the ability to use the eXist db as a source for the image dimensions.
- Use a pre-built BOM Bill of Materials) to derive the files which should be included in the dimesnsions file
- Read input from either S3 or local file system repositories
- Create and save log files.
- Manage input files.
- Run as a service on a Linux platform
### Standalone tool
Internal tool to create json manifests of image format data for volumes present in S3 to support the BUDA IIIF
presentation server.
##### Language
Python 3.7 or newer. It is highly recommended to use `pip` to install, to manage dependencies. If you **must** do it
yourself, you can refer to `setup.py` for the dependency list.
##### Environment
1. Write access to `/var/log/VolumeManifestBuilder` which must exist.
2. `systemctl` service management, if you want to use the existing materials to install as a service.
## Usage
### Command line usage
The command line mode allows running one batch or one work at a time. Arguments
specify the parameters, options.
You also must choose a **repository mode** which determines if the images
are on a local file system (the `fs` mode), or on an AWS S3 system (the `s3`)
mode.
#### Common parameters
This section describes the parameters which are independent of the repository mode.
```shell script
$ manifestforwork -h
usage: manifestforwork [common options] { fs [fs options] | s3 [s3 options]}
Prepares an inventory of image dimensions
optional arguments:
-h, --help show this help message and exit
-d {info,warning,error,debug,critical}, --debugLevel {info,warning,error,debug,critical}
choice values are from python logging module
-l LOG_PARENT, --logDir LOG_PARENT
Path to log file directory
-f WORK_LIST_FILE, --workListFile WORK_LIST_FILE
File containing one RID per line.
-w WORK_RID, --work-Rid WORK_RID
name or partially qualified path to one work
-p POLL_INTERVAL, --poll-interval POLL_INTERVAL
Seconds between alerts for file.
Repository Parser:
Handles repository alternatives
{s3,fs}
```
Common usage Notes:
`-f/--workListFile` is a file which contains a list of RIDS, **or a list of paths
to work RIDs, in the `fs` mode (see below.)**
`-w/--workRID` is a single work.
- The `--workListFile` and `--workRid` arguments are mutually exclusive
- `-p` is disregarded in this mode. It is an argument to the `manifestFromS3`
- The system logs its activity into a file named _yyyy-MM-DD_HH_MM_PID_.local_v_m_b.log`
in the folder given in the `-l/--logDir` argument (default `/var/log`)
mode.
#### fs Mode Usage
```shell script
❯ manifestforwork fs -h
usage: manifestforwork [common options] { fs [fs options] | s3 [s3 options]} fs
[-h] [-c CONTAINER] [-i IMAGE_FOLDER_NAME]
optional arguments:
-h, --help show this help message and exit
-c CONTAINER, --container CONTAINER
container for all work_Rid archives. Prefixes entries
in --source_rid or --workList
-i IMAGE_FOLDER_NAME, --image-folder-name IMAGE_FOLDER_NAME
name of parent folder of image files
```
Notes:
+ the `-c/--container` defines a path to the RIDS (or the RID subpaths) given.
It is optional. It prepends its value to the WorkRID paths or individual workRIDs
in the input file (`-f`) or to the individual work (`-w`)
In the `-w` or `-f` options above. The system supports user expansion
(`~[uid]/path...` in Linux) and environment variable expansion in both the `-c`
and the `-f` options. That is, the file given in the `-f` option can contain
- Environment variables
- User alias pathnames (`~[user]/...`)
- Fully qualified pathnames
e.g.
```shell script
> pwd
/data
>ls
Works
>ls ~/tmp
/home/me/tmp/Works
> export THISWORK="Works/FromThom"
> cat workList
$WORKS/W12345
~/tmp/$WORKS/W12345
/home/me/tmp/Works/W89012
```
using this list in
```shell script
> manifestforwork -f worklist fs
```
will process files from
- /data/Works/FromThom
- /home/me/tmp/Works/FromThom
- /home/me/tmp/Works/W89012
if the `--container` argument is not given. (`-c` defaults to the current working
directory)
#### s3 mode usage
```shell script
❯ manifestforwork s3 --help
usage: manifestforwork [common options] { fs [fs options] | s3 [s3 options]} s3
[-h] [-b BUCKET]
optional arguments:
-h, --help show this help message and exit
-b BUCKET, --bucket BUCKET
Bucket - source and destination
```
The S3 mode uses a bucket named with the optional `-b/--bucket` argument. The default bucket
is closely held. note that the `--container` argument is not applicable in this mode, and
that if a worklist is given, it must contain only RIDs, not paths.
### manifestFromS3 input
`manifestFromS3` is a mode which waits for a list of RIDs or paths to appear in a well known location
and then processes what it finds there as if it were given in the `--workFile` argument.
All the other parameters are the same - `manifestFromS3` can work on local file system (`fs`)
or on `s3` targets.
- Upload an input list (file name does not matter)
to [s3://manifest.bdrc.org/processing/todo/](s3://manifest.bdrc.org/processing/todo/)
- run `manifestFromS3 -p n [ -l {info,debug,error} {fs [ fs arguments ] | s3 [ -b alternative.bucket]}`
from the command line.
`manifestFromS3` does the following:
1. Moves the input list from `s3://manifest.bdrc.org/processing/input` to `.../processing/inprocess` and changes the
name from <input> to <input-timestamp-instance-id>
2. Runs the processing, uploading a dimensions.json file for each volume in each
RID in the input list.
3. When complete, it moves the file from `.../processing/inprocess` to `../processing/done`
## Installation
### PIP
PyPI contains `bdrc-volume-manifest-builder`
#### Global installation
Install is simply
`sudo python3 -m pip install --upgrade bdrc-volume-manifest-builder` to install system-wide (which is needed to run as a
service)
#### Local installation
To install and run locally, `python3 -m pip install --upgrade bdrc-volume-manifest-builder` will do. Best to do this in
a virtual python environment, see [venv](https://docs.python.org/3/library/venv.html)
When you install `volume-manifest-builder` three entry points are defined in `/usr/local/bin` (or your local
environment):
- `manifestforlist` the command mode, which operates on a list of RIDs
- `manifestforwork` alternate command line mode, which works on one path
- `manifestFromS3` the mode which runs continuously, polling an S3 resource for a file, and processing all the files it
finds.
This is the mode which runs on a service.
## Service
See [Service Readme](service/README.md) for details on installing manifestFromS3 as a service on `systemctl` supporting
platforms.
## Development
`volume-manifest-builder` is hosted
on [BUDA Github volume-manifest-builder](https://github.com/buda-base/volume-manifest-builder/)
- Credentials: you must have the input credentials for a specific AWS user installed to deposit into the archives on s3.
## Usage
`volume-manifest-builder` has two use cases:
+ command line, which allows using a list of workRIDS on a local system
+ service, which continually polls a well-known location, `s3://manifest.bdrc.org/processing/todo/` for a file.
## Building a distribution
Be sure to check PyPI for current release, and update accordingly.
Use [PEP440](https://www.python.org/dev/peps/pep-0440/#post-releases) for naming releases.
### Prerequisites
- `pip3 install wheel`
- `pip3 install twine`
```bash
python3 setup.py bdist_wheel
twine upload dist/<thing you built
```
# Project changelog
| Release | Changes |
|---------|---------------------------------|
| 1.2.9 | Error diags in generateManifest |
| 1.2.8 | Update changelog to readme |
| 1.2.7 | Use bdrc-util logging |
| 1.2.6 | Use BUDA only for resolution |
| | Use BUDA first for resolution |
| 1.2.0 | Sort all output by filename |
Raw data
{
"_id": null,
"home_page": "https://github.com/buda-base/volume-manifest-builder/",
"name": "bdrc-volume-manifest-builder",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "",
"keywords": "",
"author": "jimk",
"author_email": "jimk@tbrc.org",
"download_url": "",
"platform": null,
"description": "# `bdrc-volume-manifest-builder`\n\n## New in Release 1.1\n\n- Ability to use either file system or S3 for image repository\n\n## Intent\n\nThis project originated as a script to extract image dimensions from a work, and:\n\n+ write the dimensions to a json file\n+ report on images which broke certain rules.\n\n## Implementation\n\nArchival Operations determined that this would be most useful to BUDA to implement as a service which could be injected\ninto the current sync process. To do this, the system needed to:\n\n- be more modular\n- be distributable onto an instance which could be cloned in AWS.\n\nThis branch expands the original tool by:\n\n- Adding the ability to use the eXist db as a source for the image dimensions.\n- Use a pre-built BOM Bill of Materials) to derive the files which should be included in the dimesnsions file\n- Read input from either S3 or local file system repositories\n- Create and save log files.\n- Manage input files.\n- Run as a service on a Linux platform\n\n### Standalone tool\n\nInternal tool to create json manifests of image format data for volumes present in S3 to support the BUDA IIIF\npresentation server.\n\n##### Language\n\nPython 3.7 or newer. It is highly recommended to use `pip` to install, to manage dependencies. If you **must** do it\nyourself, you can refer to `setup.py` for the dependency list.\n\n##### Environment\n\n1. Write access to `/var/log/VolumeManifestBuilder` which must exist.\n2. `systemctl` service management, if you want to use the existing materials to install as a service.\n\n## Usage\n\n### Command line usage\n\nThe command line mode allows running one batch or one work at a time. Arguments\nspecify the parameters, options.\n\nYou also must choose a **repository mode** which determines if the images\nare on a local file system (the `fs` mode), or on an AWS S3 system (the `s3`)\nmode.\n\n#### Common parameters\n\nThis section describes the parameters which are independent of the repository mode.\n\n```shell script\n$ manifestforwork -h\nusage: manifestforwork [common options] { fs [fs options] | s3 [s3 options]}\n\nPrepares an inventory of image dimensions\n\noptional arguments:\n -h, --help show this help message and exit\n -d {info,warning,error,debug,critical}, --debugLevel {info,warning,error,debug,critical}\n choice values are from python logging module\n -l LOG_PARENT, --logDir LOG_PARENT\n Path to log file directory\n -f WORK_LIST_FILE, --workListFile WORK_LIST_FILE\n File containing one RID per line.\n -w WORK_RID, --work-Rid WORK_RID\n name or partially qualified path to one work\n -p POLL_INTERVAL, --poll-interval POLL_INTERVAL\n Seconds between alerts for file.\n\nRepository Parser:\n Handles repository alternatives\n\n\n {s3,fs}\n\n```\n\nCommon usage Notes:\n\n`-f/--workListFile` is a file which contains a list of RIDS, **or a list of paths\nto work RIDs, in the `fs` mode (see below.)**\n`-w/--workRID` is a single work.\n\n- The `--workListFile` and `--workRid` arguments are mutually exclusive\n\n- `-p` is disregarded in this mode. It is an argument to the `manifestFromS3`\n- The system logs its activity into a file named _yyyy-MM-DD_HH_MM_PID_.local_v_m_b.log`\n in the folder given in the `-l/--logDir` argument (default `/var/log`)\n mode.\n\n#### fs Mode Usage\n\n```shell script\n\u276f manifestforwork fs -h\nusage: manifestforwork [common options] { fs [fs options] | s3 [s3 options]} fs\n [-h] [-c CONTAINER] [-i IMAGE_FOLDER_NAME]\n\noptional arguments:\n -h, --help show this help message and exit\n -c CONTAINER, --container CONTAINER\n container for all work_Rid archives. Prefixes entries\n in --source_rid or --workList\n -i IMAGE_FOLDER_NAME, --image-folder-name IMAGE_FOLDER_NAME\n name of parent folder of image files\n```\n\nNotes:\n\n+ the `-c/--container` defines a path to the RIDS (or the RID subpaths) given.\n It is optional. It prepends its value to the WorkRID paths or individual workRIDs\n in the input file (`-f`) or to the individual work (`-w`)\n\nIn the `-w` or `-f` options above. The system supports user expansion\n(`~[uid]/path...` in Linux) and environment variable expansion in both the `-c`\nand the `-f` options. That is, the file given in the `-f` option can contain\n\n- Environment variables\n- User alias pathnames (`~[user]/...`)\n- Fully qualified pathnames\n\ne.g.\n\n```shell script\n> pwd\n/data\n>ls\nWorks\n>ls ~/tmp\n/home/me/tmp/Works\n> export THISWORK=\"Works/FromThom\"\n> cat workList\n$WORKS/W12345\n~/tmp/$WORKS/W12345\n/home/me/tmp/Works/W89012\n```\n\nusing this list in\n\n```shell script\n> manifestforwork -f worklist fs\n```\n\nwill process files from\n\n- /data/Works/FromThom\n- /home/me/tmp/Works/FromThom\n- /home/me/tmp/Works/W89012\n if the `--container` argument is not given. (`-c` defaults to the current working\n directory)\n\n#### s3 mode usage\n\n```shell script\n\u276f manifestforwork s3 --help\nusage: manifestforwork [common options] { fs [fs options] | s3 [s3 options]} s3\n [-h] [-b BUCKET]\n\noptional arguments:\n -h, --help show this help message and exit\n -b BUCKET, --bucket BUCKET\n Bucket - source and destination\n```\n\nThe S3 mode uses a bucket named with the optional `-b/--bucket` argument. The default bucket\nis closely held. note that the `--container` argument is not applicable in this mode, and\nthat if a worklist is given, it must contain only RIDs, not paths.\n\n### manifestFromS3 input\n\n`manifestFromS3` is a mode which waits for a list of RIDs or paths to appear in a well known location\nand then processes what it finds there as if it were given in the `--workFile` argument.\n\nAll the other parameters are the same - `manifestFromS3` can work on local file system (`fs`)\nor on `s3` targets.\n\n- Upload an input list (file name does not matter)\n to [s3://manifest.bdrc.org/processing/todo/](s3://manifest.bdrc.org/processing/todo/)\n- run `manifestFromS3 -p n [ -l {info,debug,error} {fs [ fs arguments ] | s3 [ -b alternative.bucket]}`\n from the command line.\n\n`manifestFromS3` does the following:\n\n1. Moves the input list from `s3://manifest.bdrc.org/processing/input` to `.../processing/inprocess` and changes the\n name from <input> to <input-timestamp-instance-id>\n2. Runs the processing, uploading a dimensions.json file for each volume in each\n RID in the input list.\n3. When complete, it moves the file from `.../processing/inprocess` to `../processing/done`\n\n## Installation\n\n### PIP\n\nPyPI contains `bdrc-volume-manifest-builder`\n\n#### Global installation\n\nInstall is simply\n`sudo python3 -m pip install --upgrade bdrc-volume-manifest-builder` to install system-wide (which is needed to run as a\nservice)\n\n#### Local installation\n\nTo install and run locally, `python3 -m pip install --upgrade bdrc-volume-manifest-builder` will do. Best to do this in\na virtual python environment, see [venv](https://docs.python.org/3/library/venv.html)\n\nWhen you install `volume-manifest-builder` three entry points are defined in `/usr/local/bin` (or your local\nenvironment):\n\n- `manifestforlist` the command mode, which operates on a list of RIDs\n- `manifestforwork` alternate command line mode, which works on one path\n- `manifestFromS3` the mode which runs continuously, polling an S3 resource for a file, and processing all the files it\n finds.\n This is the mode which runs on a service.\n\n## Service\n\nSee [Service Readme](service/README.md) for details on installing manifestFromS3 as a service on `systemctl` supporting\nplatforms.\n\n## Development\n\n`volume-manifest-builder` is hosted\non [BUDA Github volume-manifest-builder](https://github.com/buda-base/volume-manifest-builder/)\n\n- Credentials: you must have the input credentials for a specific AWS user installed to deposit into the archives on s3.\n\n## Usage\n\n`volume-manifest-builder` has two use cases:\n\n+ command line, which allows using a list of workRIDS on a local system\n+ service, which continually polls a well-known location, `s3://manifest.bdrc.org/processing/todo/` for a file.\n\n## Building a distribution\n\nBe sure to check PyPI for current release, and update accordingly.\nUse [PEP440](https://www.python.org/dev/peps/pep-0440/#post-releases) for naming releases.\n\n### Prerequisites\n\n- `pip3 install wheel`\n- `pip3 install twine`\n\n```bash\npython3 setup.py bdist_wheel\ntwine upload dist/<thing you built\n```\n\n# Project changelog\n\n| Release | Changes |\n|---------|---------------------------------|\n| 1.2.9 | Error diags in generateManifest |\n| 1.2.8 | Update changelog to readme |\n| 1.2.7 | Use bdrc-util logging |\n| 1.2.6 | Use BUDA only for resolution |\n| | Use BUDA first for resolution |\n| 1.2.0 | Sort all output by filename |\n\n\n",
"bugtrack_url": null,
"license": "",
"summary": "Creates manifests for syncd works.",
"version": "1.2.10",
"project_urls": {
"Homepage": "https://github.com/buda-base/volume-manifest-builder/"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "498901b5538173181a222a3a31b175de9e28aa16021598ff47ca5f061925ae73",
"md5": "cec1a87da349e7965c4f1c873fe8075b",
"sha256": "03d0080e07d5b1ea3c9137ec76677d35da82b9bd269007aa06d15caff930c374"
},
"downloads": -1,
"filename": "bdrc_volume_manifest_builder-1.2.10-py3-none-any.whl",
"has_sig": false,
"md5_digest": "cec1a87da349e7965c4f1c873fe8075b",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 45806,
"upload_time": "2023-07-28T23:03:35",
"upload_time_iso_8601": "2023-07-28T23:03:35.440226Z",
"url": "https://files.pythonhosted.org/packages/49/89/01b5538173181a222a3a31b175de9e28aa16021598ff47ca5f061925ae73/bdrc_volume_manifest_builder-1.2.10-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-07-28 23:03:35",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "buda-base",
"github_project": "volume-manifest-builder",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "bdrc-volume-manifest-builder"
}