# marcgrep
A CLI for searching MARC files, like [MARCgrep.pl](https://pusc.it/bib/MARCgrep) but written in Python and with a slightly different syntax.
[marcli](https://github.com/hectorcorrea/marcli) is a similar project that is faster but a little less flexible.
## Installation
Requires Python 3.9 or later.
```sh
pipx install marcgrep # install globally with pipx
pip install marcgrep # or use pip/pip3
```
## Usage
```sh
# general command format
$ marcgrep OPTIONS FILE.mrc
$ cat FILE.mrc | marcgrep OPTIONS
# full usage information
$ marcgrep -h
Usage: marcgrep [OPTIONS] [FILE]

  Find MARC records matching patterns in a file.

Options:
  -h, --help           Show this message and exit.
  -c, --count          Count matching records
  -i, --include TEXT   Include matching records (repeatable)
  -e, --exclude TEXT   Exclude matching records (repeatable)
  -f, --fields TEXT    Comma-separated list of fields to print
  -l, --limit INTEGER  Limit number of records to process
  --version            Show the version and exit.
```
The `--include` and `--exclude` flags can each be used multiple times to specify multiple criteria. Each one takes a pattern: a comma-separated filter expression for matching MARC fields. Examples:
```sh
# records with a 780 field
$ marcgrep -i 780 FILE.mrc
# records with Ulysses in the 245 field
$ marcgrep -i '245,Ulysses' FILE.mrc
# titles _without_ "Collected Poems" in the 245 $a subfield
$ marcgrep -e '245,a,Collected Poems' FILE.mrc
# titles with second indicator = 4 that do not start with "The "
$ marcgrep -i '245,,4,,^(?!The )' FILE.mrc
```
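The last pattern above relies on a negative lookahead to express "does not start with". The snippet below is only an illustration of how that regular expression behaves in Python's standard `re` module; it assumes values are tested with `re.search`, which this README does not state, and the sample titles are made up.

```python
import re

# Negative lookahead: match at the start of the string only if
# the next four characters are NOT "The " (note the trailing space).
not_the = re.compile(r"^(?!The )")

# Made-up sample titles, for illustration only.
titles = ["The Waste Land", "Ulysses", "Theory of Colours"]

matching = [title for title in titles if not_the.search(title)]
print(matching)
```

Because the lookahead includes the trailing space, a title such as "Theory of Colours" still matches; only titles beginning with the word "The" followed by a space are rejected.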
The meaning of a pattern's components depends on how many there are:
- 1: field, `910` -> record has a 910 field
- 2: field and value (regular expression), `100,Lorde` -> 100 contains string "Lorde"
- 3: field, subfield, and value, `506,a,Open Access` -> 506$a contains string "Open Access"
- 4: field, first indicator, subfield, and value, `856,0,u,@lcsh\.gov` -> 856$u with 1st indicator 0 contains string "@lcsh.gov"
- 5: field, first & second indicators, subfield, and value, `245,0,4,a,The Communist Manifesto` -> 245$a with indicators 0 and 4 contains string "The Communist Manifesto"
This syntax is meant to make searching subfields and field values easier than in MARCgrep.pl, since we care about them more often than about indicators. To ignore a component while still using another, leave the ignored component empty. For instance, `856,s,` refers to records with an `856` field that has a `$s` subfield; the trailing comma means we don't care about the subfield's value. The pattern `245,,4,,` refers to records with a `245` field whose second indicator is `4`, regardless of its subfields or value.
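The count-based rules can be sketched in a few lines of Python. This is a hypothetical illustration and not marcgrep's actual code: the `Pattern` class, the `LAYOUTS` table, and `parse_pattern` are names invented here, and the component order for 4- and 5-part patterns is inferred from the examples in this README. Values that themselves contain commas are out of scope for the sketch.

```python
from typing import NamedTuple, Optional


class Pattern(NamedTuple):
    """Hypothetical container for one parsed filter expression."""
    field: str
    ind1: Optional[str] = None
    ind2: Optional[str] = None
    subfield: Optional[str] = None
    value: Optional[str] = None


# Which component each position holds, keyed by the number of components.
LAYOUTS = {
    1: ("field",),
    2: ("field", "value"),
    3: ("field", "subfield", "value"),
    4: ("field", "ind1", "subfield", "value"),
    5: ("field", "ind1", "ind2", "subfield", "value"),
}


def parse_pattern(text: str) -> Pattern:
    """Split a pattern on commas; empty components become None (ignored)."""
    parts = text.split(",")
    if len(parts) not in LAYOUTS or not parts[0]:
        raise ValueError(f"cannot parse pattern: {text!r}")
    names = LAYOUTS[len(parts)]
    return Pattern(**{name: (part or None) for name, part in zip(names, parts)})


print(parse_pattern("856,s,"))
print(parse_pattern("245,,4,,"))
```

Under these assumptions, `856,s,` yields a pattern with only `field` and `subfield` set, and `245,,4,,` yields one with only `field` and `ind2` set.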
Multiple criteria are combined with logical AND: a record must satisfy every `--include` pattern and match no `--exclude` pattern. Adding another `--include` or `--exclude` flag therefore narrows the results; it never widens them.
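As a rough sketch of that AND logic (not the tool's real implementation), the following uses plain dictionaries in place of MARC records. The `Record` alias, `field_matches`, and `keep` are hypothetical names, and the sample records are invented.

```python
import re
from typing import Optional

# Hypothetical stand-in for a MARC record: tag -> list of field values.
Record = dict[str, list[str]]
Criterion = tuple[str, Optional[str]]  # (tag, optional regex for the value)


def field_matches(record: Record, tag: str, regex: Optional[str] = None) -> bool:
    """True if the record has the tag and, when given, the regex matches a value."""
    values = record.get(tag, [])
    if regex is None:
        return bool(values)
    return any(re.search(regex, value) for value in values)


def keep(record: Record, includes: list[Criterion], excludes: list[Criterion]) -> bool:
    """Logical AND: every include must match and no exclude may match."""
    return all(field_matches(record, tag, rx) for tag, rx in includes) and not any(
        field_matches(record, tag, rx) for tag, rx in excludes
    )


# Invented sample data, for illustration only.
ulysses = {"100": ["Joyce, James"], "245": ["Ulysses"]}
poems = {"100": ["Lorde, Audre"], "245": ["Collected Poems"]}

print(keep(ulysses, [("245", "Ulysses")], []))
print(keep(poems, [("100", "Lorde")], [("245", "Collected Poems")]))
```

Each extra criterion adds one more condition to the conjunction, which is why more flags can only shrink the set of matching records.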
## Development
[Poetry](https://python-poetry.org/) is used for development.
- [x] -c count
- [x] --version
- [x] -l limit (number of records to process)
- [x] -i include criteria (multiple)
- [x] -e exclude criteria (multiple)
- [x] -f fields to print
- [ ] work with MARC leader
- [ ] regex for all components? e.g. `24.,text in any 240-249 field`
- [ ] relatedly, specify _not_ to treat value as a regex?
- [ ] colorize output?
```sh
poetry install # install dependencies
poetry run pytest # run tests
```
Any tag triggers a release to [Test PyPI](https://test.pypi.org/project/marcgrep/). Any tag beginning with the letter `v` requires manual approval to be released to [PyPI](https://pypi.org/project/marcgrep/) and [GitHub](https://github.com/phette23/marcgreppy/releases). There are protection rules on the `pypi` and `testpypi` [environments](https://github.com/phette23/marcgreppy/settings/environments) to this effect, too.
## License
[MIT](https://opensource.org/license/mit) © Eric Phetteplace 2024.