marcgrep


- Name: marcgrep
- Version: 0.4.0
- Home page: https://github.com/phette23/marcgreppy
- Summary: search MARC files for regex matches
- Upload time: 2024-04-12 03:42:21
- Maintainer: None
- Docs URL: None
- Author: phette23
- Requires Python: >3.8
- License: MIT
- Keywords: marc, grep, regex, libraries, cli, metadata, bibliographic, cataloging
- Requirements: No requirements were recorded.
- Travis-CI: No Travis.
- Coveralls test coverage: No coveralls.

# marcgrep

A CLI for searching MARC files, like [MARCgrep.pl](https://pusc.it/bib/MARCgrep) but written in Python and with a slightly different syntax.

[marcli](https://github.com/hectorcorrea/marcli) is a similar project that is faster but a little less flexible.

## Installation

Requires Python 3.9 or later.

```sh
pipx install marcgrep # install globally with pipx
pip install marcgrep # or use pip/pip3
```

## Usage

```sh
# general command format
$ marcgrep OPTIONS FILE.mrc
$ cat FILE.mrc | marcgrep OPTIONS
# full usage information
$ marcgrep -h
Usage: marcgrep [OPTIONS] [FILE]

  Find MARC records matching patterns in a file.

Options:
  -h, --help           Show this message and exit.
  -c, --count          Count matching records
  -i, --include TEXT   Include matching records (repeatable)
  -e, --exclude TEXT   Exclude matching records (repeatable)
  -f, --fields TEXT    Comma-separated list of fields to print
  -l, --limit INTEGER  Limit number of records to process
  --version            Show the version and exit.
```

The `--include` and `--exclude` flags can each be used multiple times to specify multiple criteria. Each accepts a pattern: a comma-separated filter expression for matching MARC fields. Examples:

```sh
# records with a 780 field
$ marcgrep -i 780 FILE.mrc
# records with Ulysses in the 245 field
$ marcgrep -i '245,Ulysses' FILE.mrc
# titles _without_ "Collected Poems" in the 245 $a subfield
$ marcgrep -e '245,a,Collected Poems' FILE.mrc
# titles with second indicator = 4 that do not start with "The "
$ marcgrep -i '245,,4,,^(?!The )' FILE.mrc
```
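The last example relies on a negative lookahead. The following sketch shows how that regular expression behaves in Python's `re` module. It assumes the value is tested with `re.search`, which may not be how marcgrep applies it, and `starts_without_the` is a hypothetical helper used only for illustration.

```python
import re

# The value component from the example above: match only at the start of
# the string, and only if the string does not begin with "The ".
PATTERN = re.compile(r"^(?!The )")


def starts_without_the(title: str) -> bool:
    """Return True if the title does not begin with "The " (hypothetical helper)."""
    return PATTERN.search(title) is not None


for title in ["The Waves", "Ulysses", "Theory of Colours"]:
    print(title, starts_without_the(title))
```

Note that "Theory of Colours" still passes, because the lookahead includes the trailing space after "The".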

The meaning of the pattern's components depends upon their number:

- 1: field, `910` -> 910 is in record
- 2: field and value (regular expression), `100,Lorde` -> 100 contains string "Lorde"
- 3: field, subfield, and value, `506,a,Open Access` -> 506$a contains string "Open Access"
- 4: field, first indicator, subfield, and value, `856,0,u,@lcsh\.gov` -> 856$u with 1st indicator 0 contains string "@lcsh.gov"
- 5: field, first & second indicators, subfield, and value, `245,0,4,a,The Communist Manifesto` -> 245$a with indicators 0 and 4 contains string "The Communist Manifesto"
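As a rough illustration, the component-count rule can be sketched in Python as below. This is not marcgrep's actual implementation: `Criterion` and `parse_pattern` are hypothetical names, and the component order follows the examples (`856,0,u,...` and `245,0,4,a,...`), where indicators come before the subfield. The sketch also splits on every comma up to the fifth component, so a value containing a comma would be misread in shorter patterns.

```python
from typing import NamedTuple, Optional


class Criterion(NamedTuple):
    """Hypothetical parsed form of an --include/--exclude pattern."""
    field: str
    ind1: Optional[str] = None
    ind2: Optional[str] = None
    subfield: Optional[str] = None
    value: Optional[str] = None


def parse_pattern(pattern: str) -> Criterion:
    # Split into at most five components; empty components become None,
    # meaning "do not care about this component".
    parts = [p or None for p in pattern.split(",", 4)]
    field = parts[0]
    if field is None:
        raise ValueError("a pattern must start with a field tag")
    if len(parts) == 1:
        return Criterion(field)
    if len(parts) == 2:
        return Criterion(field, value=parts[1])
    if len(parts) == 3:
        return Criterion(field, subfield=parts[1], value=parts[2])
    if len(parts) == 4:
        return Criterion(field, ind1=parts[1], subfield=parts[2], value=parts[3])
    return Criterion(field, ind1=parts[1], ind2=parts[2],
                     subfield=parts[3], value=parts[4])


print(parse_pattern("506,a,Open Access"))
print(parse_pattern(r"856,0,u,@lcsh\.gov"))
print(parse_pattern("245,0,4,a,The Communist Manifesto"))
```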

The intention of this syntax is to make searching subfields and field values easier than in MARCgrep.pl, since we care about them more often than indicators. To ignore a component but use one of lesser priority, leave the component empty. For instance, `856,s,` refers to records with an `856` field that has a `$s` subfield; the trailing comma means we don't care about the subfield's value. The pattern `245,,4,,` refers to records with a `245` field with a second indicator of `4`, regardless of its subfields or value.
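One way to picture empty components is as wildcards. The sketch below uses a plain dictionary as a stand-in for a MARC field; this representation and the `field_matches` function are assumptions made for illustration and do not reflect how marcgrep stores or matches records.

```python
import re
from typing import Optional


def field_matches(field: dict, tag: str, ind1: Optional[str] = None,
                  ind2: Optional[str] = None, subfield: Optional[str] = None,
                  value: Optional[str] = None) -> bool:
    """Hypothetical matcher: a component left as None is ignored."""
    if field["tag"] != tag:
        return False
    if ind1 is not None and field["ind1"] != ind1:
        return False
    if ind2 is not None and field["ind2"] != ind2:
        return False
    # Collect the subfield values that are candidates for the value test.
    values = [v for code, v in field["subfields"]
              if subfield is None or code == subfield]
    if subfield is not None and not values:
        return False  # the requested subfield is absent
    if value is None:
        return True   # empty value: presence alone is enough
    return any(re.search(value, v) for v in values)


# Stand-in for a 245 field; the structure is an assumption of this sketch.
title = {
    "tag": "245", "ind1": "1", "ind2": "4",
    "subfields": [("a", "The Waves /"), ("c", "Virginia Woolf.")],
}

print(field_matches(title, "245", ind2="4"))              # like `245,,4,,`
print(field_matches(title, "245", subfield="s"))          # like `245,s,`
print(field_matches(title, "245", subfield="a", value="Waves"))
```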

Multiple criteria are combined with logical AND. Multiple `--include` flags are narrower than one, as is an `--include` combined with an `--exclude`.
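The AND combination can be sketched as follows: a record is kept only if it satisfies every include criterion and no exclude criterion. The record representation (a dictionary from tag to a list of strings) and the names `keep_record`, `has_tag`, and `tag_contains` are hypothetical and chosen only for this example.

```python
import re
from typing import Callable, Dict, Iterable, List

Record = Dict[str, List[str]]          # assumed stand-in: tag -> field values
Predicate = Callable[[Record], bool]


def has_tag(tag: str) -> Predicate:
    return lambda record: tag in record


def tag_contains(tag: str, regex: str) -> Predicate:
    return lambda record: any(re.search(regex, v) for v in record.get(tag, []))


def keep_record(record: Record, includes: Iterable[Predicate],
                excludes: Iterable[Predicate]) -> bool:
    # Every include must match AND no exclude may match.
    return (all(p(record) for p in includes)
            and not any(p(record) for p in excludes))


records = [
    {"245": ["Collected Poems"], "780": ["Earlier title"]},
    {"245": ["Ulysses"], "780": ["Earlier title"]},
    {"245": ["Ulysses"]},
]

# Roughly: -i 780 -e '245,Collected Poems'
kept = [r for r in records
        if keep_record(r, [has_tag("780")], [tag_contains("245", "Collected Poems")])]
print(kept)
```

Adding a criterion of either kind can only shrink the set of kept records, which is why each extra flag narrows the result.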

## Development

[Poetry](https://python-poetry.org/) is used for development.

- [x] -c count
- [x] --version
- [x] -l limit (number of records to process)
- [x] -i include criteria (multiple)
- [x] -e exclude criteria (multiple)
- [x] -f fields to print
- [ ] work with MARC leader
- [ ] regex for all components? e.g. `24.,text in any 240-249 field`
- [ ] relatedly, specify _not_ to treat value as a regex?
- [ ] colorize output?

```sh
poetry install # install dependencies
poetry run pytest # run tests
```

Any tag triggers a release to [Test PyPI](https://test.pypi.org/project/marcgrep/). Any tag beginning with the letter `v` requires manual approval to be released to [PyPI](https://pypi.org/project/marcgrep/) and [GitHub](https://github.com/phette23/marcgreppy/releases). There are protection rules on the `pypi` and `testpypi` [environments](https://github.com/phette23/marcgreppy/settings/environments) to this effect, too.

## License

[MIT](https://opensource.org/license/mit) © Eric Phetteplace 2024.


            
