bib2

Name: bib2
Version: 1.6.1
Summary: A simple converter of MARC/MARCXML/PICAXML to CSV/TSV/parquet
Upload time: 2025-07-24 07:46:26
Requires Python: >=3.9
Keywords: marc, marcxml, pica xml, bibliographic data, data conversion

# bibxml2

A simple converter of (possibly compressed) MARCXML/PICAXML to (possibly compressed) CSV/TSV/parquet.

The resulting CSV/TSV/parquet is designed to be easy to use as a data table while retaining all ordering information from the original, for cases where that is needed. The format is as follows:
`record_number,field_number,subfield_number,field_code,subfield_code,value`

Here, `record_number` identifies the MARC/PICA+ record, while `field_number` and `subfield_number` can be used for more exact filtering or for reconstructing the original field structure and order if needed.
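
As a rough illustration of how the numbering columns can be used, the sketch below regroups rows into fields using only Python's standard library. The sample rows are made up, the helper name `group_fields` is hypothetical, and the code assumes the file starts with a header row naming the six columns above; none of this is confirmed by the tool itself.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in the six-column layout; the header row is an assumption.
SAMPLE = """record_number,field_number,subfield_number,field_code,subfield_code,value
1,1,1,100,a,"Doe, Jane"
1,2,2,245,b,a subtitle
1,2,1,245,a,An example title
2,1,1,245,a,Another title
"""


def group_fields(rows):
    """Group rows by (record_number, field_number), restoring subfield order."""
    grouped = defaultdict(list)
    for row in rows:
        key = (int(row["record_number"]), int(row["field_number"]))
        grouped[key].append(
            (int(row["subfield_number"]), row["field_code"],
             row["subfield_code"], row["value"])
        )
    # Sort fields by their key and subfields by subfield_number,
    # then drop the number from each subfield tuple.
    return {
        key: [(code, sub, value) for _, code, sub, value in sorted(subs)]
        for key, subs in sorted(grouped.items())
    }


fields = group_fields(csv.DictReader(io.StringIO(SAMPLE)))
for (record, field), subfields in fields.items():
    print(record, field, subfields)
```

Because the ordering lives in the data itself, rows can be shuffled or filtered freely and the original structure can still be rebuilt afterwards.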

For MARC data fields, `ind1` and `ind2` values are reported as separate rows with the `subfield_code` being `Y` or `Z`, but only when non-empty (MARC requires subfield codes to be lowercase, so this should be relatively safe). The MARC leader is output with field code `LDR`.
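
The following sketch shows one way a consumer might separate leader and indicator rows from ordinary subfields. It rests on several assumptions that the text does not confirm: that `Y` corresponds to `ind1` and `Z` to `ind2`, that leader rows have an empty `subfield_code`, and that the file has a header row. The sample data and the helper name `split_rows` are made up for illustration.

```python
import csv
import io

# Made-up sample; the shape of the leader and indicator rows is assumed.
SAMPLE = """record_number,field_number,subfield_number,field_code,subfield_code,value
1,1,1,LDR,,00000nam a2200000 a 4500
1,2,1,245,Y,1
1,2,2,245,Z,0
1,2,3,245,a,An example title
"""

# Assumed mapping: Y carries ind1 and Z carries ind2.
INDICATOR_CODES = {"Y": "ind1", "Z": "ind2"}


def split_rows(rows):
    """Separate leader rows, indicator rows, and ordinary subfield rows."""
    leaders, indicators, subfields = {}, {}, []
    for row in rows:
        if row["field_code"] == "LDR":
            leaders[row["record_number"]] = row["value"]
        elif row["subfield_code"] in INDICATOR_CODES:
            key = (row["record_number"], row["field_number"])
            name = INDICATOR_CODES[row["subfield_code"]]
            indicators.setdefault(key, {})[name] = row["value"]
        else:
            subfields.append(
                (row["field_code"], row["subfield_code"], row["value"])
            )
    return leaders, indicators, subfields


leaders, indicators, subfields = split_rows(csv.DictReader(io.StringIO(SAMPLE)))
print(leaders)
print(indicators)
print(subfields)
```

Since indicator rows are only emitted when non-empty, a missing `ind1` or `ind2` key in this sketch simply means the indicator was blank.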

## Installation

Install from PyPI with e.g. `pipx install bibxml2`.

## Usage

```sh
Usage: marcxml2 [OPTIONS] [INPUT]...

  Convert from MARCXML (compressed) input files into (compressed) CSV/TSV/parquet

Options:
  -o, --output TEXT  Output CSV/TSV (compressed) / parquet file  [required]
  --help             Show this message and exit.
```

```sh
Usage: picaxml2csv [OPTIONS] [INPUT]...

  Convert from PICAXML (compressed) input files into (compressed) CSV/TSV/parquet

Options:
  -o, --output TEXT  Output CSV/TSV (compressed) / parquet file  [required]
  --help             Show this message and exit.
```

If the output file extension is `.parquet`, the output will be in parquet format, compressed with `zstd`, and with field typings maximally compatible with common R and Python ecosystems. Otherwise, compressed files will be read/written if the filename ends with an identifier recognised by fsspec. TSV format will be used if the output filename contains `.tsv`, otherwise CSV will be used.
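
The naming rules above can be summarised in a small sketch. This is not the tool's own code: `guess_output_format` is a hypothetical helper that only mirrors the rules as stated, it assumes case-sensitive matching, and it does not model the compression detection that is delegated to fsspec.

```python
from pathlib import PurePath


def guess_output_format(filename: str) -> str:
    """Return 'parquet', 'tsv' or 'csv' following the rules described above."""
    name = PurePath(filename).name
    if name.endswith(".parquet"):
        # Parquet is chosen by file extension.
        return "parquet"
    if ".tsv" in name:
        # TSV is chosen when '.tsv' appears anywhere in the filename,
        # so compressed names such as 'out.tsv.gz' also match.
        return "tsv"
    # Everything else falls back to CSV.
    return "csv"


for example in ("out.parquet", "out.tsv.gz", "out.csv.zst", "out.txt"):
    print(example, "->", guess_output_format(example))
```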

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "bib2",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "MARC, MARCXML, PICA XML, bibliographic data, data conversion",
    "author": null,
    "author_email": "Eetu M\u00e4kel\u00e4 <eetu.makela@helsinki.fi>",
    "download_url": "https://files.pythonhosted.org/packages/c9/72/878c46598a3108d406b4d146ed67cdaa23cbf1681da90bee3df197cb76d8/bib2-1.6.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A simple converter of MARC/MARCXML/PICAXML to CSV/TSV/parquet",
    "version": "1.6.1",
    "project_urls": {
        "repository": "https://github.com/hsci-r/bibxml2"
    },
    "split_keywords": [
        "marc",
        " marcxml",
        " pica xml",
        " bibliographic data",
        " data conversion"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "1ad7d64781baca9f2e690e2213296c52a8771e217c3ad8d0c70007ee0f1da693",
                "md5": "6b1c367f4c76f9a8a6c0381bbd41cc42",
                "sha256": "549f17d75351d7efdccad26cc7198000e1d02e7435bfb7c97b4ff88c8288edf6"
            },
            "downloads": -1,
            "filename": "bib2-1.6.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6b1c367f4c76f9a8a6c0381bbd41cc42",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 5557,
            "upload_time": "2025-07-24T07:46:24",
            "upload_time_iso_8601": "2025-07-24T07:46:24.313751Z",
            "url": "https://files.pythonhosted.org/packages/1a/d7/d64781baca9f2e690e2213296c52a8771e217c3ad8d0c70007ee0f1da693/bib2-1.6.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "c972878c46598a3108d406b4d146ed67cdaa23cbf1681da90bee3df197cb76d8",
                "md5": "f7934862ad2fa7fbe30d647c34ba7950",
                "sha256": "0b49495096895bd4d856851f1785515e774fc97391aa1a09d302a357cb3f7119"
            },
            "downloads": -1,
            "filename": "bib2-1.6.1.tar.gz",
            "has_sig": false,
            "md5_digest": "f7934862ad2fa7fbe30d647c34ba7950",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 218789,
            "upload_time": "2025-07-24T07:46:26",
            "upload_time_iso_8601": "2025-07-24T07:46:26.072613Z",
            "url": "https://files.pythonhosted.org/packages/c9/72/878c46598a3108d406b4d146ed67cdaa23cbf1681da90bee3df197cb76d8/bib2-1.6.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-24 07:46:26",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "hsci-r",
    "github_project": "bibxml2",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "bib2"
}
        