# bibxml2
A simple converter of (possibly compressed) MARCXML/PICAXML to (possibly compressed) CSV/TSV/parquet.
The resulting CSV/TSV/parquet is designed to be easy to use as a data table, while still retaining all ordering information from the original for cases where it is needed. The format is as follows:
`record_number,field_number,subfield_number,field_code,subfield_code,value`
Here, `record_number` identifies the MARC/PICA+ record, while `field_number` and `subfield_number` can be used for more precise filtering, or for reconstructing the original field structure and order if needed.
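As a rough illustration of how the numbering columns can be used, the sketch below groups rows back into fields using only Python's standard library. The sample rows, the presence of a header row, the 1-based numbering, and the helper name `group_fields` are all assumptions made for the example, not documented behaviour of the tool.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the column layout described above; all values are invented.
SAMPLE = """record_number,field_number,subfield_number,field_code,subfield_code,value
1,1,1,245,a,An example title
1,1,2,245,c,An example author
1,2,1,700,a,Another name
2,1,1,245,a,A second title
"""


def group_fields(rows):
    """Return {record_number: {field_number: (field_code, [(subfield_code, value), ...])}}."""
    records = defaultdict(dict)
    # Sort numerically so that field and subfield order follows the numbering columns.
    ordered = sorted(
        rows,
        key=lambda r: (
            int(r["record_number"]),
            int(r["field_number"]),
            int(r["subfield_number"]),
        ),
    )
    for row in ordered:
        rec = int(row["record_number"])
        fld = int(row["field_number"])
        # Create the field entry on first sight, then append to its subfield list.
        _code, subfields = records[rec].setdefault(fld, (row["field_code"], []))
        subfields.append((row["subfield_code"], row["value"]))
    return records


records = group_fields(csv.DictReader(io.StringIO(SAMPLE)))
for rec, fields in records.items():
    for fld, (code, subfields) in fields.items():
        print(rec, fld, code, subfields)
```

Because the grouping key is the pair (`record_number`, `field_number`), repeated fields with the same `field_code` stay separate, which is what makes reconstruction of the original structure possible.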
For MARC data fields, `ind1` and `ind2` values are reported as separate rows with the `subfield_code` being `Y` or `Z`, but only when non-empty (MARC requires subfield codes to be lowercase, so these uppercase codes should be relatively safe from collisions). The MARC leader is output with field code `LDR`.
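When working with the rows of a single MARC data field, the indicator rows can be separated from the actual subfields by their code. The following is a minimal sketch; it assumes that `Y` corresponds to `ind1` and `Z` to `ind2`, and both the sample rows and the helper name `split_indicators` are hypothetical.

```python
# Hypothetical rows for one MARC data field, as (field_code, subfield_code, value).
rows = [
    ("245", "Y", "1"),  # assumed to carry ind1
    ("245", "Z", "0"),  # assumed to carry ind2
    ("245", "a", "An example title"),
]

# Assumed mapping from the reserved uppercase codes to indicator names.
INDICATOR_CODES = {"Y": "ind1", "Z": "ind2"}


def split_indicators(field_rows):
    """Split one field's rows into (indicators dict, list of real subfields)."""
    indicators = {}
    subfields = []
    for _field_code, subfield_code, value in field_rows:
        if subfield_code in INDICATOR_CODES:
            indicators[INDICATOR_CODES[subfield_code]] = value
        else:
            subfields.append((subfield_code, value))
    return indicators, subfields


indicators, subfields = split_indicators(rows)
print(indicators, subfields)
```

Since empty indicators produce no row, a missing key in the returned dictionary simply means that the indicator was empty in the source record.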
## Installation
Install from PyPI with e.g. `pipx install bibxml2`.
## Usage
```sh
Usage: marcxml2 [OPTIONS] [INPUT]...

  Convert from MARCXML (compressed) input files into (compressed) CSV/TSV/parquet

Options:
  -o, --output TEXT  Output CSV/TSV (compressed) / parquet file  [required]
  --help             Show this message and exit.
```
```sh
Usage: picaxml2csv [OPTIONS] [INPUT]...

  Convert from PICAXML (compressed) input files into (compressed) CSV/TSV/parquet

Options:
  -o, --output TEXT  Output CSV/TSV (compressed) / parquet file  [required]
  --help             Show this message and exit.
```
If the output file extension is `.parquet`, the output is written in parquet format, compressed with `zstd`, with column types chosen for maximal compatibility with the common R and Python ecosystems. Otherwise, compressed files are read/written when the filename ends with an extension recognised by fsspec (for example `.gz`). TSV format is used if the output filename contains `.tsv`; otherwise CSV is used.
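The filename rule can be restated as a small Python function. This is only a sketch of the rule as described in the paragraph above, not the tool's actual implementation; the function name `guess_output_format` is hypothetical, and compression detection (which is delegated to fsspec) is left out.

```python
def guess_output_format(filename: str) -> str:
    """Return 'parquet', 'tsv' or 'csv' following the rule described above."""
    # A .parquet extension takes precedence over everything else.
    if filename.endswith(".parquet"):
        return "parquet"
    # TSV is chosen when '.tsv' appears anywhere in the name, e.g. 'out.tsv.gz'.
    if ".tsv" in filename:
        return "tsv"
    # Everything else falls back to CSV.
    return "csv"


for name in ["out.parquet", "out.tsv.gz", "out.csv.gz", "out.txt"]:
    print(name, "->", guess_output_format(name))
```

Note that the TSV check is a substring test rather than a suffix test, so that compressed names such as `out.tsv.gz` are still treated as TSV.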
Raw data
{
"name": "bib2",
"version": "1.5.4",
"summary": "A simple converter of MARC/MARCXML/PICAXML to CSV/TSV/parquet",
"author_email": "Eetu Mäkelä <eetu.makela@helsinki.fi>",
"requires_python": ">=3.9",
"keywords": "MARC, MARCXML, PICA XML, bibliographic data, data conversion",
"project_urls": {
"repository": "https://github.com/hsci-r/bibxml2"
},
"upload_time": "2025-07-08 10:15:19"
}