pymods
======
pymods is a utility module for working with the Library of Congress's MODS
XML standard: Metadata Object Description Schema (MODS). It is a utility
wrapper for the lxml module specific to deserializing data out of
MODSXML into Python data types.
If you need a module to serialize data into MODSXML, see the other
`pymods by Matt Cordial <https://github.com/cordmata/pymods>`_.
Installing
==========
Recommended:
``pip install pymods``
Using
=====
Basics
------
XML is parsed using the MODSReader class:
``mods_records = pymods.MODSReader('some_file.xml')``
Iterating over the reader yields each record as a MODSRecord object:

.. code:: python

    In [5]: for record in mods_records:
      ....:     print(record)
      ....:
    <Element {http://www.loc.gov/mods/v3}mods at 0x47a69f8>
    <Element {http://www.loc.gov/mods/v3}mods at 0x47fd908>
    <Element {http://www.loc.gov/mods/v3}mods at 0x47fda48>

MODSReader will work with ``mods:modsCollection`` documents, outputs
from OAI-PMH feeds, or individual MODSXML documents with ``mods:mods``
as the root element.
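
The three input shapes differ mainly in how deeply the ``mods:mods``
elements are nested. As a rough illustration of that idea (not pymods's
actual implementation), the sketch below uses the standard library's
``xml.etree.ElementTree`` to collect every ``mods:mods`` element, whether
it is the root or wrapped in a collection. The helper name
``find_mods_records`` and the sample XML strings are made up for this
example.

.. code:: python

    import xml.etree.ElementTree as ET

    MODS_NS = "http://www.loc.gov/mods/v3"
    # Clark notation: "{namespace}localname"
    MODS_TAG = f"{{{MODS_NS}}}mods"


    def find_mods_records(xml_text):
        """Return every mods:mods element in the document, root included."""
        root = ET.fromstring(xml_text)
        # Element.iter() visits the root itself and then all descendants.
        return list(root.iter(MODS_TAG))


    # Hypothetical inputs: a single record and a two-record collection.
    single = f'<mods xmlns="{MODS_NS}"><titleInfo><title>A</title></titleInfo></mods>'
    collection = (
        f'<modsCollection xmlns="{MODS_NS}">'
        "<mods><titleInfo><title>A</title></titleInfo></mods>"
        "<mods><titleInfo><title>B</title></titleInfo></mods>"
        "</modsCollection>"
    )

    print(len(find_mods_records(single)))
    print(len(find_mods_records(collection)))

Searching by fully qualified tag name means the same function handles a
bare record and a wrapped collection without checking the root element
first.
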
pymods.MODSRecord
^^^^^^^^^^^^^^^^^
The MODSReader class parses each ``mods:mods`` element into a
pymods.MODSRecord object. pymods.MODSRecord is a custom wrapper class
for the lxml.ElementBase class. All children of pymods.Record inherit
the lxml.\_Element and lxml.ElementBase methods.

.. code:: python

    In [6]: record = next(pymods.MODSReader('example.xml'))
    In [7]: print(record.nsmap)
    {'dcterms': 'http://purl.org/dc/terms/', 'xsi': 'http://www.w3.org/2001/XMLSchema-instance', None: 'http://www.loc.gov/mods/v3', 'flvc': 'info:flvc/manifest/v1', 'xlink': 'http://www.w3.org/1999/xlink', 'mods': 'http://www.loc.gov/mods/v3'}

.. code:: python

    In [8]: for child in record.iterdescendants():
      ....:     print(child.tag)

    {http://www.loc.gov/mods/v3}identifier
    {http://www.loc.gov/mods/v3}extension
    {info:flvc/manifest/v1}flvc
    {info:flvc/manifest/v1}owningInstitution
    {info:flvc/manifest/v1}submittingInstitution
    {http://www.loc.gov/mods/v3}titleInfo
    {http://www.loc.gov/mods/v3}title
    {http://www.loc.gov/mods/v3}name
    {http://www.loc.gov/mods/v3}namePart
    {http://www.loc.gov/mods/v3}role
    {http://www.loc.gov/mods/v3}roleTerm
    {http://www.loc.gov/mods/v3}roleTerm
    {http://www.loc.gov/mods/v3}typeOfResource
    {http://www.loc.gov/mods/v3}genre
    ...

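
The tags printed above use the ``{namespace}localname`` form. When only
the element name matters, the two parts can be separated with plain
string handling. This is a small sketch, not part of pymods; the helper
name ``split_tag`` is hypothetical, and it assumes the tag is a string.

.. code:: python

    def split_tag(tag):
        """Split '{namespace}local' into (namespace, local).

        The namespace is None when the tag carries no namespace.
        """
        if tag.startswith("{"):
            namespace, _, local = tag[1:].partition("}")
            return namespace, local
        return None, tag


    for tag in (
        "{http://www.loc.gov/mods/v3}titleInfo",
        "{info:flvc/manifest/v1}owningInstitution",
        "title",
    ):
        print(split_tag(tag))
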
Methods
-------
All methods return data as a string, a list, or a list of named
tuples. See the `API documentation <http://pymods.readthedocs.io>`_ or the relevant docstring for details.

.. code:: python

    >>> record.genre?
    Type:        property
    String form: <property object at 0x0000000004812C78>
    Docstring:
    Accesses mods:genre element.
    :return: A list containing Genre elements with term, authority,
        authorityURI, and valueURI attributes.

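
Because results arrive as named tuples, fields can be read by name
rather than by position. The sketch below uses a stand-in ``Date`` tuple
shaped like the ``Date(text=..., type=...)`` values that ``record.dates``
prints in the examples in this README; the helper ``dates_of_type`` is
hypothetical. It also guards against ``None``, which ``record.dates``
returns for a record without dates.

.. code:: python

    from collections import namedtuple

    # Stand-in with the same fields as the Date values printed by record.dates.
    Date = namedtuple("Date", ["text", "type"])

    MODS_PREFIX = "{http://www.loc.gov/mods/v3}"


    def dates_of_type(dates, local_name):
        """Return the text of each date whose type is the given MODS element."""
        if dates is None:  # no dates on this record
            return []
        return [d.text for d in dates if d.type == MODS_PREFIX + local_name]


    # Values copied from the record.dates output shown in this README.
    sample = [
        [Date(text="1966-12-08", type=MODS_PREFIX + "dateCreated")],
        None,
        [Date(text="1987-02", type=MODS_PREFIX + "dateIssued")],
    ]

    for dates in sample:
        print(dates_of_type(dates, "dateCreated"))
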
Examples
========
Importing

.. code:: python

    from pymods import MODSReader, MODSRecord

Parsing a file

.. code:: python

    In [10]: mods = MODSReader('example.xml')
    In [11]: for record in mods:
       ....:     print(record.dates)
       ....:
    [Date(text='1966-12-08', type='{http://www.loc.gov/mods/v3}dateCreated')]
    None
    [Date(text='1987-02', type='{http://www.loc.gov/mods/v3}dateIssued')]

Simple tasks
------------
Generating a title list

.. code:: python

    In [14]: for record in mods:
       ....:     print(record.titles)
       ....:
    ['Fire Line System']
    ['$93,668.90. One Mill Tax Apportioned by Various Ways Proposed']
    ['Broward NOW News: National Organization for Women, February 1987']

Creating a subject list

.. code:: python

    In [17]: for record in mods:
       ....:     for subject in record.subjects:
       ....:         print(subject.text)
       ....:
    Concert halls
    Architecture
    Architectural drawings
    Structural systems
    Structural systems drawings
    Structural drawings
    Safety equipment
    Construction
    Mechanics
    Structural optimization
    Architectural design
    Fire prevention--Safety measures
    Taxes
    Tax payers
    Tax collection
    Organizations
    Feminism
    Sex discrimination against women
    Women's rights
    Equal rights amendments
    Women--Societies and clubs
    National Organization for Women

More complex tasks
------------------
Creating a list of subject URIs only for LCSH subjects

.. code:: python

    In [18]: for record in mods:
       ....:     for subject in record.subjects:
       ....:         if 'lcsh' == subject.authority:
       ....:             print(subject.uri)
       ....:
    http://id.loc.gov/authorities/subjects/sh85082767
    http://id.loc.gov/authorities/subjects/sh88004614
    http://id.loc.gov/authorities/subjects/sh85132810
    http://id.loc.gov/authorities/subjects/sh85147343

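
The same filter can be wrapped in a function that returns the URIs
instead of printing them. This is a minimal sketch: the function name
``lcsh_uris`` is hypothetical, it relies only on the ``subjects``,
``authority``, and ``uri`` attributes used above, and it assumes that
``record.subjects`` may be ``None`` for a record without subjects (as
``record.dates`` is). The stand-in records let the example run without
pymods or an XML file; the non-LCSH entry is invented.

.. code:: python

    from types import SimpleNamespace


    def lcsh_uris(records):
        """Return a sorted list of unique URIs for subjects with authority 'lcsh'."""
        uris = set()
        for record in records:
            # Assumption: subjects may be None when a record has none.
            for subject in record.subjects or []:
                if subject.authority == "lcsh" and subject.uri:
                    uris.add(subject.uri)
        return sorted(uris)


    # Stand-in records; with pymods you would pass MODSReader('example.xml').
    records = [
        SimpleNamespace(subjects=[
            SimpleNamespace(authority="lcsh",
                            uri="http://id.loc.gov/authorities/subjects/sh85082767"),
            SimpleNamespace(authority="local", uri="http://example.org/subject/1"),
        ]),
        SimpleNamespace(subjects=None),
        SimpleNamespace(subjects=[
            SimpleNamespace(authority="lcsh",
                            uri="http://id.loc.gov/authorities/subjects/sh85082767"),
            SimpleNamespace(authority="lcsh",
                            uri="http://id.loc.gov/authorities/subjects/sh85132810"),
        ]),
    ]

    for uri in lcsh_uris(records):
        print(uri)

Collecting into a set removes URIs that repeat across records, which is
usually what a subject list for a whole collection needs.
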
Get URLs for objects using a No Copyright US rightsstatements.org URI

.. code:: python

    In [23]: for record in mods:
       ....:     for rights_elem in record.rights:
       ....:         if rights_elem.uri == 'http://rightsstatements.org/vocab/NoC-US/1.0/':
       ....:             print(record.purl)
       ....:
    http://purl.flvc.org/fsu/fd/FSU_MSS0204_B01_F10_09
    http://purl.flvc.org/fsu/fd/FSU_MSS2008003_B18_F01_004

.. |Build Status| image:: https://travis-ci.org/mrmiguez/pymods.svg?branch=master
   :target: https://travis-ci.org/mrmiguez/pymods

Raw data
{
"_id": null,
"home_page": "https://github.com/mrmiguez/pymods",
"name": "pymods",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "MODS metadata xml",
"author": "Matthew Miguez",
"author_email": "r.m.miguez@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/8d/7f/a1e2b88ce10cb6c5dcff3c378b74a29fddc91bd839c3d96c71e16c4a28a5/pymods-2.0.12.tar.gz",
"platform": "",
"bugtrack_url": null,
"license": "MIT",
"summary": "Utility class wrapping lxml for reading data from MODS v3.4 XML metadata into Python data types.",
"version": "2.0.12",
"split_keywords": [
"mods",
"metadata",
"xml"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "e2485330359492e48becd20abe21b92538dbccaff4337f3617a760c2eec4ff67",
"md5": "0dfa6e58cc85c696f96a5e0752673a94",
"sha256": "02d599dd3627efe40fb7e313e1b11cb93ee3f74a7f7e76d07b296d8477b0d9ba"
},
"downloads": -1,
"filename": "pymods-2.0.12-py3-none-any.whl",
"has_sig": false,
"md5_digest": "0dfa6e58cc85c696f96a5e0752673a94",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 17779,
"upload_time": "2021-10-25T12:28:50",
"upload_time_iso_8601": "2021-10-25T12:28:50.144940Z",
"url": "https://files.pythonhosted.org/packages/e2/48/5330359492e48becd20abe21b92538dbccaff4337f3617a760c2eec4ff67/pymods-2.0.12-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "8d7fa1e2b88ce10cb6c5dcff3c378b74a29fddc91bd839c3d96c71e16c4a28a5",
"md5": "68ffc0b3297eb933c8039d9ed2afcfeb",
"sha256": "d92f8e9298bab47a424ac859e81a6e66124563a18856e9fc346fb53df97667fd"
},
"downloads": -1,
"filename": "pymods-2.0.12.tar.gz",
"has_sig": false,
"md5_digest": "68ffc0b3297eb933c8039d9ed2afcfeb",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 16536,
"upload_time": "2021-10-25T12:28:52",
"upload_time_iso_8601": "2021-10-25T12:28:52.879029Z",
"url": "https://files.pythonhosted.org/packages/8d/7f/a1e2b88ce10cb6c5dcff3c378b74a29fddc91bd839c3d96c71e16c4a28a5/pymods-2.0.12.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2021-10-25 12:28:52",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "mrmiguez",
"github_project": "pymods",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "pymods"
}