:Name: pymods
:Version: 2.0.12
:Home page: https://github.com/mrmiguez/pymods
:Summary: Utility class wrapping lxml for reading data from MODS v3.4 XML metadata into Python data types.
:Upload time: 2021-10-25 12:28:52
:Author: Matthew Miguez
:License: MIT
:Keywords: mods, metadata, xml

pymods
======

pymods is a utility module for working with the Library of Congress's
Metadata Object Description Schema (MODS) XML standard. It is a wrapper
around the lxml module for deserializing data from MODSXML into Python
data types.

If you need a module to serialize data into MODSXML, see the other
`pymods by Matt Cordial <https://github.com/cordmata/pymods>`_.

Installing
==========

Recommended:

``pip install pymods``

Using
=====

Basics
------

XML is parsed using the MODSReader class:

``mods_records = pymods.MODSReader('some_file.xml')``

MODSReader is an iterator; each item it yields is a MODSRecord object:

.. code:: python

    In [5]: for record in mods_records:
      ....:    print(record)
      ....:
    <Element {http://www.loc.gov/mods/v3}mods at 0x47a69f8>
    <Element {http://www.loc.gov/mods/v3}mods at 0x47fd908>
    <Element {http://www.loc.gov/mods/v3}mods at 0x47fda48>

MODSReader will work with ``mods:modsCollection`` documents, outputs
from OAI-PMH feeds, or individual MODSXML documents with ``mods:mods``
as the root element.
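
The three input shapes differ only in where the ``mods:mods`` elements sit
in the tree. As a rough illustration of that idea, the sketch below uses the
standard library's ``xml.etree.ElementTree`` rather than pymods, and it is not
a description of how MODSReader is implemented. The helper name
``find_mods_records`` and the sample documents are made up for this example.

.. code:: python

    import xml.etree.ElementTree as ET

    MODS_NS = "http://www.loc.gov/mods/v3"
    OAI_NS = "http://www.openarchives.org/OAI/2.0/"
    MODS_TAG = "{%s}mods" % MODS_NS


    def find_mods_records(xml_text):
        """Return every mods:mods element in a document, whatever the root is."""
        root = ET.fromstring(xml_text)
        # Element.iter() also yields the root itself when its tag matches,
        # so a single-record document is handled by the same call.
        return list(root.iter(MODS_TAG))


    # Hypothetical sample documents, one per input shape.
    single = '<mods xmlns="%s"><titleInfo><title>A</title></titleInfo></mods>' % MODS_NS
    collection = (
        '<modsCollection xmlns="%s">'
        "<mods><titleInfo><title>A</title></titleInfo></mods>"
        "<mods><titleInfo><title>B</title></titleInfo></mods>"
        "</modsCollection>"
    ) % MODS_NS
    oai_feed = (
        '<OAI-PMH xmlns="%s"><ListRecords><record><metadata>'
        '<mods xmlns="%s"><titleInfo><title>C</title></titleInfo></mods>'
        "</metadata></record></ListRecords></OAI-PMH>"
    ) % (OAI_NS, MODS_NS)

    for name, doc in [("single", single), ("collection", collection), ("oai_feed", oai_feed)]:
        print(name, len(find_mods_records(doc)))

With pymods itself, the same three kinds of file would simply be passed to
``pymods.MODSReader`` as shown above.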

pymods.MODSRecord
^^^^^^^^^^^^^^^^^

The MODSReader class parses each ``mods:mods`` element into a
pymods.MODSRecord object. pymods.MODSRecord is a custom wrapper class
for the ``lxml.etree.ElementBase`` class. All children of pymods.Record
inherit the ``lxml.etree._Element`` and ``lxml.etree.ElementBase`` methods.

.. code:: python

    In [6]: record = next(pymods.MODSReader('example.xml'))
    In [7]: print(record.nsmap)
    {'dcterms': 'http://purl.org/dc/terms/', 'xsi': 'http://www.w3.org/2001/XMLSchema-instance', None: 'http://www.loc.gov/mods/v3', 'flvc': 'info:flvc/manifest/v1', 'xlink': 'http://www.w3.org/1999/xlink', 'mods': 'http://www.loc.gov/mods/v3'}

.. code:: python

    In [8]: for child in record.iterdescendants():
      ....:    print(child.tag)

    {http://www.loc.gov/mods/v3}identifier
    {http://www.loc.gov/mods/v3}extension
    {info:flvc/manifest/v1}flvc
    {info:flvc/manifest/v1}owningInstitution
    {info:flvc/manifest/v1}submittingInstitution
    {http://www.loc.gov/mods/v3}titleInfo
    {http://www.loc.gov/mods/v3}title
    {http://www.loc.gov/mods/v3}name
    {http://www.loc.gov/mods/v3}namePart
    {http://www.loc.gov/mods/v3}role
    {http://www.loc.gov/mods/v3}roleTerm
    {http://www.loc.gov/mods/v3}roleTerm
    {http://www.loc.gov/mods/v3}typeOfResource
    {http://www.loc.gov/mods/v3}genre
    ...
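
The tags printed above use the ``{namespace}localname`` form that lxml
reports for namespaced elements. When only the local name is wanted, the tag
can be split by hand. The helper ``split_tag`` below is a hypothetical
convenience written for this sketch, not part of pymods; lxml's own
``etree.QName`` offers ``namespace`` and ``localname`` attributes for the same
purpose.

.. code:: python

    def split_tag(tag):
        """Split a '{namespace}local' tag into (namespace, local).

        Tags without a namespace are returned as (None, tag).
        """
        if tag.startswith("{"):
            namespace, _, local = tag[1:].partition("}")
            return namespace, local
        return None, tag


    for tag in [
        "{http://www.loc.gov/mods/v3}titleInfo",
        "{info:flvc/manifest/v1}owningInstitution",
        "title",
    ]:
        print(split_tag(tag))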

Methods
-------

All functions return data as a string, a list, or a list of named
tuples. See the `API documentation <http://pymods.readthedocs.io>`_ or the appropriate docstring for details.

.. code:: python

    In [9]: record.genre?
    Type:        property
    String form: <property object at 0x0000000004812C78>
    Docstring:
    Accesses mods:genre element.
    :return: A list containing Genre elements with term, authority,
        authorityURI, and valueURI attributes.
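
For readers unfamiliar with named tuples, the following sketch shows how such
a return value can be consumed. ``Genre`` here is a local stand-in defined
with ``collections.namedtuple``; its field names are copied from the docstring
above and its values are invented, so the real pymods type may differ.

.. code:: python

    from collections import namedtuple

    # Stand-in type: field names follow the docstring, values are made up.
    Genre = namedtuple("Genre", ["term", "authority", "authorityURI", "valueURI"])

    genre = Genre(term="newsletters", authority="aat", authorityURI=None, valueURI=None)

    print(genre.term)       # access by field name
    print(genre[1])         # access by position
    print(genre._asdict())  # convert to a dict, e.g. for JSON or CSV output

Because named tuples are ordinary tuples, they can also be unpacked, sorted,
and compared like any other tuple.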

Examples
========

Importing

.. code:: python

    from pymods import MODSReader, MODSRecord

Parsing a file

.. code:: python

    In [10]: mods = MODSReader('example.xml')
    In [11]: for record in mods:
       ....:    print(record.dates)
       ....:
    [Date(text='1966-12-08', type='{http://www.loc.gov/mods/v3}dateCreated')]
    None
    [Date(text='1987-02', type='{http://www.loc.gov/mods/v3}dateIssued')]
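
Note that the second record above yields ``None`` rather than an empty list,
so code that loops over ``record.dates`` may need a guard. The sketch below
works on stand-in data copied from that output; ``Date`` is redefined locally
and ``summarize_dates`` is a hypothetical helper, not a pymods function.

.. code:: python

    from collections import namedtuple

    # Stand-in mirroring the repr shown in the output above.
    Date = namedtuple("Date", ["text", "type"])

    MODS_PREFIX = "{http://www.loc.gov/mods/v3}"

    # Stand-ins for three successive values of record.dates.
    dates_per_record = [
        [Date("1966-12-08", MODS_PREFIX + "dateCreated")],
        None,
        [Date("1987-02", MODS_PREFIX + "dateIssued")],
    ]


    def summarize_dates(dates):
        """Return 'type: text' strings, treating None as 'no dates'."""
        # rpartition("}") keeps only the local name of the namespaced type.
        return ["%s: %s" % (d.type.rpartition("}")[2], d.text) for d in dates or []]


    for dates in dates_per_record:
        print(summarize_dates(dates))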

Simple tasks
------------

Generating a title list

.. code:: python

    In [14]: for record in mods:
       ....:     print(record.titles)
       ....:
    ['Fire Line System']
    ['$93,668.90. One Mill Tax Apportioned by Various Ways Proposed']
    ['Broward NOW News: National Organization for Women, February 1987']

Creating a subject list

.. code:: python

    In [17]: for record in mods:
       ....:     for subject in record.subjects:
       ....:         print(subject.text)
       ....:
    Concert halls
    Architecture
    Architectural drawings
    Structural systems
    Structural systems drawings
    Structural drawings
    Safety equipment
    Construction
    Mechanics
    Structural optimization
    Architectural design
    Fire prevention--Safety measures
    Taxes
    Tax payers
    Tax collection
    Organizations
    Feminism
    Sex discrimination against women
    Women's rights
    Equal rights amendments
    Women--Societies and clubs
    National Organization for Women
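
A flat list like the one above is often more useful once duplicates are
tallied. The following sketch counts headings with ``collections.Counter``.
The records are stand-ins built with ``SimpleNamespace``, the ``Subject``
tuple only imitates the ``text``, ``uri`` and ``authority`` attributes used in
this README, and ``count_subjects`` is a hypothetical helper.

.. code:: python

    from collections import Counter, namedtuple
    from types import SimpleNamespace

    # Stand-in type and data; the authority values are invented.
    Subject = namedtuple("Subject", ["text", "uri", "authority"])

    records = [
        SimpleNamespace(subjects=[
            Subject("Architecture", None, "lcsh"),
            Subject("Construction", None, "lcsh"),
        ]),
        SimpleNamespace(subjects=[
            Subject("Taxes", None, "lcsh"),
            Subject("Architecture", None, "lcsh"),
        ]),
        SimpleNamespace(subjects=None),  # a record without subjects
    ]


    def count_subjects(records):
        """Count how many times each subject heading occurs across records."""
        counts = Counter()
        for record in records:
            for subject in record.subjects or []:
                counts[subject.text] += 1
        return counts


    for heading, n in sorted(count_subjects(records).items()):
        print(n, heading)

The same function should work on real records as long as they expose a
``subjects`` attribute whose items have a ``text`` field.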

More complex tasks
------------------

Creating a list of subject URIs only for LCSH subjects

.. code:: python

    In [18]: for record in mods:
       ....:     for subject in record.subjects:
       ....:         if 'lcsh' == subject.authority:
       ....:             print(subject.uri)
       ....:
    http://id.loc.gov/authorities/subjects/sh85082767
    http://id.loc.gov/authorities/subjects/sh88004614
    http://id.loc.gov/authorities/subjects/sh85132810
    http://id.loc.gov/authorities/subjects/sh85147343
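
The loop above can be wrapped in a generator so that the URIs are
de-duplicated and subjects without a URI are skipped. This is a sketch on
stand-in data: ``lcsh_uris`` is a hypothetical helper, the ``Subject`` tuple
imitates the attributes used above, and the heading text paired with each URI
is a placeholder.

.. code:: python

    from collections import namedtuple
    from types import SimpleNamespace

    # Stand-in type; only the attribute names come from the loop above.
    Subject = namedtuple("Subject", ["text", "uri", "authority"])


    def lcsh_uris(records):
        """Yield each distinct LCSH subject URI once, in first-seen order."""
        seen = set()
        for record in records:
            for subject in record.subjects or []:
                if subject.authority == "lcsh" and subject.uri and subject.uri not in seen:
                    seen.add(subject.uri)
                    yield subject.uri


    records = [
        SimpleNamespace(subjects=[
            Subject("placeholder heading 1", "http://id.loc.gov/authorities/subjects/sh85082767", "lcsh"),
            Subject("local heading", None, "local"),
        ]),
        SimpleNamespace(subjects=[
            Subject("placeholder heading 1", "http://id.loc.gov/authorities/subjects/sh85082767", "lcsh"),
            Subject("placeholder heading 2", "http://id.loc.gov/authorities/subjects/sh88004614", "lcsh"),
        ]),
    ]

    for uri in lcsh_uris(records):
        print(uri)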

Getting URLs for objects that use the No Copyright - United States rightsstatements.org URI

.. code:: python

    In [23]: for record in mods:
       ....:     for rights_elem in record.rights:
       ....:         if rights_elem.uri == 'http://rightsstatements.org/vocab/NoC-US/1.0/':
       ....:             print(record.purl)
       ....:
    http://purl.flvc.org/fsu/fd/FSU_MSS0204_B01_F10_09
    http://purl.flvc.org/fsu/fd/FSU_MSS2008003_B18_F01_004
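
When more than one rights statement is of interest, it may be simpler to
group PURLs by rights URI in a single pass. The sketch below uses stand-in
records: ``Rights`` imitates only the ``uri`` attribute used above,
``purls_by_rights`` is a hypothetical helper, and the ``example.org`` PURL is
invented.

.. code:: python

    from collections import defaultdict, namedtuple
    from types import SimpleNamespace

    # Stand-in type; the real rights elements may carry other fields.
    Rights = namedtuple("Rights", ["text", "uri"])

    NOC_US = "http://rightsstatements.org/vocab/NoC-US/1.0/"
    IN_C = "http://rightsstatements.org/vocab/InC/1.0/"

    records = [
        SimpleNamespace(
            purl="http://purl.flvc.org/fsu/fd/FSU_MSS0204_B01_F10_09",
            rights=[Rights("No Copyright - United States", NOC_US)],
        ),
        SimpleNamespace(
            purl="http://purl.example.org/in-copyright-item",  # invented
            rights=[Rights("In Copyright", IN_C)],
        ),
        SimpleNamespace(
            purl="http://purl.flvc.org/fsu/fd/FSU_MSS2008003_B18_F01_004",
            rights=[Rights("No Copyright - United States", NOC_US)],
        ),
    ]


    def purls_by_rights(records):
        """Map each rights URI to the PURLs of the records that carry it."""
        grouped = defaultdict(list)
        for record in records:
            for rights_elem in record.rights or []:
                grouped[rights_elem.uri].append(record.purl)
        return dict(grouped)


    for uri, purls in purls_by_rights(records).items():
        print(uri, len(purls))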

.. |Build Status| image:: https://travis-ci.org/mrmiguez/pymods.svg?branch=master
   :target: https://travis-ci.org/mrmiguez/pymods



            
