########################################################################
  Xport
########################################################################

:Name: xport
:Version: 3.6.1
:Home page: https://github.com/selik/xport
:Summary: SAS XPORT file reader
:Upload time: 2022-02-16 07:35:07
:Author: Michael Selik
:Requires Python: >=3.7
:License: MIT
:Keywords: sas, xport, xpt, cport, sas7bdat

.. sphinx-page-start

Read and write SAS Transport files (``*.xpt``).

SAS uses a handful of archaic file formats: XPORT/XPT, CPORT, SAS7BDAT.
If someone publishes their data in one of those formats, this Python
package will help you convert the data into a more useful format.  If
someone, like the FDA, asks you for an XPT file, this package can write
it for you.


What's it for?
==============

XPORT is the binary file format used by a bunch of `United States
government agencies`_ for publishing data sets. It made a lot of sense
if you were trying to read data files on your IBM mainframe back in
1988.

The official `SAS specification for XPORT`_ is relatively
straightforward. The hardest part is converting IBM-format floating
point to IEEE-format, which the specification explains in detail.
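As a rough illustration of that conversion, the sketch below decodes one
8-byte IBM hexadecimal float using only the standard library.  The
function name ``ibm_to_ieee`` is hypothetical and is not part of this
package's API, and the sketch deliberately ignores the SAS missing-value
encodings.

.. code:: python

    import struct


    def ibm_to_ieee(data: bytes) -> float:
        """Decode a big-endian 8-byte IBM hexadecimal float (sketch only)."""
        if len(data) != 8:
            raise ValueError('expected exactly 8 bytes')
        (bits,) = struct.unpack('>Q', data)
        sign = -1.0 if bits >> 63 else 1.0
        # 7-bit exponent, base 16, stored with a bias of 64.
        exponent = (bits >> 56) & 0x7F
        # 56-bit fraction, read as a value in the range [0, 1).
        fraction = bits & 0x00FFFFFFFFFFFFFF
        if fraction == 0:
            # Assumption: treat a zero fraction as zero.  Real XPT data
            # also uses this pattern for missing values.
            return 0.0
        return sign * (fraction / 2**56) * 16.0 ** (exponent - 64)


    ibm_to_ieee(bytes.fromhex('4110000000000000'))  # 1.0
    ibm_to_ieee(bytes.fromhex('C276A00000000000'))  # -118.625

An IBM fraction carries 56 bits while an IEEE double keeps 53, so a
conversion like this can round the lowest bits.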

There was an `update to the XPT specification`_ for SAS v8 and above.
The changes to the format appear to be trivial changes to the metadata,
but older releases of this module did not handle them and their
error-checking raised a ``ValueError``. According to the change log
below, reading Transport Version 8/9 files was enabled in v3.3.0 and
writing Version 8 files in v3.5.0. If you run into a problem with a v8
file, please let me know by `submitting an issue`_.

.. _United States government agencies: https://www.google.com/search?q=site:.gov+xpt+file

.. _SAS specification for XPORT: http://support.sas.com/techsup/technote/ts140.pdf

.. _update to the XPT specification: https://support.sas.com/techsup/technote/ts140_2.pdf

.. _submitting an issue: https://github.com/selik/xport/issues/new



Installation
============

This project requires Python v3.7+.  Grab the latest stable version from
PyPI.

.. code:: bash

    $ python -m pip install --upgrade xport



Reading XPT
===========

This module follows the common pattern of providing ``load`` and
``loads`` functions for reading data from a SAS file format.

.. code:: python

    import xport.v56

    with open('example.xpt', 'rb') as f:
        library = xport.v56.load(f)


The XPT decoders, ``xport.v56.load`` and ``xport.v56.loads``, return an
``xport.Library``, which is a mapping (``dict``-like) of
``xport.Dataset`` objects.  An ``xport.Dataset`` is a subclass of
``pandas.DataFrame`` with SAS metadata attributes (name, label, etc.).
The columns of an ``xport.Dataset`` are ``xport.Variable`` objects,
which are subclasses of ``pandas.Series`` with SAS metadata (name,
label, format, etc.).

If you're not familiar with `Pandas`_'s dataframes, it's easy to think
of them as a dictionary of columns, mapping variable names to variable
data.
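The dictionary analogy can be made concrete with plain Pandas.  This is
a minimal sketch with made-up column names; it does not use ``xport`` at
all.

.. code:: python

    import pandas as pd

    df = pd.DataFrame({'NUMBERS': [1.0, 2.0], 'TEXT': ['a', 'b']})

    # Iterating with .items() yields (column name, column data) pairs,
    # much like iterating over the items of a dictionary.
    columns = {name: series.tolist() for name, series in df.items()}

    # Indexing by name returns a single column, as a dictionary lookup
    # would.
    text_values = df['TEXT'].tolist()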

The SAS Transport (XPORT) format only supports two kinds of data.  Each
value is either numeric or character, so ``xport.v56.load`` decodes the
values as either ``float`` or ``str``.

Note that since XPT files are in an unusual binary format, you should
open them using mode ``'rb'``.
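One way to catch a file opened in the wrong mode, or a file that is not
XPT at all, is to look at its first bytes before decoding.  The sketch
below assumes, based on the SAS specification, that a transport file
begins with the ASCII text ``HEADER RECORD*******``.  The helper name
``looks_like_xport`` is hypothetical and not part of this package.

.. code:: python

    import io

    # Assumption: leading bytes of the first header record.
    XPORT_MAGIC = b'HEADER RECORD*******'


    def looks_like_xport(stream) -> bool:
        """Return True if the stream starts like an XPORT header record."""
        start = stream.read(len(XPORT_MAGIC))
        if isinstance(start, str):
            raise TypeError("open the file in binary mode ('rb')")
        return start == XPORT_MAGIC


    sample = io.BytesIO(b'HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!')
    is_xport = looks_like_xport(sample)

The check consumes the first bytes of the stream, so call
``stream.seek(0)`` before handing the same stream to a decoder.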

.. _Pandas: http://pandas.pydata.org/


You can also use the ``xport`` module as a command-line tool to convert
an XPT file to a CSV (comma-separated values) file.  The ``xport``
executable is a friendly alias for ``python -m xport``.

Caution: if this command does not work with the latest version, it
should work with version 2.0.2.  To get that version, either download
the files from this `link`_ or run ``pip install xport==2.0.2`` in your
terminal.

.. _link: https://pypi.org/project/xport/2.0.2/#files

.. code:: bash

    $ xport example.xpt > example.csv


Writing XPT
===========

The ``xport`` package follows the common pattern of providing ``dump``
and ``dumps`` functions for writing data to a SAS file format.

.. code:: python

    import xport
    import xport.v56

    ds = xport.Dataset()
    with open('example.xpt', 'wb') as f:
        xport.v56.dump(ds, f)


Because the ``xport.Dataset`` is an extension of ``pandas.DataFrame``,
you can create datasets in a variety of ways, converting easily from a
dataframe to a dataset.

.. code:: python

    import pandas as pd
    import xport
    import xport.v56

    df = pd.DataFrame({'NUMBERS': [1, 2], 'TEXT': ['a', 'b']})
    ds = xport.Dataset(df, name='MAX8CHRS', label='Up to 40!')
    with open('example.xpt', 'wb') as f:
        xport.v56.dump(ds, f)


SAS Transport v5 restricts variable names to 8 characters (with a
strange preference for uppercase) and labels to 40 characters.  If you
want the relative comfort of SAS Transport v8's limit of 246 characters,
note that the change log below lists support for writing Version 8 files
as of v3.5.0.  If that does not cover your case, please `make an
enhancement request`_.
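Shortening names to fit the v5 limit can make two different columns
collide.  The sketch below is one way to detect that before writing.
It uses only the standard library, and the helper name ``to_v5_names``
is hypothetical rather than part of ``xport``.

.. code:: python

    def to_v5_names(names, limit=8):
        """Map each name to an uppercase name of at most ``limit`` chars."""
        mapping = {}
        seen = {}  # shortened name -> original name that produced it
        for name in names:
            short = name.upper()[:limit]
            if short in seen:
                raise ValueError(
                    f'{name!r} and {seen[short]!r} both shorten to {short!r}'
                )
            seen[short] = name
            mapping[name] = short
        return mapping


    renames = to_v5_names(['alpha', 'measurement_a'])
    # {'alpha': 'ALPHA', 'measurement_a': 'MEASUREM'}

The resulting mapping has the shape that ``DataFrame.rename(columns=...)``
accepts.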


It's likely that most people will be using Pandas_ dataframes for the
bulk of their analysis work, and will want to convert to XPT at the
very end of their process.

.. code:: python

    import pandas as pd
    import xport
    import xport.v56

    df = pd.DataFrame({
        'alpha': [10, 20, 30],
        'beta': ['x', 'y', 'z'],
    })

    ...  # Analysis work ...

    ds = xport.Dataset(df, name='DATA', label='Wonderful data')

    # SAS variable names are limited to 8 characters.  As with Pandas
    # dataframes, you must change the name on the dataset rather than
    # the column directly.
    ds = ds.rename(columns={k: k.upper()[:8] for k in ds})

    # Other SAS metadata can be set on the columns themselves.
    for k, v in ds.items():
        v.label = k.title()
        if v.dtype == 'object':
            v.format = '$CHAR20.'
        else:
            v.format = '10.2'

    # Libraries can have multiple datasets.
    library = xport.Library({'DATA': ds})

    with open('example.xpt', 'wb') as f:
        xport.v56.dump(library, f)


Feature requests
================

I'm happy to fix bugs, improve the interface, or make the module
faster.  Just `submit an issue`_ and I'll take a look.  If you work for
a corporation or well-funded non-profit, please consider a sponsorship_.

.. _make an enhancement request: https://github.com/selik/xport/issues/new
.. _submit an issue: https://github.com/selik/xport/issues/new
.. _sponsorship: https://github.com/sponsors/selik


Thanks
======

Current and past sponsors include:

|ProtocolFirst|

.. |ProtocolFirst| image:: docs/_static/protocolfirst.png
   :alt: Protocol First
   :target: https://www.protocolfirst.com


Contributing
============

This project is configured to be developed in a Conda environment.

.. code:: bash

    $ git clone git@github.com:selik/xport.git
    $ cd xport
    $ make install          # Install into a Conda environment
    $ conda activate xport  # Activate the Conda environment
    $ make install-html     # Build the docs website


Authors
=======

Original version by `Jack Cushman`_, 2012.

Major revisions by `Michael Selik`_, 2016 and 2020.

Minor revisions by `Alfred Chan`_, 2020.

Minor revisions by `Derek Croote`_, 2021.

.. _Jack Cushman: https://github.com/jcushman

.. _Michael Selik: https://github.com/selik

.. _Alfred Chan: https://github.com/alfred-b-chan

.. _Derek Croote: https://github.com/dcroote

Change Log
==========

v0.1.0, 2012-05-02
  Initial release.

v0.2.0, 2016-03-22
  Major revision.

v0.2.0, 2016-03-23
  Add numpy and pandas converters.

v1.0.0, 2016-10-21
  Revise API to the pattern of from/to <format>.

v2.0.0, 2016-10-21
  Reader yields regular tuples, not namedtuples.

v3.0.0, 2020-04-20
  Revise API to the load/dump pattern.
  Enable specifying dataset name, variable names, labels, and formats.

v3.1.0, 2020-04-20
  Allow ``dumps(dataframe)`` instead of requiring a ``Dataset``.

v3.2.2, 2020-09-03
  Fix a bug that incorrectly displayed a ``-`` (dash) for null values in numeric fields.

v3.3.0, 2021-12-25
  Enable reading Transport Version 8/9 files.  Merry Christmas!

v3.4.0, 2021-12-25
  Add support for special missing values, like ``.A``, that extend ``float``.

v3.5.0, 2021-12-31
  Enable writing Transport Version 8 files.  Happy New Year!

v3.5.1, 2022-02-01
  Fix issues with writing ``Dataset.label`` and ``Variable.label``.

v3.6.0, 2022-02-02
  Add beta support for changing the text encoding for data and metadata.

v3.6.1, 2022-02-15
  Fix issue with v8 format when the dataset has no long labels.



            
