########################################################################
Xport
########################################################################

.. sphinx-page-start

Read and write SAS Transport files (``*.xpt``).

SAS uses a handful of archaic file formats: XPORT/XPT, CPORT, SAS7BDAT.
If someone publishes their data in one of those formats, this Python
package will help you convert the data into a more useful format. If
someone, like the FDA, asks you for an XPT file, this package can write
it for you.

What's it for?
==============

XPORT is the binary file format used by a bunch of `United States
government agencies`_ for publishing data sets. It made a lot of sense
if you were trying to read data files on your IBM mainframe back in
1988.

The official `SAS specification for XPORT`_ is relatively
straightforward. The hardest part is converting IBM-format floating
point to IEEE-format, which the specification explains in detail.

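
The conversion itself is compact. As a minimal sketch (not this package's
code), here is one way to decode a single 8-byte IBM System/360 double
into a Python ``float``. It assumes the standard IBM layout: 1 sign bit, a
7-bit base-16 exponent biased by 64, and a 56-bit fraction, stored
big-endian. The name ``ibm_to_ieee`` is hypothetical, and SAS missing
values, which the sketch assumes are stored as a marker byte followed by
zero bytes, are simply mapped to NaN.

.. code:: python

    import math
    import struct


    def ibm_to_ieee(ibm: bytes) -> float:
        """Decode one big-endian 8-byte IBM hexadecimal float (hypothetical helper)."""
        if len(ibm) != 8:
            raise ValueError(f'expected 8 bytes, got {len(ibm)}')
        (bits,) = struct.unpack('>Q', ibm)
        sign = -1.0 if bits >> 63 else 1.0
        exponent = (bits >> 56) & 0x7F        # base-16 exponent, biased by 64
        fraction = bits & 0x00FFFFFFFFFFFFFF  # 56-bit fraction, read as 0.F
        if fraction == 0:
            # A zero exponent with a zero fraction is zero. Otherwise the first
            # byte is assumed to be a SAS missing-value marker such as '.' or 'A',
            # which this sketch collapses to NaN.
            return 0.0 if exponent == 0 else math.nan
        # value = sign * (fraction / 2**56) * 16**(exponent - 64)
        #       = sign * fraction * 2**(4 * (exponent - 64) - 56)
        return sign * math.ldexp(fraction, 4 * (exponent - 64) - 56)

For example, ``ibm_to_ieee(bytes.fromhex('C276A00000000000'))`` returns
``-118.625``. Because an IBM fraction can carry 56 significant bits and an
IEEE double only 53, a few low-order bits may be rounded away.
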
There was an `update to the XPT specification`_ for SAS v8 and above.
The changes appear to be mostly to the metadata, but early releases of
this module raised a ``ValueError`` on such files. Per the change log
below, reading Transport Version 8/9 files has been supported since
v3.3.0, and writing Version 8 files since v3.5.0. If you run into
problems with v8 files, please let me know by `submitting an issue`_.

.. _United States government agencies: https://www.google.com/search?q=site:.gov+xpt+file

.. _SAS specification for XPORT: http://support.sas.com/techsup/technote/ts140.pdf

.. _update to the XPT specification: https://support.sas.com/techsup/technote/ts140_2.pdf

.. _submitting an issue: https://github.com/selik/xport/issues/new

Installation
============

This project requires Python v3.7+. Grab the latest stable version from
PyPI.

.. code:: bash

    $ python -m pip install --upgrade xport

Reading XPT
===========

This module follows the common pattern of providing ``load`` and
``loads`` functions for reading data from a SAS file format.

.. code:: python

    import xport.v56

    with open('example.xpt', 'rb') as f:
        library = xport.v56.load(f)

The XPT decoders, such as ``xport.v56.load`` and ``xport.v56.loads``,
return an ``xport.Library``, which is a mapping (``dict``-like) of
``xport.Dataset`` objects. An ``xport.Dataset`` is a subclass of
``pandas.DataFrame`` with SAS metadata attributes (name, label, etc.).
The columns of an ``xport.Dataset`` are ``xport.Variable`` objects,
which are subclasses of ``pandas.Series`` with SAS metadata (name,
label, format, etc.).

If you're not familiar with Pandas_ dataframes, it's easy to think of
them as a dictionary of columns, mapping variable names to variable
data.

The SAS Transport (XPORT) format only supports two kinds of data. Each
value is either numeric or character, so ``xport.v56.load`` decodes the
values as either ``str`` or ``float``.

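
Going the other way, a dataframe you plan to write may hold other dtypes
(integers, booleans, dates). The sketch below, using plain pandas, shows
one way to coerce every column into one of the two kinds before writing:
numeric and boolean columns become ``float64``, and everything else
becomes ``str``, with missing text replaced by an empty string (SAS
treats a blank character value as missing). The helper ``to_xpt_kinds``
is hypothetical and not part of ``xport``; ``xport`` may already handle
some of these conversions itself, and turning dates into text is rarely
what you want for SAS dates.

.. code:: python

    import pandas as pd


    def to_xpt_kinds(df: pd.DataFrame) -> pd.DataFrame:
        """Return a copy of df whose columns are all float64 or str (hypothetical helper)."""
        columns = {}
        for name, column in df.items():
            if pd.api.types.is_numeric_dtype(column):
                # Integers and booleans become floats; missing values stay NaN.
                columns[name] = column.astype('float64')
            else:
                # Everything else becomes text; missing values become ''.
                text = column.astype(object).where(column.notna(), '')
                columns[name] = text.astype(str)
        return pd.DataFrame(columns, index=df.index)

The design choice here is to keep the decision per column, based on the
pandas dtype, so that a single stray value cannot silently change the
kind of a whole column.
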
Note that since XPT files are in an unusual binary format, you should
open them using mode ``'rb'``.

.. _Pandas: http://pandas.pydata.org/

You can also use the ``xport`` module as a command-line tool to convert
an XPT file to a CSV (comma-separated values) file. The ``xport``
executable is a friendly alias for ``python -m xport``.

Caution: if the command-line tool does not work with the latest version,
it should work with version 2.0.2. You can download that version from
this `link`_ or install it by running ``pip install xport==2.0.2`` in
your terminal. Note that 2.0.2 predates the ``load`` and ``dump`` API
described on this page (see v3.0.0 in the change log), so the Python
examples here may not work with it.

.. _link: https://pypi.org/project/xport/2.0.2/#files

.. code:: bash

    $ xport example.xpt > example.csv

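
The same conversion can also be sketched from Python. The helper
``library_to_csv`` below is hypothetical; it relies only on what this page
states about ``xport.Library``, namely that it is a ``dict``-like mapping
of dataset names to ``xport.Dataset`` objects, which subclass
``pandas.DataFrame`` and therefore have ``to_csv``. Its output may differ
in details from the ``xport`` command-line tool.

.. code:: python

    from typing import Dict, Mapping

    import pandas as pd


    def library_to_csv(library: Mapping[str, pd.DataFrame]) -> Dict[str, str]:
        """Render each dataset in a library as CSV text, keyed by dataset name.

        Hypothetical helper: it only assumes the library maps names to
        DataFrame-like objects, as xport.Library is described above.
        """
        # index=False leaves out the pandas row index, so only the variables are written.
        return {name: dataset.to_csv(index=False) for name, dataset in library.items()}

For example, after loading ``library`` as in the reading example,
``library_to_csv(library)`` returns one CSV string per dataset, which you
could print or write to a ``<name>.csv`` file of your choosing.
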
Writing XPT
===========

The ``xport`` package follows the common pattern of providing ``dump``
and ``dumps`` functions for writing data to a SAS file format.

.. code:: python

    import xport
    import xport.v56

    ds = xport.Dataset()
    with open('example.xpt', 'wb') as f:
        xport.v56.dump(ds, f)

Because the ``xport.Dataset`` is an extension of ``pandas.DataFrame``,
you can create datasets in a variety of ways, converting easily from a
dataframe to a dataset.

.. code:: python

    import pandas as pd
    import xport
    import xport.v56

    df = pd.DataFrame({'NUMBERS': [1, 2], 'TEXT': ['a', 'b']})
    ds = xport.Dataset(df, name='MAX8CHRS', label='Up to 40!')
    with open('example.xpt', 'wb') as f:
        xport.v56.dump(ds, f)

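SAS Transport v5 restricts variable names to 8 characters (with a
strange preference for uppercase) and labels to 40 characters. If you
want the relative comfort of SAS Transport v8's limit of 246 characters,
note that writing Transport Version 8 files has been supported since
v3.5.0 (see the change log below); if that doesn't cover your needs,
please `make an enhancement request`_.
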
It's likely that most people will be using Pandas_ dataframes for the
bulk of their analysis work, and will want to convert to XPT at the
very end of their process.

.. code:: python

    import pandas as pd
    import xport
    import xport.v56

    df = pd.DataFrame({
        'alpha': [10, 20, 30],
        'beta': ['x', 'y', 'z'],
    })

    ...  # Analysis work ...

    ds = xport.Dataset(df, name='DATA', label='Wonderful data')

    # SAS variable names are limited to 8 characters. As with Pandas
    # dataframes, you must change the name on the dataset rather than
    # the column directly.
    ds = ds.rename(columns={k: k.upper()[:8] for k in ds})

    # Other SAS metadata can be set on the columns themselves.
    for k, v in ds.items():
        v.label = k.title()
        if v.dtype == 'object':
            v.format = '$CHAR20.'
        else:
            v.format = '10.2'

    # Libraries can have multiple datasets.
    library = xport.Library({'DATA': ds})

    with open('example.xpt', 'wb') as f:
        xport.v56.dump(library, f)

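
One caveat with the renaming step above: truncating with
``k.upper()[:8]`` can map two different column names, such as
``measure_a`` and ``measure_b``, to the same SAS name. A minimal
plain-Python sketch of a collision-free alternative follows. The helper
``sas_v5_names`` and its numeric-suffix scheme are illustrative
assumptions, not part of ``xport``, and it does not check for characters
that SAS may reject in names.

.. code:: python

    from typing import Dict, Iterable


    def sas_v5_names(columns: Iterable[str], width: int = 8) -> Dict[str, str]:
        """Map column names to unique uppercase names of at most `width` characters.

        Hypothetical helper: when truncation makes a name collide with one already
        used, the tail is replaced by a counter, e.g. MEASURE_ then MEASURE1.
        """
        mapping = {}
        used = set()
        for column in columns:
            base = str(column).upper()[:width]
            candidate = base
            counter = 1
            while candidate in used:
                # Shorten the base so that base + suffix still fits in `width`.
                suffix = str(counter)
                candidate = base[:width - len(suffix)] + suffix
                counter += 1
            used.add(candidate)
            mapping[column] = candidate
        return mapping

Assuming ``ds`` is a dataset like the one above,
``ds = ds.rename(columns=sas_v5_names(ds.columns))`` would replace the
truncating comprehension.
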
Feature requests
================

I'm happy to fix bugs, improve the interface, or make the module
faster. Just `submit an issue`_ and I'll take a look. If you work for
a corporation or well-funded non-profit, please consider a sponsorship_.

.. _make an enhancement request: https://github.com/selik/xport/issues/new
.. _submit an issue: https://github.com/selik/xport/issues/new
.. _sponsorship: https://github.com/sponsors/selik

Thanks
======

Current and past sponsors include:

|ProtocolFirst|

.. |ProtocolFirst| image:: docs/_static/protocolfirst.png
   :alt: Protocol First
   :target: https://www.protocolfirst.com

Contributing
============

This project is configured to be developed in a Conda environment.

.. code:: bash

    $ git clone git@github.com:selik/xport.git
    $ cd xport
    $ make install          # Install into a Conda environment
    $ conda activate xport  # Activate the Conda environment
    $ make install-html     # Build the docs website

Authors
=======

Original version by `Jack Cushman`_, 2012.

Major revisions by `Michael Selik`_, 2016 and 2020.

Minor revisions by `Alfred Chan`_, 2020.

Minor revisions by `Derek Croote`_, 2021.

.. _Jack Cushman: https://github.com/jcushman

.. _Michael Selik: https://github.com/selik

.. _Alfred Chan: https://github.com/alfred-b-chan

.. _Derek Croote: https://github.com/dcroote

Change Log
==========

v0.1.0, 2012-05-02
    Initial release.

v0.2.0, 2016-03-22
    Major revision.

v0.2.0, 2016-03-23
    Add numpy and pandas converters.

v1.0.0, 2016-10-21
    Revise API to the pattern of from/to <format>.

v2.0.0, 2016-10-21
    Reader yields regular tuples, not namedtuples.

v3.0.0, 2020-04-20
    Revise API to the load/dump pattern.
    Enable specifying dataset name, variable names, labels, and formats.

v3.1.0, 2020-04-20
    Allow ``dumps(dataframe)`` instead of requiring a ``Dataset``.

v3.2.2, 2020-09-03
    Fix a bug that incorrectly displayed a - (dash) for a null numeric field.

v3.3.0, 2021-12-25
    Enable reading Transport Version 8/9 files. Merry Christmas!

v3.4.0, 2021-12-25
    Add support for special missing values, like ``.A``, that extend ``float``.

v3.5.0, 2021-12-31
    Enable writing Transport Version 8 files. Happy New Year!

v3.5.1, 2022-02-01
    Fix issues with writing ``Dataset.label`` and ``Variable.label``.

v3.6.0, 2022-02-02
    Add beta support for changing the text encoding for data and metadata.

v3.6.1, 2022-02-15
    Fix issue with v8 format when the dataset has no long labels.