:Name: UpSetPlot
:Version: 0.9.0
:Home page: https://upsetplot.readthedocs.io
:Summary: Draw Lex et al.'s UpSet plots with Pandas and Matplotlib
:Upload time: 2023-12-31 01:07:59
:Author: Joel Nothman
:License: BSD-3-Clause

UpSetPlot documentation
============================

|version| |licence| |py-versions|

|issues| |build| |docs| |coverage|

This is another Python implementation of UpSet plots by Lex et al. [Lex2014]_.
UpSet plots are used to visualise set overlaps; like Venn diagrams but
more readable. Documentation is at https://upsetplot.readthedocs.io.

This ``upsetplot`` library tries to provide a simple interface backed by an
extensible, object-oriented design.

There are many ways to represent the categorisation of data, as covered in
our `Data Format Guide <https://upsetplot.readthedocs.io/en/stable/formats.html>`_.

Our internal input format uses a `pandas.Series` containing counts
corresponding to subset sizes, where each subset is an intersection of named
categories.  The index of the Series indicates which rows pertain to which
categories, by having multiple boolean indices, like ``example`` in the
following::

    >>> from upsetplot import generate_counts
    >>> example = generate_counts()
    >>> example
    cat0   cat1   cat2
    False  False  False      56
                  True      283
           True   False    1279
                  True     5882
    True   False  False      24
                  True       90
           True   False     429
                  True     1957
    Name: value, dtype: int64
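
For illustration, a Series of the same shape can be assembled directly with
pandas. This is a minimal sketch rather than the library's own construction:
the variable names are ours, and the counts are simply copied from the output
above.

```python
import pandas as pd

# One boolean index level per category. from_product enumerates the
# combinations with the first level varying slowest, which matches the
# row order of the output shown above.
index = pd.MultiIndex.from_product(
    [[False, True]] * 3, names=["cat0", "cat1", "cat2"]
)

# Subset sizes copied from the generated example.
counts = pd.Series(
    [56, 283, 1279, 5882, 24, 90, 429, 1957], index=index, name="value"
)
print(counts)
```

Each row is one subset: the boolean values in the index say which categories
the counted items belong to.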

Then::

    >>> from upsetplot import plot
    >>> plot(example)  # doctest: +SKIP
    >>> from matplotlib import pyplot
    >>> pyplot.show()  # doctest: +SKIP

makes:

.. image:: http://upsetplot.readthedocs.io/en/latest/_images/sphx_glr_plot_generated_001.png
   :target: http://upsetplot.readthedocs.io/en/latest/auto_examples/plot_generated.html

And you can save the image in various formats::

    >>> pyplot.savefig("/path/to/myplot.pdf")  # doctest: +SKIP
    >>> pyplot.savefig("/path/to/myplot.png")  # doctest: +SKIP

This plot shows the cardinality of every category combination seen in our data.
The leftmost column counts items that belong to no category. The next three
columns count items found in exactly one of ``cat0``, ``cat1`` and ``cat2``, with
the following columns showing cardinalities for items in each combination of
exactly two named sets. The rightmost column counts items in all three sets.
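
The column order described above amounts to sorting the combinations by how
many categories they include (their degree). The sketch below reproduces that
grouping in plain Python, using the counts from the example. It does not call
upsetplot, and the order of columns within one degree is not meant to match
the plot.

```python
# Subset sizes from the example, keyed by the categories each subset includes.
counts = {
    (): 56,
    ("cat2",): 283,
    ("cat1",): 1279,
    ("cat1", "cat2"): 5882,
    ("cat0",): 24,
    ("cat0", "cat2"): 90,
    ("cat0", "cat1"): 429,
    ("cat0", "cat1", "cat2"): 1957,
}

# Sort by degree: no category first, then singles, pairs and the triple.
by_degree = sorted(counts.items(), key=lambda item: len(item[0]))

for members, size in by_degree:
    label = " & ".join(members) or "(none)"
    print(f"degree {len(members)}: {label:<20} {size}")
```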

Rotation
........

We call the above plot style "horizontal" because the category intersections
are presented from left to right.  `Vertical plots
<http://upsetplot.readthedocs.io/en/latest/auto_examples/plot_vertical.html>`__
are also supported!

.. image:: http://upsetplot.readthedocs.io/en/latest/_images/sphx_glr_plot_vertical_001.png
   :target: http://upsetplot.readthedocs.io/en/latest/auto_examples/plot_vertical.html

Distributions
.............

Providing a DataFrame rather than a Series as input allows us to expressively
`plot the distribution of variables
<http://upsetplot.readthedocs.io/en/latest/auto_examples/plot_diabetes.html>`__
in each subset.

.. image:: http://upsetplot.readthedocs.io/en/latest/_images/sphx_glr_plot_diabetes_001.png
   :target: http://upsetplot.readthedocs.io/en/latest/auto_examples/plot_diabetes.html

Loading datasets
................

While the dataset above is randomly generated, you can prepare your own dataset
for input to upsetplot.  A helpful tool is `from_memberships`, which allows
us to reconstruct the example above by indicating each data point's category
membership::

    >>> from upsetplot import from_memberships
    >>> example = from_memberships(
    ...     [[],
    ...      ['cat2'],
    ...      ['cat1'],
    ...      ['cat1', 'cat2'],
    ...      ['cat0'],
    ...      ['cat0', 'cat2'],
    ...      ['cat0', 'cat1'],
    ...      ['cat0', 'cat1', 'cat2'],
    ...      ],
    ...      data=[56, 283, 1279, 5882, 24, 90, 429, 1957]
    ... )
    >>> example
    cat0   cat1   cat2
    False  False  False      56
                  True      283
           True   False    1279
                  True     5882
    True   False  False      24
                  True       90
           True   False     429
                  True     1957
    dtype: int64
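
To make the link between per-item memberships and subset sizes concrete, here
is a small sketch in plain Python. It is not how upsetplot computes its
counts, and the five items are hypothetical; it only shows that each distinct
set of categories becomes one subset whose size is the number of items that
share it.

```python
from collections import Counter

# Hypothetical data: one membership list per item.
memberships = [
    ["cat0"],
    ["cat1", "cat2"],
    ["cat2", "cat1"],  # same subset as the previous item, different order
    [],
    ["cat1", "cat2"],
]

# frozenset ignores the order in which categories are listed.
subset_sizes = Counter(frozenset(m) for m in memberships)

for subset, size in sorted(
    subset_sizes.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))
):
    print(sorted(subset), size)
```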

See also `from_contents`, another way to describe categorised data, and
`from_indicators` which allows each category to be indicated by a column in
the data frame (or a function of the column's data such as whether it is a
missing value).

Installation
------------

To install the library, you can use `pip`::

    $ pip install upsetplot

Installation requires:

* pandas
* matplotlib >= 2.0
* seaborn to use `UpSet.add_catplot`

It should then be possible to::

    >>> import upsetplot

in Python.
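
If the import fails, a quick way to see which of the packages listed above is
missing is to query the import system. This is a generic standard-library
sketch, not a feature of upsetplot.

```python
from importlib.util import find_spec

required = ["pandas", "matplotlib", "upsetplot"]
optional = ["seaborn"]  # only needed for UpSet.add_catplot

# find_spec returns None for a top-level package that cannot be imported.
missing_required = [name for name in required if find_spec(name) is None]
missing_optional = [name for name in optional if find_spec(name) is None]

print("missing required packages:", missing_required or "none")
print("missing optional packages:", missing_optional or "none")
```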

Why an alternative to py-upset?
-------------------------------

Probably for petty reasons. It appeared `py-upset
<https://github.com/ImSoErgodic/py-upset>`_ was not being maintained.  Its
input format was undocumented, inefficient and, IMO, inappropriate.  It did not
facilitate showing plots of each subset's distribution as in Lex et al.'s work
introducing UpSet plots. Nor did it include the horizontal bar plots
illustrated there. It did not support Python 2. I decided it would be easier to
construct a cleaner version than to fix it.

References
----------

.. [Lex2014] Alexander Lex, Nils Gehlenborg, Hendrik Strobelt, Romain Vuillemot, Hanspeter Pfister,
   *UpSet: Visualization of Intersecting Sets*,
   IEEE Transactions on Visualization and Computer Graphics (InfoVis '14), vol. 20, no. 12, pp. 1983–1992, 2014.
   doi: `doi.org/10.1109/TVCG.2014.2346248 <https://doi.org/10.1109/TVCG.2014.2346248>`_


.. |py-versions| image:: https://img.shields.io/pypi/pyversions/upsetplot.svg
    :alt: Python versions supported

.. |version| image:: https://badge.fury.io/py/UpSetPlot.svg
    :alt: Latest version on PyPI
    :target: https://badge.fury.io/py/UpSetPlot

.. |build| image:: https://github.com/jnothman/upsetplot/actions/workflows/test.yml/badge.svg
    :alt: GitHub Workflows CI build status
    :scale: 100%
    :target: https://github.com/jnothman/UpSetPlot/actions/workflows/test.yml

.. |issues| image:: https://img.shields.io/github/issues/jnothman/UpSetPlot.svg
    :alt: Issue tracker
    :target: https://github.com/jnothman/UpSetPlot

.. |coverage| image:: https://coveralls.io/repos/github/jnothman/UpSetPlot/badge.svg
    :alt: Test coverage
    :target: https://coveralls.io/github/jnothman/UpSetPlot

.. |docs| image:: https://readthedocs.org/projects/upsetplot/badge/?version=latest
     :alt: Documentation Status
     :scale: 100%
     :target: https://upsetplot.readthedocs.io/en/latest/?badge=latest

.. |licence| image:: https://img.shields.io/badge/Licence-BSD-blue.svg
     :target: https://opensource.org/licenses/BSD-3-Clause

            
