matplotlib-set-diagrams


| Field | Value |
| --- | --- |
| Name | matplotlib-set-diagrams |
| Version | 0.0.2 |
| Home page | None |
| Summary | Python drawing utilities for Venn and Euler diagrams visualizing the relationships between two or more sets. |
| Upload time | 2024-05-10 15:36:18 |
| Maintainer | None |
| Docs URL | None |
| Author | None |
| Author email | Paul Brodersen <paulbrodersen+matplotlib_set_diagrams@gmail.com> |
| Requires Python | >3.6 |
| License | GNU General Public License v3 (GPLv3) |
| Keywords | matplotlib, Euler diagram, Venn diagram, set, visualisation, visualization |
| Requirements | numpy, scipy, matplotlib, shapely, wordcloud |
| Travis CI | No |
| Coveralls test coverage | No |

# Matplotlib Set Diagrams

*Draw Euler diagrams and Venn diagrams with Matplotlib.*

[Euler](https://en.wikipedia.org/wiki/Euler_diagram) and [Venn](https://en.wikipedia.org/wiki/Venn_diagram) diagrams are used to visualise the relationships between sets. Both typically employ circles to represent sets, and areas where two circles overlap represent subsets common to both supersets. Venn diagrams show all possible relationships of inclusion and exclusion between two or more sets. In Euler diagrams, the area corresponding to each subset is scaled according to the size of the subset. If a subset doesn't exist, the corresponding area doesn't exist.
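
To make the distinction concrete, the sketch below enumerates the regions each diagram type would draw. It is plain Python and does not call this library; the helper names `all_regions` and `non_empty_regions` are made up for this illustration. A region is identified by a tuple of 0/1 flags, one per set, which is the same convention the subset-size dictionaries in the Quickstart use.

```python
from itertools import product


def all_regions(n_sets):
    """Every combination of set memberships, excluding 'in no set at all'."""
    return [flags for flags in product((0, 1), repeat=n_sets) if any(flags)]


def non_empty_regions(sets):
    """The regions that contain at least one item."""
    items = set().union(*sets)
    occupied = {tuple(int(item in s) for s in sets) for item in items}
    return sorted(occupied)


# B is fully contained in A, so the region "only B" is empty.
A = {"a", "b", "c"}
B = {"b", "c"}

venn_regions = all_regions(2)              # a Venn diagram draws all of these
euler_regions = non_empty_regions([A, B])  # an Euler diagram draws only these

print(venn_regions)   # [(0, 1), (1, 0), (1, 1)]
print(euler_regions)  # [(1, 0), (1, 1)]
```

For n sets there are 2^n - 1 candidate regions, which is why the 4-way example in the Quickstart lists 15 entries.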

This library was inspired by [`matplotlib-venn`](https://github.com/konstantint/matplotlib-venn/), but developed independently. It adds support for set diagrams with an arbitrary number of sets and for visualising set and subset contents, and it implements an improved layout engine. For more details, [see below](https://github.com/paulbrodersen/matplotlib_set_diagrams?tab=readme-ov-file#alternative-python-libraries). This library also improves on and replaces [`matplotlib_venn_wordcloud`](https://github.com/paulbrodersen/matplotlib_venn_wordcloud).


## Installation

``` shell
pip install matplotlib_set_diagrams
```


## Documentation

Numerous tutorials, code examples, and the complete API documentation can be found on [ReadTheDocs](https://matplotlib-set-diagrams.readthedocs.io/en/latest/index.html).


## Quickstart

This section is for the impatient. For more comprehensive, step-by-step guides, please consult the [documentation](https://matplotlib-set-diagrams.readthedocs.io/en/latest/sphinx_gallery_output/index.html).

![Quickstart output figure](./images/quickstart.png)

``` python
import matplotlib.pyplot as plt

from matplotlib_set_diagrams import EulerDiagram, VennDiagram

fig, axes = plt.subplots(2, 4, figsize=(15, 5))

for ii, SetDiagram in enumerate([EulerDiagram, VennDiagram]):

    # Initialise from a list of sets:
    SetDiagram.from_sets(
        [
            {"a", "b", "c", "d", "e"},
            {"e", "f", "g"},
        ],
        ax=axes[ii, 0])

    # Alternatively, initialise directly from pre-computed subset sizes.
    SetDiagram(
        {
            (1, 0) : 4, # {"a", "b", "c", "d"}
            (0, 1) : 2, # {"f", "g"}
            (1, 1) : 1, # {"e"}
        },
        ax=axes[ii, 1])

    # Visualise subset items as word clouds:
    text_1 = """Lorem ipsum dolor sit amet, consectetur adipiscing elit,
    sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
    enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi
    ut aliquip ex ea commodo consequat."""

    text_2 = """Duis aute irure dolor in reprehenderit in voluptate velit
    esse cillum dolore eu fugiat nulla pariatur. Lorem ipsum dolor sit
    amet."""

    def word_tokenize(text):
        """Break a string into its constituent words, and convert the words
        into their 'standard' form (tokens).

        The procedure below is a poor-man's tokenization.
        Consider using the Natural Language Toolkit (NLTK) instead:

        >>> import nltk; words = nltk.word_tokenize(text)

        """
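        # get a word list; split on any whitespace, including line breaks,
        # so that the indentation of the multi-line strings yields no empty words
        words = text.split()
        # remove non alphanumeric characters
        words = [''.join(ch for ch in word if ch.isalnum()) for word in words]
        # convert to all lower case and drop tokens that ended up empty
        words = [word.lower() for word in words if word]
        return words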

    # Tokenize strings.
    sets = [set(word_tokenize(text)) for text in [text_1, text_2]]

    SetDiagram.as_wordcloud(sets, ax=axes[ii, 2])

    # The implementation generalises to any number of sets.
    # However, exact solutions are only guaranteed for two given sets,
    # and the more sets are given, the less likely it becomes that
    # the optimisation procedure finds even an approximate solution.
    # Furthermore, above four or five sets, diagrams become unintelligible.
    # Here is an example of a 4-way set diagram:
    SetDiagram(
        {
            (1, 0, 0, 0) : 4.0,
            (0, 1, 0, 0) : 3.0,
            (0, 0, 1, 0) : 2.0,
            (0, 0, 0, 1) : 1.0,
            (1, 1, 0, 0) : 0.9,
            (1, 0, 1, 0) : 0.8,
            (1, 0, 0, 1) : 0.7,
            (0, 1, 1, 0) : 0.6,
            (0, 1, 0, 1) : 0.5,
            (0, 0, 1, 1) : 0.4,
            (1, 1, 1, 0) : 0.3,
            (1, 1, 0, 1) : 0.25,
            (1, 0, 1, 1) : 0.2,
            (0, 1, 1, 1) : 0.15,
            (1, 1, 1, 1) : 0.1,
        },
        ax=axes[ii, 3])

    # set row titles
    axes[ii, 0].annotate(
        SetDiagram.__name__,
        xy         = (0, 0.5),
        xycoords   = 'axes fraction',
        xytext     = (-10, 0),
        textcoords = "offset points",
        ha         = 'right',
        va         = 'center',
        fontsize   = 'large',
        fontweight = 'bold',
    )

fig.tight_layout()
plt.show()

```
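
The subset-size dictionaries passed to the constructor above can also be derived from raw sets by hand. The sketch below shows one way to do that in plain Python. `subset_sizes_from_sets` is a hypothetical helper written for this illustration; it is not part of the library's API, and `from_sets` may compute its sizes differently.

```python
from collections import Counter


def subset_sizes_from_sets(sets):
    """Count items per region; keys are tuples of 0/1 membership flags."""
    items = set().union(*sets)
    return dict(Counter(tuple(int(item in s) for s in sets) for item in items))


sizes = subset_sizes_from_sets([
    {"a", "b", "c", "d", "e"},
    {"e", "f", "g"},
])
print(sizes[(1, 0)], sizes[(0, 1)], sizes[(1, 1)])  # 4 2 1
```

The resulting dictionary has the same shape as the one passed to `SetDiagram(...)` for the second panel of the Quickstart figure.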


## Alternative Python libraries

[`matplotlib-venn`](https://github.com/konstantint/matplotlib-venn/): the inspiration for this library. However, `matplotlib-venn` has some significant drawbacks:

1. [It only produces two-way and three-way set diagrams.](https://github.com/konstantint/matplotlib-venn/issues/15)
2. [There is no support for visualising set contents](https://github.com/konstantint/matplotlib-venn/issues/41) other than external libraries such as my [`matplotlib_venn_wordcloud`](https://github.com/paulbrodersen/matplotlib_venn_wordcloud).
3. The layout engine often generates incorrect results for three-way set diagrams, and a lot of issues on the matplotlib-venn issue tracker boil down to this problem. Consider the example below, [adapted from issue #34](https://github.com/konstantint/matplotlib-venn/issues/34):

   - Subset (1, 0, 0) / Abc / (A - B - C) is annotated with the label for subset (1, 1, 0) / ABc / (A & B - C).
   - Subset (1, 1, 0) / ABc / (A & B - C) is not visualised at all.

![matplotlib-venn / matplotlib_set_diagrams comparison](./images/matplotlib_venn_issues.png)

``` python
import matplotlib.pyplot as plt

from matplotlib_set_diagrams import EulerDiagram
from matplotlib_venn import venn3

fig, axes = plt.subplots(1, 2, figsize=(6, 3))

subset_sizes = {
    (1, 0, 0) : 167, # Abc in matplotlib-venn nomenclature
    (0, 1, 0) : 7,   # aBc
    (0, 0, 1) : 25,  # abC
    (1, 1, 0) : 41,  # ABc
    (0, 1, 1) : 174, # aBC
    (1, 0, 1) : 171, # AbC
    (1, 1, 1) : 51,  # ABC
}

axes[0].set_title("matplotlib-venn")
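# venn3 reads a tuple in the order (Abc, aBc, ABc, abC, AbC, aBC, ABC),
# which differs from the key order of subset_sizes. Passing a dict keyed by
# binary strings ('100', '010', ...) ensures both libraries plot the same data.
venn3_sizes = {
    "".join(str(flag) for flag in key): size
    for key, size in subset_sizes.items()
}
venn3(venn3_sizes, ax=axes[0])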

axes[1].set_title("matplotlib_set_diagrams")
EulerDiagram(subset_sizes, ax=axes[1])

plt.show()
```

[`pyvenn`](https://github.com/tctianchi/pyvenn): Uses pre-built images to produce Venn diagrams for up to 6 sets. The visualisations are hence not area-proportional; only the subset labels are adjusted based on user input.

![pyvenn example visualisation](https://raw.githubusercontent.com/wiki/tctianchi/pyvenn/venn6.png)

[`supervenn`](https://github.com/gecko984/supervenn): Produces area-proportional visualisations that are equivalent to Euler diagrams but are not themselves Euler or Venn diagrams. It generalises well to arbitrary numbers of sets and is thus easily the superior choice for diagnostic purposes (its intended use case). However, the visualisations are harder to explain to an unfamiliar reader, and are thus probably less appropriate for publications.

![supervenn example visualisation](./images/supervenn.png)

``` python
import matplotlib.pyplot as plt

from supervenn import supervenn

sets = [{1, 2, 3, 4}, {3, 4, 5}, {1, 6, 7, 8}]
labels = ['alice', 'bob', 'third party']
supervenn(sets, labels)
plt.show()
```

## Contributing & Support

If you get stuck and have a question that is not covered in the documentation, please raise an issue on the [issue tracker](https://github.com/paulbrodersen/matplotlib_set_diagrams/issues).
If applicable, make a sketch of the desired result.
If you submit a bug report, please make sure to include the complete error trace. Include any relevant code and data in a [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example).
Pull requests are always welcome.
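
When preparing a bug report, it can help to state the installed versions of this package and its dependencies alongside the error trace. The snippet below is one possible way to collect them; it assumes Python 3.8 or newer (for `importlib.metadata`), and `installed_versions` is a hypothetical helper, not something the library provides.

```python
from importlib import metadata

# Package names taken from the requirements listed for this project.
PACKAGES = [
    "matplotlib-set-diagrams",
    "numpy",
    "scipy",
    "matplotlib",
    "shapely",
    "wordcloud",
]


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or 'not installed'."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version}")
```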

            
