visions


- Name: visions
- Version: 0.7.6
- Home page: https://github.com/dylan-profiler/visions
- Summary: Visions
- Upload time: 2024-02-06 21:15:50
- Author: Dylan Profiler
- Requires Python: >=3.8
- License: BSD License

            <div align="center">
  <img src="images/visions.png" width="600px"><br>
  <i>And these visions of data types, they kept us up past the dawn.</i> 
</div>
<p align="center">
  <a href="https://pypi.org/project/visions/">
    <img src="https://pepy.tech/badge/visions" />
  </a>
  <a href="https://pypi.org/project/visions/">
    <img src="https://pepy.tech/badge/visions/month" />
  </a>
  <a href="https://pypi.org/project/visions/">
    <img src="https://img.shields.io/pypi/pyversions/visions" />
  </a>
  <a href="https://pypi.org/project/visions/">
    <img src="https://badge.fury.io/py/visions.svg" />
  </a>
  <a href="https://doi.org/10.21105/joss.02145">
    <img src="https://joss.theoj.org/papers/10.21105/joss.02145/status.svg" />
  </a>
  <a href="https://mybinder.org/v2/gh/dylan-profiler/visions/master">
    <img src="https://mybinder.org/badge_logo.svg" />
  </a>
</p>

# The Semantic Data Library

``Visions`` provides a set of tools for defining and using *semantic* data types.

- [x] [Semantic type](https://dylan-profiler.github.io/visions/visions/getting_started/concepts.html#types) detection &
  inference on sequence data.

- [x] Automated data processing

- [x] Completely customizable. `Visions` makes it easy to build and modify semantic data types for domain-specific
  purposes.

- [x] Out-of-the-box support for
  multiple [backend implementations](https://github.com/dylan-profiler/visions#supported-frameworks) including pandas,
  spark, numpy, and python.

- [x] A robust set
  of [default types and typesets](https://dylan-profiler.github.io/visions/visions/getting_started/usage/defaults.html)
  covering the most common use cases.

Check out the complete
documentation [here](https://dylan-profiler.github.io/visions/visions/getting_started/introduction.html).

## Installation

Source code is available on [GitHub](https://github.com/dylan-profiler/visions), and binary installers are available via pip.

```
# Pip
pip install visions
```

Complete installation instructions (including extras) are available in
the [docs](https://dylan-profiler.github.io/visions/visions/getting_started/installation.html).

## Quick Start Guide

If you want to play immediately, check out the examples folder
on [![](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/dylan-profiler/visions/master). Otherwise,
let's get some data:

```python
import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv")
df.head(2)
```

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>PassengerId</th>
      <th>Survived</th>
      <th>Pclass</th>
      <th>Name</th>
      <th>Sex</th>
      <th>Age</th>
      <th>SibSp</th>
      <th>Parch</th>
      <th>Ticket</th>
      <th>Fare</th>
      <th>Cabin</th>
      <th>Embarked</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td>0</td>
      <td>3</td>
      <td>Braund, Mr. Owen Harris</td>
      <td>male</td>
      <td>22.0</td>
      <td>1</td>
      <td>0</td>
      <td>A/5 21171</td>
      <td>7.2500</td>
      <td>NaN</td>
      <td>S</td>
    </tr>
    <tr>
      <td>2</td>
      <td>1</td>
      <td>1</td>
      <td>Cumings, Mrs. John Bradley (Florence Briggs Thayer)</td>
      <td>female</td>
      <td>38.0</td>
      <td>1</td>
      <td>0</td>
      <td>PC 17599</td>
      <td>71.2833</td>
      <td>C85</td>
      <td>C</td>
    </tr>
  </tbody>
</table>

The most important abstraction in `visions` is the type: each type represents a semantic notion about data. You have
access to a range of well-tested types like `Integer`, `Float`, and `File` covering the most common software
development use cases. Types can be bundled together into typesets. Behind the scenes, `visions` builds a traversable
graph for any collection of types.

```python
from visions import types, typesets

# CompleteSet is a builtin typeset that bundles the full range of default types
typeset = typesets.CompleteSet()
typeset.plot_graph()
```

![](https://dylan-profiler.github.io/visions/_images/typeset_complete_base.svg)

Note: Plots require pygraphviz to be [installed](https://pygraphviz.github.io/documentation/stable/install.html).

Because of the special relationships between types, these graphs can be used to detect the type of your data or to
_infer_ a more appropriate one.

```python
# Detection looks like this
typeset.detect_type(df)

# While inference looks like this
typeset.infer_type(df)

# Inference works well even if we monkey with the data, say by converting everything to strings
typeset.infer_type(df.astype(str))
# {
#     'PassengerId': Integer,
#     'Survived': Integer,
#     'Pclass': Integer,
#     'Name': String,
#     'Sex': String,
#     'Age': Float,
#     'SibSp': Integer,
#     'Parch': Integer,
#     'Ticket': String,
#     'Fare': Float,
#     'Cabin': String,
#     'Embarked': String
# }
```
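To make the difference between detection and inference concrete, here is a minimal pure-Python sketch of the idea. It is not how `visions` is implemented: the type names, the `CONTAINS` and `EDGES` tables, and both functions are hypothetical stand-ins chosen for illustration. Detection checks the data as it is currently represented, while inference keeps following edges of a small type graph for as long as a lossless cast is possible.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def _parses_as_float(values: Sequence) -> bool:
    """Relation test for the String -> Float edge."""
    try:
        [float(v) for v in values]
        return True
    except ValueError:
        return False


# Hypothetical membership tests: is the sequence already of this type?
CONTAINS: Dict[str, Callable[[Sequence], bool]] = {
    "String": lambda vs: all(isinstance(v, str) for v in vs),
    "Float": lambda vs: all(isinstance(v, float) for v in vs),
    "Integer": lambda vs: all(isinstance(v, int) and not isinstance(v, bool) for v in vs),
}

# Hypothetical graph edges: source type -> [(target type, relation test, cast)]
EDGES: Dict[str, List[Tuple[str, Callable, Callable]]] = {
    "String": [("Float", _parses_as_float, lambda vs: [float(v) for v in vs])],
    "Float": [("Integer", lambda vs: all(v.is_integer() for v in vs), lambda vs: [int(v) for v in vs])],
    "Integer": [],
}


def detect_type(values: Sequence) -> str:
    """Return the first type whose membership test passes on the data as-is."""
    for name, contains in CONTAINS.items():
        if contains(values):
            return name
    return "Object"


def infer_type(values: Sequence) -> str:
    """Start at the detected type and walk the graph while a cast is possible."""
    current = detect_type(values)
    moved = True
    while moved:
        moved = False
        for target, relation, cast in EDGES.get(current, []):
            if relation(values):
                values, current, moved = cast(values), target, True
                break
    return current


print(detect_type(["1", "2"]))  # String
print(infer_type(["1", "2"]))   # Integer
```

In this sketch a column of numeric strings is detected as `String` but inferred as `Integer`, which mirrors the behaviour shown for `df.astype(str)` above.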

`Visions` solves many of the most common problems of working with tabular data. For example, sequences of integers are
still recognized as integers whether they have trailing decimal 0's from being cast to float, missing values, or
something else altogether. Much of this cleaning is performed automatically, so you get cleaned and processed data as
well.

```python
cleaned_df = typeset.cast_to_inferred(df)
```
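As a rough illustration of the integer case described above, the following pure-Python sketch treats a sequence as integer-like when every non-missing value is a whole number, then casts it while keeping the gaps. The helper names (`is_missing`, `is_integer_like`, `cast_to_integers`) are hypothetical and only approximate the idea; they are not part of the `visions` API.

```python
import math
from typing import List, Optional, Sequence


def is_missing(value) -> bool:
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def _is_whole(value) -> bool:
    """True for ints and for floats with no fractional part (bools excluded)."""
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return True
    return isinstance(value, float) and value.is_integer()


def is_integer_like(values: Sequence) -> bool:
    """A sequence is integer-like if it has at least one value and all present values are whole."""
    present = [v for v in values if not is_missing(v)]
    return bool(present) and all(_is_whole(v) for v in present)


def cast_to_integers(values: Sequence) -> List[Optional[int]]:
    """Cast whole-number values to int, keeping missing entries as None."""
    if not is_integer_like(values):
        raise ValueError("values are not integer-like")
    return [None if is_missing(v) else int(v) for v in values]


print(cast_to_integers([1.0, float("nan"), 3.0]))  # [1, None, 3]
```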

This is only a small taste of everything `visions` can do,
including [building your own](https://dylan-profiler.github.io/visions/visions/getting_started/extending.html)
domain-specific types and typesets. Please check out the [API](https://dylan-profiler.github.io/visions/visions/api.html)
documentation or the [examples/](https://github.com/dylan-profiler/visions/tree/develop/examples) directory for more
info!

## Supported frameworks

Thanks to its dispatch-based implementation, `Visions` is able to exploit framework-specific capabilities offered by
libraries like pandas and spark. Currently it works with the following backends by default:

- [Pandas](https://github.com/pandas-dev/pandas) (feature complete)
- [Numpy](https://github.com/numpy/numpy) (boolean, complex, date time, float, integer, string, time deltas,
  objects)
- [Spark](https://github.com/apache/spark) (boolean, categorical, date, date time, float, integer, numeric, object,
  string)
- [Python](https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range) (string, float, integer,
  date time, time delta, boolean, categorical, object, complex - other datatypes are untested)

If you're using pandas, it will also take advantage of parallelization tools like
[swifter](https://github.com/jmcarpenter2/swifter) if available.

It also offers a simple annotation-based API for registering new implementations as needed. For example, if you wished
to extend the categorical data type to include a Dask-specific implementation, you might do something like:

```python
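from visions.types.categorical import Categorical
import pandas as pd
import dask.dataframe as dd  # a bare `import dask` does not load the dataframe submodule


# Register a Dask-specific membership test for the Categorical type
@Categorical.contains_op.register
def categorical_contains(series: dd.Series, state: dict) -> bool:
    # isinstance check stands in for pdt.is_categorical_dtype, which newer pandas releases deprecate
    return isinstance(series.dtype, pd.CategoricalDtype)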
```
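If the dispatch pattern is unfamiliar, the standard library's `functools.singledispatch` offers a close analogy. The sketch below is only an illustration of type-based dispatch under that assumption: `contains_boolean` is a hypothetical function, and `visions` uses its own dispatch mechanism rather than this one.

```python
from functools import singledispatch
from typing import Any


@singledispatch
def contains_boolean(sequence: Any) -> bool:
    """Fallback for backends that have no registered implementation."""
    raise NotImplementedError(f"no implementation for {type(sequence).__name__}")


@contains_boolean.register
def _(sequence: list) -> bool:
    # Implementation chosen when the argument is a list
    return all(isinstance(item, bool) for item in sequence)


@contains_boolean.register
def _(sequence: tuple) -> bool:
    # A new backend can reuse an existing implementation
    return contains_boolean(list(sequence))


print(contains_boolean([True, False]))  # True
print(contains_boolean((True, 1)))      # False
```

The annotation on the first parameter decides which implementation runs, so supporting another container type means registering one more function rather than editing the existing ones.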

## Contributing and support

Contributions to `visions` are welcome. For more information, please visit the community
contributions [page](https://dylan-profiler.github.io/visions/visions/contributing/contributing.html) and join us
on [slack](https://join.slack.com/t/dylan-profiling/shared_invite/zt-11c9blvpt-AqxXD5AMS9Q6CO7UUm~cRw). The
GitHub [issues tracker](https://github.com/dylan-profiler/visions/issues/new/choose) is used for reporting bugs, feature
requests, and support questions.

Also, please check out some of the other companies and packages using `visions` including:

* [pandas profiling](https://github.com/pandas-profiling/pandas-profiling)
* [Compress*io*](https://github.com/dylan-profiler/compressio)
* [Bitrook](https://www.bitrook.com/)

If you're currently using `visions` or would like to be featured here, please let us know.

## Acknowledgements

This package is part of the [dylan-profiler](https://github.com/dylan-profiler) project. The package is a core component
of [pandas-profiling](https://github.com/pandas-profiling/pandas-profiling). More information can be
found [here](https://dylan-profiler.github.io/visions/visions/background/about.html). This work was partially supported
by [SIDN Fonds](https://www.sidnfonds.nl/projecten/dylan-data-analysis-leveraging-automatisation).

![](https://github.com/dylan-profiler/visions/raw/master/images/SIDNfonds.png)



            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/dylan-profiler/visions",
    "name": "visions",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "",
    "keywords": "",
    "author": "Dylan Profiler",
    "author_email": "visions@ictopzee.nl",
    "download_url": "https://files.pythonhosted.org/packages/40/17/8ddcab3699d442a3a21c9859b5573a5b96ec19c51b85525653433bc28f5e/visions-0.7.6.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "BSD License",
    "summary": "Visions",
    "version": "0.7.6",
    "project_urls": {
        "Homepage": "https://github.com/dylan-profiler/visions"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7cbf612b24e711ae25dea9af19b9304634b8949faa0b035fad47e8bcadf62f59",
                "md5": "c5878d1e304305eeb9989167fd3468ce",
                "sha256": "72b7f8dbc374e9d6055e938c8c67b0b8da52f3bcb8320f25d86b1a57457e7aa6"
            },
            "downloads": -1,
            "filename": "visions-0.7.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "c5878d1e304305eeb9989167fd3468ce",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 104827,
            "upload_time": "2024-02-06T21:15:33",
            "upload_time_iso_8601": "2024-02-06T21:15:33.934872Z",
            "url": "https://files.pythonhosted.org/packages/7c/bf/612b24e711ae25dea9af19b9304634b8949faa0b035fad47e8bcadf62f59/visions-0.7.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "40178ddcab3699d442a3a21c9859b5573a5b96ec19c51b85525653433bc28f5e",
                "md5": "925f05016023c051028cfa040dee6e71",
                "sha256": "00f494a7f78917db2292e11ea832c6e026b64783e688b11da24f4c271ef1631d"
            },
            "downloads": -1,
            "filename": "visions-0.7.6.tar.gz",
            "has_sig": false,
            "md5_digest": "925f05016023c051028cfa040dee6e71",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 566629,
            "upload_time": "2024-02-06T21:15:50",
            "upload_time_iso_8601": "2024-02-06T21:15:50.228192Z",
            "url": "https://files.pythonhosted.org/packages/40/17/8ddcab3699d442a3a21c9859b5573a5b96ec19c51b85525653433bc28f5e/visions-0.7.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-06 21:15:50",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "dylan-profiler",
    "github_project": "visions",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "tox": true,
    "lcname": "visions"
}
        