mx06

Name	mx06
Version	0.2.dev0
home_page	https://github.com/pprados/mx06
Summary	Bridge between pandas, cudf, modin, dask, dask-modin, dask-cudf, spark or spark+rapids and between numpy, cupy and dask.array
upload_time	2022-12-03 14:41:00
docs_url	None
author	Philippe Prados
requires_python	>=3.8
license	Apache-2.0
keywords	dataframe
requirements	No requirements were recorded.

# Virtual DataFrame

[Full documentation](https://pprados.github.io/virtual_dataframe/)

## Motivation

With pandas-like DataFrames or NumPy-like arrays, do you want to write your code first and choose the framework
at the end? Do you want to pick the best framework simply by running performance measurements?
This framework unifies multiple pandas-compatible and NumPy-compatible components,
so that a single codebase works with all of them.

Do you want to use different architectures at different times of the year, to be "greener" and cheaper?
Do you want to use a GPU only for Black Friday?

## Synopsis

With a few parameters and virtual classes, you can write code once and execute it:

- With or without multicore
- With or without cluster (multi nodes)
- With or without GPU
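The parameter-driven choice of framework can be sketched as follows. This is a minimal illustration, not mx06's actual configuration: the `DF_BACKEND` environment variable, the backend names, and the `resolve_backend`/`load_backend` helpers are hypothetical. Only the module paths are the real import names of each framework.

```python
import importlib
import os
from types import ModuleType
from typing import Optional, Tuple

# Hypothetical backend table: name -> (DataFrame module, array module).
BACKENDS = {
    "pandas": ("pandas", "numpy"),
    "modin": ("modin.pandas", "numpy"),
    "cudf": ("cudf", "cupy"),
    "dask": ("dask.dataframe", "dask.array"),
    "dask_cudf": ("dask_cudf", "dask.array"),
    "pyspark": ("pyspark.pandas", "numpy"),
}


def resolve_backend(name: Optional[str] = None) -> Tuple[str, str]:
    """Return the (dataframe, array) module names for a backend.

    Falls back to the hypothetical DF_BACKEND environment variable,
    then to plain pandas/numpy.
    """
    name = name or os.environ.get("DF_BACKEND", "pandas")
    try:
        return BACKENDS[name]
    except KeyError:
        raise ValueError(
            f"unknown backend {name!r}; expected one of {sorted(BACKENDS)}"
        ) from None


def load_backend(name: Optional[str] = None) -> Tuple[ModuleType, ModuleType]:
    """Import the selected modules only when they are actually needed."""
    df_module, array_module = resolve_backend(name)
    return importlib.import_module(df_module), importlib.import_module(array_module)


# Usage (assuming pandas and numpy are installed):
# pd, np = load_backend("pandas")
# df = pd.DataFrame({"a": [1, 2, 3]})
```

Importing the right module is only half the job: a `dask.dataframe` collection, for instance, is usually built with `dask.dataframe.from_pandas(...)` rather than a pandas-style constructor. Differences of this kind are what the virtual classes are meant to hide.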

To do that, we create some virtual classes, add methods to other classes, and so on.

It is difficult to combine several frameworks that use the same class names with similar, but not identical, semantics.
For example, if you want to use Dask, cudf, pandas, modin, pyspark or pyspark+rapids in the same program,
you must manage:

- `pandas.DataFrame`, `pandas.Series`
- `modin.pandas.DataFrame`, `modin.pandas.Series`
- `cudf.DataFrame`, `cudf.Series`
- `dask.dataframe.DataFrame`, `dask.dataframe.Series`
- `pyspark.pandas.DataFrame`, `pyspark.pandas.Series`

With NumPy-like arrays, you must manage:

- `numpy.ndarray`
- `cupy.ndarray`
- `dask.array.Array`

With `cudf` or `cupy`, the code must call `.to_pandas()` or `cupy.asnumpy()` to bring data back to host memory.
With Dask, the code must call `.compute()`, and may use `@delayed` or a `dask.distributed.Client`, etc.
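To make this conversion burden concrete, here is a minimal, hypothetical helper (not part of mx06) that brings a result back to an in-memory pandas/NumPy-style object by duck typing. It calls `.compute()` on lazy Dask collections, `.to_pandas()` on cudf (or `pyspark.pandas`) objects, and `.get()` on CuPy arrays. Checking attributes instead of importing every framework keeps it usable when only some frameworks are installed, at the cost of being a heuristic.

```python
from typing import Any


def materialize(obj: Any) -> Any:
    """Return an in-memory, pandas/numpy-style version of ``obj`` (heuristic sketch)."""
    # Lazy Dask collections (dask.dataframe, dask.array, dask_cudf) are computed first;
    # the result may itself be a GPU object, handled by the checks below.
    if hasattr(obj, "compute"):
        obj = obj.compute()
    # cudf and pyspark.pandas objects convert to pandas with .to_pandas().
    if hasattr(obj, "to_pandas"):
        return obj.to_pandas()
    # CuPy arrays copy device memory into a numpy array with .get();
    # requiring .device as well avoids matching dict-like .get() methods (e.g. pandas).
    if hasattr(obj, "get") and hasattr(obj, "device"):
        return obj.get()
    # pandas and numpy objects are already in host memory.
    return obj
```

The same call then works whatever the framework, for example `materialize(ddf.groupby("k").sum())` on a Dask DataFrame or `materialize(cupy_array)` on a GPU array.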

We propose to replace all these classes and scenarios with a *uniform model*,
inspired by [dask](https://www.dask.org/) (the most complex of these APIs).
It is then possible to write code once and use it in different environments and frameworks.

This project is essentially a back-port of *Dask+cuDF* to other frameworks.
We try to normalize the API of all frameworks.
This project *weaves* your code with the selected framework at runtime.
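Below is a minimal sketch of what back-porting part of the Dask API to eager frameworks could look like. It is an assumption about the technique, not mx06's actual code: classes that lack Dask's `.compute()` receive a no-op version that returns the object itself, so Dask-style code also runs on eager frameworks. The `backport_compute` helper is hypothetical.

```python
from typing import Any


def _identity_compute(self: Any, **kwargs: Any) -> Any:
    """Eager objects are already computed: return them unchanged."""
    return self


def backport_compute(*classes: type) -> None:
    """Give each class a Dask-style .compute() if it does not already have one.

    Hypothetical sketch: only works for classes that accept new attributes
    (e.g. pandas.DataFrame); extension types such as numpy.ndarray raise TypeError.
    """
    for cls in classes:
        if not hasattr(cls, "compute"):
            setattr(cls, "compute", _identity_compute)


# Usage (assuming pandas is installed):
# import pandas as pd
# backport_compute(pd.DataFrame, pd.Series)
# pd.DataFrame({"a": [1, 2]}).compute()  # returns the same DataFrame
```

Monkey-patching keeps call sites unchanged, but it only works on classes that accept new attributes. Extension types such as `numpy.ndarray` reject `setattr`, so arrays would need a wrapper class or a free function instead.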

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/pprados/virtual-dataframe?labpath=%2Fmain%2Fnotebooks)

Raw data

{
    "_id": null,
    "home_page": "https://github.com/pprados/mx06",
    "name": "mx06",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "",
    "keywords": "dataframe",
    "author": "Philippe Prados",
    "author_email": "github@prados.fr",
    "download_url": "",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "Bridge between pandas, cudf, modin, dask, dask-modin, dask-cudf, spark or spark+rapids and between numpy, cupy and dask.array",
    "version": "0.2.dev0",
    "split_keywords": [
        "dataframe"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "e8c469e635fddc5899d1537698525a0d",
                "sha256": "3bcc58f467c49d2fbc1fa1a88892edbecc535ad0b4cfa44b62ea1a3470349a3c"
            },
            "downloads": -1,
            "filename": "mx06-0.2.dev0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e8c469e635fddc5899d1537698525a0d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 26639,
            "upload_time": "2022-12-03T14:41:00",
            "upload_time_iso_8601": "2022-12-03T14:41:00.895698Z",
            "url": "https://files.pythonhosted.org/packages/08/2f/689a49b87ff342174d951f7ec4f367ff83b971bd6776ebd6e186f890ad78/mx06-0.2.dev0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2022-12-03 14:41:00",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "pprados",
    "github_project": "mx06",
    "lcname": "mx06"
}
