Name | mx07 JSON |
Version |
0.2.dev0
JSON |
| download |
home_page | https://github.com/pprados/mx06 |
Summary | Bridge between pandas, cudf, modin, dask, dask-modin, dask-cudf, spark or spark+rapids and between numpy, cupy and dask.array |
upload_time | 2022-12-03 14:46:29 |
maintainer | |
docs_url | None |
author | Philippe Prados |
requires_python | >=3.8 |
license | Apache-2.0 |
keywords |
dataframe
|
VCS |
|
bugtrack_url |
|
requirements |
No requirements were recorded.
|
Travis-CI |
No Travis.
|
coveralls test coverage |
No coveralls.
|
# Virtual DataFrame
[Full documentation](https://pprados.github.io/virtual_dataframe/)
## Motivation
With Panda-like dataframe or numby-like array, do you want to create a code, and choose at the end, the framework
to use? Do you want to be able to choose the best framework after simply performing performance measurements?
This framework unifies multiple Panda-compatible or Numpy-comptaible components,
to allow the writing of a single code, compatible with all.
Do you want to use different architectures at different times of the year to be "green" and cheaper?
Do you want to use a GPU only for the black-friday?
## Synopsis
With some parameters and Virtual classes, it's possible to write a code, and execute this code:
- With or without multicore
- With or without cluster (multi nodes)
- With or without GPU
To do that, we create some virtual classes, add some methods in others classes, etc.
It's difficult to use a combinaison of framework, with the same classe name, with similare semantic, etc.
For example, if you want to use in the same program, Dask, cudf, pandas, modin, pyspark or pyspark+rapids,
you must manage:
- `pandas.DataFrame`, `pandas,Series`
- `modin.pandas.DataFrame`, `modin.pandas.Series`
- `cudf.DataFrame`, `cudf.Series`
- `dask.DataFrame`, `dask.Series`
- `pyspark.pandas.DataFrame`, `pyspark.pandas.Series`
With numpy, you must manage:
- `numpy.ndarray`
- `cupy.ndarray`
- `dask.array`
With `cudf` or `cudf`, the code must call `.to_pandas()` or `asnumpy()`. With dask, the code must call `.compute()`, can use `@delayed` or
`dask.distributed.Client`. etc.
We propose to replace all these classes and scenarios, with a *uniform model*,
inspired by [dask](https://www.dask.org/) (the more complex API).
Then, it is possible to write one code, and use it in differents environnements and frameworks.
This project is essentially a back-port of *Dask+Cudf* to others frameworks.
We try to normalize the API of all frameworks.
This project will *weave* your code with the selected framework, at runtime.
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/pprados/virtual-dataframe?labpath=%2Fmain%2Fnotebooks)
Raw data
{
"_id": null,
"home_page": "https://github.com/pprados/mx06",
"name": "mx07",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "",
"keywords": "dataframe",
"author": "Philippe Prados",
"author_email": "github@prados.fr",
"download_url": "",
"platform": null,
"description": "# Virtual DataFrame\n\n[Full documentation](https://pprados.github.io/virtual_dataframe/)\n\n## Motivation\n\nWith Panda-like dataframe or numby-like array, do you want to create a code, and choose at the end, the framework\nto use? Do you want to be able to choose the best framework after simply performing performance measurements?\nThis framework unifies multiple Panda-compatible or Numpy-comptaible components,\nto allow the writing of a single code, compatible with all.\n\nDo you want to use different architectures at different times of the year to be \"green\" and cheaper?\nDo you want to use a GPU only for the black-friday?\n\n## Synopsis\n\nWith some parameters and Virtual classes, it's possible to write a code, and execute this code:\n\n- With or without multicore\n- With or without cluster (multi nodes)\n- With or without GPU\n\nTo do that, we create some virtual classes, add some methods in others classes, etc.\n\nIt's difficult to use a combinaison of framework, with the same classe name, with similare semantic, etc.\nFor example, if you want to use in the same program, Dask, cudf, pandas, modin, pyspark or pyspark+rapids,\nyou must manage:\n\n- `pandas.DataFrame`, `pandas,Series`\n- `modin.pandas.DataFrame`, `modin.pandas.Series`\n- `cudf.DataFrame`, `cudf.Series`\n- `dask.DataFrame`, `dask.Series`\n- `pyspark.pandas.DataFrame`, `pyspark.pandas.Series`\n\nWith numpy, you must manage:\n- `numpy.ndarray`\n- `cupy.ndarray`\n- `dask.array`\n\n With `cudf` or `cudf`, the code must call `.to_pandas()` or `asnumpy()`. With dask, the code must call `.compute()`, can use `@delayed` or\n`dask.distributed.Client`. etc.\n\nWe propose to replace all these classes and scenarios, with a *uniform model*,\ninspired by [dask](https://www.dask.org/) (the more complex API).\nThen, it is possible to write one code, and use it in differents environnements and frameworks.\n\nThis project is essentially a back-port of *Dask+Cudf* to others frameworks.\nWe try to normalize the API of all frameworks.\nThis project will *weave* your code with the selected framework, at runtime.\n\n[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/pprados/virtual-dataframe?labpath=%2Fmain%2Fnotebooks)\n\n\n",
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "Bridge between pandas, cudf, modin, dask, dask-modin, dask-cudf, spark or spark+rapids and between numpy, cupy and dask.array",
"version": "0.2.dev0",
"split_keywords": [
"dataframe"
],
"urls": [
{
"comment_text": "",
"digests": {
"md5": "300bc9fd05cc1a5079e37d34906c36b3",
"sha256": "5683c86c093cfa8cd136d40b5db5b0fe93fd91fe793d51b809b43b09d0353187"
},
"downloads": -1,
"filename": "mx07-0.2.dev0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "300bc9fd05cc1a5079e37d34906c36b3",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 26642,
"upload_time": "2022-12-03T14:46:29",
"upload_time_iso_8601": "2022-12-03T14:46:29.137374Z",
"url": "https://files.pythonhosted.org/packages/39/9a/27aed6b0ae2d1bb89c03de57ecd4b0f0d5def781d2b8e94682dba04e4a5f/mx07-0.2.dev0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2022-12-03 14:46:29",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "pprados",
"github_project": "mx06",
"lcname": "mx07"
}