mx07


| Field | Value |
|---|---|
| Name | mx07 |
| Version | 0.2.dev0 |
| home_page | https://github.com/pprados/mx06 |
| Summary | Bridge between pandas, cudf, modin, dask, dask-modin, dask-cudf, spark or spark+rapids and between numpy, cupy and dask.array |
| upload_time | 2022-12-03 14:46:29 |
| maintainer | |
| docs_url | None |
| author | Philippe Prados |
| requires_python | >=3.8 |
| license | Apache-2.0 |
| keywords | dataframe |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# Virtual DataFrame

[Full documentation](https://pprados.github.io/virtual_dataframe/)

## Motivation

Do you want to write your code once against a pandas-like dataframe or NumPy-like array,
and choose the framework to use at the end? Do you want to pick the best framework simply by measuring performance?
This framework unifies multiple pandas-compatible or NumPy-compatible components,
so that a single code base works with all of them.

Do you want to use different architectures at different times of the year to be "green" and cheaper?
Do you want to use a GPU only for Black Friday?

## Synopsis

With a few parameters and virtual classes, you can write your code once and run it:

- With or without multicore
- With or without cluster (multi nodes)
- With or without GPU

To do that, we create some virtual classes, add some methods to other classes, etc.

It is difficult to combine several frameworks that use the same class names with similar semantics.
For example, if you want to use Dask, cudf, pandas, modin, pyspark or pyspark+rapids in the same program,
you must manage:

- `pandas.DataFrame`, `pandas.Series`
- `modin.pandas.DataFrame`, `modin.pandas.Series`
- `cudf.DataFrame`, `cudf.Series`
- `dask.dataframe.DataFrame`, `dask.dataframe.Series`
- `pyspark.pandas.DataFrame`, `pyspark.pandas.Series`

With numpy, you must manage:
- `numpy.ndarray`
- `cupy.ndarray`
- `dask.array.Array`

With `cudf` or `cupy`, the code must call `.to_pandas()` or `asnumpy()`. With dask, the code must call `.compute()` and can use `@delayed` or
`dask.distributed.Client`, etc.
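
To make the per-framework differences concrete, here is a minimal sketch of the kind of glue code you end up writing without a unified layer. The helper name `to_host` is hypothetical and is not part of this project; it relies only on duck typing (`.compute()` on Dask collections, `.to_pandas()` on cudf objects, `.get()` on cupy arrays) and makes no claim about how this project handles these cases internally.

```python
def to_host(obj):
    """Return a host-memory (pandas/numpy) version of obj.

    Hypothetical helper for illustration only, not this project's API.
    """
    # Dask collections are lazy: .compute() materializes the result.
    if hasattr(obj, "compute"):
        obj = obj.compute()
    # cudf objects live in GPU memory: .to_pandas() copies them to the host.
    if hasattr(obj, "to_pandas"):
        obj = obj.to_pandas()
    # cupy arrays expose .get(), which returns a numpy array on the host.
    elif type(obj).__module__.split(".")[0] == "cupy":
        obj = obj.get()
    # pandas and numpy objects fall through unchanged.
    return obj
```

Every call site that may receive a lazy or GPU-resident object needs this kind of normalization, which is the friction a single uniform API aims to remove.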

We propose to replace all these classes and scenarios with a *uniform model*,
inspired by [dask](https://www.dask.org/) (the most complex API).
You can then write your code once and use it in different environments and frameworks.

This project is essentially a back-port of *Dask+Cudf* to other frameworks.
We try to normalize the API of all frameworks.
This project will *weave* your code with the selected framework at runtime.
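
As a rough illustration of choosing the framework at runtime, the sketch below maps a backend name to the module that provides its pandas-like API and imports it on demand. The environment variable `DF_BACKEND` and the functions `backend_module_path` and `load_backend` are assumptions made for this example; they are not this project's actual configuration or API.

```python
import importlib
import os

# Hypothetical mapping from a backend name to the module exposing its
# pandas-like API. These are the public module paths of each library.
_BACKENDS = {
    "pandas": "pandas",
    "modin": "modin.pandas",
    "cudf": "cudf",
    "dask": "dask.dataframe",
    "pyspark": "pyspark.pandas",
}


def backend_module_path(name=None):
    """Resolve a backend name (or the DF_BACKEND variable) to a module path."""
    name = name or os.environ.get("DF_BACKEND", "pandas")
    try:
        return _BACKENDS[name]
    except KeyError:
        raise ValueError(f"Unknown backend: {name!r}") from None


def load_backend(name=None):
    """Import and return the selected backend module."""
    return importlib.import_module(backend_module_path(name))
```

With this sketch, `load_backend()` returns the selected module, so the choice of framework moves out of the code and into the environment. Importing the module alone does not make the APIs identical (for example, Dask stays lazy until `.compute()` is called), which is the gap a normalization layer has to close.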

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/pprados/virtual-dataframe?labpath=%2Fmain%2Fnotebooks)



            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/pprados/mx06",
    "name": "mx07",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "",
    "keywords": "dataframe",
    "author": "Philippe Prados",
    "author_email": "github@prados.fr",
    "download_url": "",
    "platform": null,
    "description": "# Virtual DataFrame\n\n[Full documentation](https://pprados.github.io/virtual_dataframe/)\n\n## Motivation\n\nWith Panda-like dataframe or numby-like array, do you want to create a code, and choose at the end, the framework\nto use?  Do you want to be able to choose the best framework after simply performing performance measurements?\nThis framework unifies multiple Panda-compatible or Numpy-comptaible components,\nto allow the writing of a single code, compatible with all.\n\nDo you want to use different architectures at different times of the year to be \"green\" and cheaper?\nDo you want to use a GPU only for the black-friday?\n\n## Synopsis\n\nWith some parameters and Virtual classes, it's possible to write a code, and execute this code:\n\n- With or without multicore\n- With or without cluster (multi nodes)\n- With or without GPU\n\nTo do that, we create some virtual classes, add some methods in others classes, etc.\n\nIt's difficult to use a combinaison of framework, with the same classe name, with similare semantic, etc.\nFor example, if you want to use in the same program, Dask, cudf, pandas, modin, pyspark or pyspark+rapids,\nyou must manage:\n\n- `pandas.DataFrame`, `pandas,Series`\n- `modin.pandas.DataFrame`, `modin.pandas.Series`\n- `cudf.DataFrame`, `cudf.Series`\n- `dask.DataFrame`, `dask.Series`\n- `pyspark.pandas.DataFrame`, `pyspark.pandas.Series`\n\nWith numpy, you must manage:\n- `numpy.ndarray`\n- `cupy.ndarray`\n- `dask.array`\n\n With `cudf` or `cudf`, the code must call `.to_pandas()` or `asnumpy()`. With dask, the code must call `.compute()`, can use `@delayed` or\n`dask.distributed.Client`. 
etc.\n\nWe propose to replace all these classes and scenarios, with a *uniform model*,\ninspired by [dask](https://www.dask.org/) (the more complex API).\nThen, it is possible to write one code, and use it in differents environnements and frameworks.\n\nThis project is essentially a back-port of *Dask+Cudf* to others frameworks.\nWe try to normalize the API of all frameworks.\nThis project will *weave* your code with the selected framework, at runtime.\n\n[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/pprados/virtual-dataframe?labpath=%2Fmain%2Fnotebooks)\n\n\n",
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "Bridge between pandas, cudf, modin, dask, dask-modin, dask-cudf, spark or spark+rapids and between numpy, cupy and dask.array",
    "version": "0.2.dev0",
    "split_keywords": [
        "dataframe"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "300bc9fd05cc1a5079e37d34906c36b3",
                "sha256": "5683c86c093cfa8cd136d40b5db5b0fe93fd91fe793d51b809b43b09d0353187"
            },
            "downloads": -1,
            "filename": "mx07-0.2.dev0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "300bc9fd05cc1a5079e37d34906c36b3",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 26642,
            "upload_time": "2022-12-03T14:46:29",
            "upload_time_iso_8601": "2022-12-03T14:46:29.137374Z",
            "url": "https://files.pythonhosted.org/packages/39/9a/27aed6b0ae2d1bb89c03de57ecd4b0f0d5def781d2b8e94682dba04e4a5f/mx07-0.2.dev0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2022-12-03 14:46:29",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "pprados",
    "github_project": "mx06",
    "lcname": "mx07"
}
        