- Name: ryp
- Version: 0.1.0
- Summary: R inside Python
- Uploaded: 2024-09-29 03:04:03
- Requires Python: >=3.7
- License: MIT License
- Keywords: r rpy2 reticulate tidyverse ggplot ggplot2 arrow

# ryp: R inside Python

ryp is a minimalist, powerful Python library for:
- running R code inside Python
- quickly transferring huge datasets between Python (NumPy/pandas/polars) and R
  without writing to disk
- interactively working in both languages at the same time

ryp is an alternative to the widely used [rpy2](https://github.com/rpy2/rpy2) 
library. Compared to rpy2, ryp provides:
- increased stability
- a much simpler API, with less of a learning curve
- interactive printouts of R variables that match what you'd see in R
- a full-featured R terminal inside Python for interactive work
- inline plotting in Jupyter notebooks (requires the `svglite` R package)
- much faster data conversion with [Arrow](https://arrow.apache.org) (also
  provided by [rpy2-arrow](https://github.com/rpy2/rpy2-arrow))
- support for *every* NumPy, pandas and polars data type representable in base
  R, no matter how obscure
- support for sparse arrays/matrices
- recursive conversion of containers like R lists, Python tuples/lists/dicts, 
  and S3/S4/R6 objects
- full Windows support

ryp does the opposite of the 
[reticulate](https://rstudio.github.io/reticulate) R library, which runs Python
inside R.

## Installation

Install ryp via pip:

```bash
pip install ryp
```

via conda:

```bash
conda install ryp
```

or mamba:

```bash
mamba install ryp
```

Or, install the development version via pip:

```bash
pip install git+https://github.com/Wainberg/ryp
```

ryp's only mandatory dependencies are:
- Python 3.7+
- R
- the [cffi](https://cffi.readthedocs.io/en/stable) Python package
- the [pyarrow](https://arrow.apache.org/docs/python) Python package, which 
  includes [NumPy](https://numpy.org) as a dependency
- the [arrow](https://arrow.apache.org/docs/r) R library

R and the arrow R library are automatically installed when installing ryp via 
conda or mamba, but not via pip. ryp uses the R installation in the directory
given by the `R_HOME` environment variable; if `R_HOME` is not defined or is not
a directory, it instead locates R by running `R RHOME` through
`subprocess.run()`.
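As a rough illustration, the lookup order just described can be sketched in
plain Python. The helper `find_r_home` is hypothetical (it is not part of ryp's
API) and only mirrors the documented order: `R_HOME` first, then `R RHOME`.

```python
import os
import subprocess


def find_r_home() -> str:
    """Hypothetical sketch of the R_HOME lookup order; not ryp's own code."""
    r_home = os.environ.get('R_HOME')
    # Use R_HOME only if it is set and points to an existing directory
    if r_home is not None and os.path.isdir(r_home):
        return r_home
    # Otherwise, ask R itself: `R RHOME` prints R's home directory
    result = subprocess.run(['R', 'RHOME'], capture_output=True, text=True,
                            check=True)
    return result.stdout.strip()
```

If `R_HOME` is unusable and no `R` executable is on your `PATH`,
`subprocess.run()` raises `FileNotFoundError`; `check=True` additionally turns a
non-zero exit status into a `CalledProcessError`.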

ryp also has several optional dependencies, which are not installed 
automatically with pip, conda or mamba. These are:
- [pandas](https://pandas.pydata.org), for `format='pandas'`
- [polars](https://pola.rs), for `format='polars'`
- [SciPy](https://scipy.org) and the
  [Matrix](https://cran.r-project.org/web/packages/Matrix) R library, for sparse
  matrices
- the [svglite](https://cran.r-project.org/web/packages/svglite) R library, for
  inline plotting in Jupyter notebooks

## Functionality

ryp consists of just three core functions:

1. `r(R_code)` runs a string of R code. `r()` with no arguments opens up an R 
   terminal inside your Python terminal for interactive work.
2. `to_r(python_object, R_variable_name)` converts a Python object into an R 
   object named `R_variable_name`. 
3. `to_py(R_statement)` converts the R object produced by evaluating 
   `R_statement` to Python. `R_statement` may be a single variable name, or a 
   more complex code snippet that evaluates to the R object you'd like to 
   convert.

and two more functions, `get_config()` and `set_config()`, for getting and 
setting ryp's global configuration options.

### `r()`

```python
r(R_code: str = ...) -> None
```

`r(R_code)` runs a string of R code inside ryp's R interpreter, which is 
embedded inside Python. It can contain multiple statements separated by
semicolons or newlines (e.g. within a triple-quoted Python string). It returns
`None`; use `to_py()` instead if you would like to convert the result back to 
Python.

`r()` with no arguments opens up an R terminal inside your Python terminal 
for interactive debugging. Press `Ctrl + D` to exit back to the Python 
terminal. R variables defined from Python will be available in the R terminal,
and variables defined in the R terminal will be available from Python once you
exit:

```python
>>> from ryp import r
>>> r('a = 1')
>>> r()
> a
[1] 1
> b <- 2
>
>>> r('b')
[1] 2
```

Note that the default value for `R_code` is the special sentinel value `...` 
(`Ellipsis`) rather than `None`. This stops users from inadvertently opening 
the terminal when passing a variable that is supposed to be a string but is 
unexpectedly `None`.
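The sentinel pattern can be illustrated with a toy function. `run_r_sketch` is
hypothetical, and raising `TypeError` for `None` is an assumption about how such
a guard might respond, not a description of the error ryp actually raises.

```python
def run_r_sketch(R_code=...):
    """Hypothetical illustration of an Ellipsis default; not ryp's implementation."""
    if R_code is ...:
        # No argument was passed at all: this is where a terminal would open
        return 'open terminal'
    if not isinstance(R_code, str):
        # An unexpected None (or any other non-string) is an error, so it can
        # never be mistaken for "no argument"
        raise TypeError(f'R_code must be a string, not {type(R_code).__name__}')
    return f'run: {R_code}'
```

With `None` as the default instead, `run_r_sketch(None)` and `run_r_sketch()`
would be indistinguishable.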

### `to_r()`

```python
to_r(python_object: object, R_variable_name: str, *, 
     format: Literal['keep', 'matrix', 'data.frame'] | None = None,
     rownames: object = None, colnames: object = None) -> None
```

`to_r(python_object, R_variable_name)` converts `python_object` to R, adding it
to R's global namespace (`globalenv`) as a variable named `R_variable_name`. 

If `python_object` is a container (`list`, `tuple`, or `dict`), `to_r()`
recursively converts each element and creates an R named list (if
`python_object` is a `dict`) or unnamed list (if `python_object` is a `list` or
`tuple`).
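To make the recursion concrete, here is a toy that renders a Python object as R
source code instead of converting it in memory. `to_r_code` is a hypothetical
helper, not part of ryp; generating code is just an easy way to show how
containers nest, and the integer cutoff follows the `int` row of the conversion
table below.

```python
import json
import math


def to_r_code(obj) -> str:
    """Render a Python object as R source code, recursing into containers.

    Hypothetical toy for illustration only; it is not how ryp converts data.
    """
    if obj is None:
        return 'NULL'
    if isinstance(obj, bool):  # test bool before int: bool is a subclass of int
        return 'TRUE' if obj else 'FALSE'
    if isinstance(obj, int):
        if abs(obj) <= 2_147_483_647:
            return f'{obj}L'
        # Outside R's 32-bit integer range, so fall back to bit64::integer64
        return f'bit64::as.integer64("{obj}")'
    if isinstance(obj, float):
        # Infinities are not handled in this toy
        return 'NaN' if math.isnan(obj) else repr(obj)
    if isinstance(obj, str):
        return json.dumps(obj)  # double-quoted and escaped; fine for simple strings
    if isinstance(obj, (list, tuple)):
        # list/tuple -> unnamed R list
        return 'list(' + ', '.join(to_r_code(x) for x in obj) + ')'
    if isinstance(obj, dict):
        if not all(isinstance(key, str) for key in obj):
            raise TypeError('all dict keys must be strings')
        # dict -> named R list; backquotes let any simple string act as a name
        items = ', '.join(f'`{key}` = {to_r_code(value)}'
                          for key, value in obj.items())
        return f'list({items})'
    raise TypeError(f'unsupported type: {type(obj).__name__}')
```

For example, `{'a': 1, 'b': [True, 'x', None]}` becomes
``list(`a` = 1L, `b` = list(TRUE, "x", NULL))``: the outer `dict` becomes a named
list and the inner `list` an unnamed one.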

#### The `format` argument

By default (`format='keep'`), ryp converts polars and pandas DataFrames (and 
pandas MultiIndexes) into R data frames, and 2D NumPy arrays into R matrices. 
Specify `format='matrix'` to convert everything (even DataFrames) to R matrices
(in which case all DataFrame columns must have the same type), and 
`format='data.frame'` to convert everything (even 2D NumPy arrays) to R 
data frames.

`format` must be `None` unless `python_object` is a DataFrame, MultiIndex or 2D
NumPy array – or unless `python_object` is a `list`, `tuple`, or `dict`, in 
which case the `format` will apply recursively to any DataFrames, MultiIndexes
or 2D NumPy arrays it contains.
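The decision rule can be summarized as a small function. `r_type_for_2d`, its
`kind` labels and its `column_types` argument are hypothetical; the sketch
assumes that `format=None` falls back to the `to_r_format` setting described
under `set_config()` below.

```python
from typing import Optional, Sequence

# Hypothetical labels for the three kinds of object that `format` affects
TWO_D_KINDS = {'DataFrame', 'MultiIndex', '2D ndarray'}


def r_type_for_2d(kind: str, fmt: Optional[str] = None, default: str = 'keep',
                  column_types: Sequence[str] = ()) -> str:
    """Sketch of which R type `format` selects; not ryp's implementation."""
    if kind not in TWO_D_KINDS:
        raise ValueError(f'format does not apply to {kind!r}')
    if fmt is None:
        fmt = default  # None means "use the configured default" (to_r_format)
    if fmt == 'keep':
        # DataFrames and MultiIndexes stay tabular; 2D arrays become matrices
        return 'matrix' if kind == '2D ndarray' else 'data.frame'
    if fmt == 'matrix':
        # An R matrix holds a single type, so all columns must share one
        if len(set(column_types)) > 1:
            raise TypeError('all columns must have the same type')
        return 'matrix'
    if fmt == 'data.frame':
        return 'data.frame'
    raise ValueError(f"format must be 'keep', 'matrix' or 'data.frame', not {fmt!r}")
```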

#### The `rownames` and `colnames` arguments

Since NumPy arrays and scipy sparse arrays and matrices lack row and column
names, and polars DataFrames and Series lack row names, you can specify these
separately via the `rownames` and/or `colnames` arguments, and they will be
added to the converted R object. `rownames` and `colnames` can be lists, tuples,
string Series, or categorical Series with string categories, and will be
automatically converted to R character vectors. 

`rownames` and `colnames` must match the length or `shape[1]`, respectively, of
the object being converted. The one exception is that rownames of any length
may be added to a 0 &times; 0 polars DataFrame, since polars does not have the 
concept of an `N` &times; 0 DataFrame for nonzero `N`. (Dropping all the 
columns of a polars DataFrame always results in a 0 &times; 0 DataFrame, even 
if the original DataFrame had more than 0 rows.)
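A minimal sketch of these length checks follows, assuming the object's
dimensions are available as a NumPy-style `shape` tuple. `check_name_lengths`
and its `is_polars_dataframe` flag are hypothetical names, not part of ryp.

```python
from typing import Optional, Sequence, Tuple


def check_name_lengths(shape: Tuple[int, ...],
                       rownames: Optional[Sequence[str]] = None,
                       colnames: Optional[Sequence[str]] = None,
                       is_polars_dataframe: bool = False) -> None:
    """Sketch of the rownames/colnames length rules; not ryp's implementation."""
    if rownames is not None:
        # polars has no N x 0 DataFrames, so a 0 x 0 one accepts any rownames
        any_length_ok = is_polars_dataframe and shape == (0, 0)
        if not any_length_ok and len(rownames) != shape[0]:
            raise ValueError(f'expected {shape[0]} rownames, got {len(rownames)}')
    if colnames is not None:
        if len(shape) < 2:
            raise ValueError('colnames require an object with at least 2 dimensions')
        if len(colnames) != shape[1]:
            raise ValueError(f'expected {shape[1]} colnames, got {len(colnames)}')
```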

Because Python `bool`, `int`, `float`, and `str` convert to length-1 R vectors
that support names, you can pass length-1 `rownames` when converting objects of
these types. You can also pass `rownames` and/or `colnames` when 
`python_object` is a `list`, `tuple`, or `dict`, in which case row and column 
names will only be added to elements that support them. All elements that 
support `rownames` must have the same length as the `rownames`, and similarly 
for the `colnames`. 

`rownames` cannot be specified if `python_object` is a pandas Series or 
DataFrame (since they already have rownames, i.e. an index), or 
`bytes`/`bytearray` (since these convert to `raw` vectors, which lack 
rownames). `colnames` cannot be specified unless `python_object` is a 
multidimensional NumPy array or scipy sparse array or matrix, or something that
might contain one (`list`, `tuple`, or `dict`).

### `to_py()`

```python
to_py(R_statement: str, *,
      format: Literal['polars', 'pandas', 'pandas-pyarrow', 'numpy'] |
              dict[Literal['vector', 'matrix', 'data.frame'],
                   Literal['polars', 'pandas', 'pandas-pyarrow',
                           'numpy']] | None = None,
      index: str | Literal[False] | None = None,
      squeeze: bool | None = None) -> Any
```

`to_py(R_statement)` runs a single statement of R code (which can be as simple 
as a single variable name) and converts the resulting R object to Python. 

If the object is a list/S3 object, S4 object, or environment/R6 object, it
recursively converts each attribute/slot/field and returns a Python `dict` (or 
`list`, if the object is an unnamed list). For R6 objects, only public fields
will be converted.

#### The `format` argument

By default, or when `format='polars'`, R vectors will be converted to polars 
Series, and R data frames and matrices will be converted to polars DataFrames. 
You can change this by setting the `format` argument to `'pandas'`, 
`'pandas-pyarrow'` (like `'pandas'`, but converting to pyarrow dtypes wherever 
possible) or `'numpy'`. (You can also change the default format, e.g. with 
`set_config(to_py_format='pandas')`.)

For finer-grained control, you can set `format` for only certain R variable 
types by specifying a dictionary with `'vector'`, `'matrix'`, and/or
`'data.frame'` as keys and `'polars'`, `'pandas'`, `'pandas-pyarrow'` and/or 
`'numpy'` as values. 

`format` must be `None` when `R_statement` evaluates to `NULL`, when it 
evaluates to an array of 3 or more dimensions (these are always converted to 
NumPy arrays), or when the final result would be a Python scalar (see `squeeze`
below).
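One way to picture how a string or dictionary `format` is resolved for a
particular R object is the sketch below. `resolve_to_py_format` is
hypothetical, and it assumes that R types left out of a dictionary fall back to
the default format rather than raising an error.

```python
from typing import Dict, Optional, Union

PY_FORMATS = {'polars', 'pandas', 'pandas-pyarrow', 'numpy'}
R_KINDS = {'vector', 'matrix', 'data.frame'}


def resolve_to_py_format(fmt: Union[str, Dict[str, str], None],
                         r_kind: str,
                         default: Optional[Dict[str, str]] = None) -> str:
    """Sketch: pick the output format for one kind of R object."""
    if default is None:
        # Out of the box, every kind of R object converts to polars
        default = {kind: 'polars' for kind in R_KINDS}
    if fmt is None:
        chosen = default[r_kind]
    elif isinstance(fmt, str):
        chosen = fmt  # a single string applies to every kind
    else:
        unknown = set(fmt) - R_KINDS
        if unknown:
            raise ValueError(f'invalid keys: {sorted(unknown)}')
        # Assumption: kinds missing from the dict keep the default format
        chosen = fmt.get(r_kind, default[r_kind])
    if chosen not in PY_FORMATS:
        raise ValueError(f'invalid format: {chosen!r}')
    return chosen
```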

#### The `index` argument

By default, the R object's `names` or `rownames` will become the index (for 
pandas) or the first column (for polars) of the output Python object, named 
`'index'`. Set the `index` argument to a different string to change this name, 
or set `index=False` to not convert the `names`/`rownames`. 

Note that for polars, the output will be a two-column DataFrame (not a Series!)
when the input is an R vector, unless `index=False`. 

When the output is a NumPy array, `names` and `rownames` will always be 
discarded, since numeric NumPy arrays cannot store string indexes except with 
the inefficient `dtype=object`. 

`index` must be `None` when `format='numpy'`, or when the final result would be
a Python scalar (see `squeeze` below).

#### The `squeeze` argument

By default, length-1 R vectors, matrices and arrays will be converted to Python
scalars instead of Python arrays, Series or DataFrames. Set `squeeze=False` to
disable this special case. (R data frames are never converted to Python scalars
even if `squeeze=True`.) 

`squeeze` must be `None` unless the R object is a vector, matrix or array
(`raw` vectors don't count, because they always convert to Python scalars).
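The squeezing rule can be sketched as a predicate over the R object's type and
length. `is_squeezed` and its `r_type` labels (`'vector'`, `'matrix'`,
`'array'`, `'raw'`, and anything else) are hypothetical simplifications, not
ryp's internal representation.

```python
from typing import Optional

SQUEEZABLE = ('vector', 'matrix', 'array')


def is_squeezed(r_type: str, length: int, squeeze: Optional[bool] = None,
                default: bool = True) -> bool:
    """Sketch: would to_py() return a Python scalar for this R object?"""
    if r_type not in SQUEEZABLE:
        # raw vectors (always scalars) and everything else (e.g. data frames,
        # never scalars) do not accept a squeeze argument
        if squeeze is not None:
            raise ValueError(f'squeeze must be None for {r_type!r} objects')
        return r_type == 'raw'
    if squeeze is None:
        squeeze = default  # the default can be changed with set_config(squeeze=...)
    return squeeze and length == 1
```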

### `set_config()` and `get_config()`

```python
set_config(*, to_r_format=None, to_py_format=None, index=None, squeeze=None, 
           plot_width: int | float | None = None, 
           plot_height: int | float | None = None) -> None
```

`set_config` sets ryp's configuration settings. Arguments that are `None` are 
left unchanged.
    
For instance, to set pandas as the default format in `to_py()`, run 
`set_config(to_py_format='pandas')`.

- `to_r_format`: the default value for the `format` parameter in `to_r()`; 
  must be `'keep'` (the default), `'matrix'`, or `'data.frame'`.
- `to_py_format`: the default value for the `format` parameter in `to_py()`; 
  must be `'polars'` (the default), `'pandas'`, `'pandas-pyarrow'`, `'numpy'`,
  or a dictionary with `'vector'`, `'matrix'` and/or `'data.frame'` as keys and
  one of those four Python formats, or `None`, as values. Keys that are missing,
  or whose value is `None`, keep their current format.
- `index`: the default value for the `index` parameter in `to_py()`; must be a 
  string (default: `'index'`) or `False`. 
- `squeeze`: the default value for the `squeeze` parameter in `to_py()`; must  
  be `True` (the default) or `False`.
- `plot_width`: the width, in inches, of inline plots in Jupyter notebooks;
  must be a positive number. Defaults to 6.4 inches, to match Matplotlib's 
  default.
- `plot_height`: the height, in inches, of inline plots in Jupyter notebooks;
  must be a positive number. Defaults to 4.8 inches, to match Matplotlib's 
  default.
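The "`None` leaves a setting unchanged" rule, including the per-key merging for
a dictionary `to_py_format`, can be sketched with a plain dictionary. `config`
and `set_config_sketch` are hypothetical stand-ins seeded with the documented
defaults; storing `to_py_format` as a per-type dictionary is an assumption made
for the sketch.

```python
from typing import Any, Dict

# Hypothetical stand-in for ryp's global settings, seeded with the documented defaults
config: Dict[str, Any] = {
    'to_r_format': 'keep',
    'to_py_format': {'vector': 'polars', 'matrix': 'polars', 'data.frame': 'polars'},
    'index': 'index',
    'squeeze': True,
    'plot_width': 6.4,
    'plot_height': 4.8,
}


def set_config_sketch(**options: Any) -> None:
    """Sketch of set_config()'s update rules; not ryp's implementation."""
    for name, value in options.items():
        if name not in config:
            raise TypeError(f'unknown option: {name!r}')
        if value is None:
            continue  # None leaves the setting unchanged
        if name == 'to_py_format':
            if isinstance(value, str):
                # A single format applies to every R type
                value = dict.fromkeys(config['to_py_format'], value)
            # Keys that are missing or mapped to None keep their current format
            config['to_py_format'].update(
                {kind: fmt for kind, fmt in value.items() if fmt is not None})
        elif name in ('plot_width', 'plot_height'):
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number or value <= 0:
                raise ValueError(f'{name} must be a positive number')
            config[name] = value
        else:
            config[name] = value
```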

```python
get_config() -> dict[str, dict[str, str] | str | bool | int | float]
```

`get_config` returns the current configuration options as a dictionary, with 
keys `to_r_format`, `to_py_format`, `index`, `squeeze`, `plot_width`, and 
`plot_height`.

## Conversion rules

### Python to R (`to_r()`)

| Python                                                                  | R                                                                                                         |
|-------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|
| `None`                                                                  | `NULL` (if scalar) or `NA` (if inside NumPy, pandas or polars)                                            |
| `nan`                                                                   | `NaN` (if scalar or inside polars) or `NA` (if inside NumPy or pandas)                                    |
| `pd.NA`                                                                 | `NA`                                                                                                      |
| `pd.NaT`, `np.datetime64('NaT')`, `np.timedelta64('NaT')`               | `NA`                                                                                                      |   
| `bool`                                                                  | length-1 `logical` vector                                                                                 |
| `int`                                                                   | length-1 `integer` (if `abs(x) <= 2_147_483_647`) or `bit64::integer64` vector                            |
| `float`                                                                 | length-1 `numeric` vector                                                                                 |
| `str`                                                                   | length-1 `character` vector                                                                               |
| `complex`                                                               | length-1 `complex` vector                                                                                 |
| `datetime.date`                                                         | length-1 `Date` vector                                                                                    |
| `datetime.datetime`                                                     | length-1 `POSIXct` vector                                                                                 |
| `datetime.timedelta`                                                    | length-1 `difftime(units='secs')` vector                                                                  |
| `datetime.time` (`tzinfo` must be `None`)                               | length-1 `hms::hms` vector                                                                                |
| `bytes`, `bytearray`                                                    | `raw` vector                                                                                              |
| `list`, `tuple`                                                         | unnamed list                                                                                              |
| `dict` (all keys must be strings)                                       | named list                                                                                                |
| polars Series, pandas Series<sup>&ast;</sup>, pandas `Index`            | vector                                                                                                    |
| polars DataFrame, pandas DataFrame<sup>&ast;</sup>, pandas `MultiIndex` | matrix<sup>&dagger;</sup> (if `format == 'matrix'`; all columns must have same data type) or `data.frame` |
| 1D NumPy array                                                          | vector                                                                                                    |
| 2D NumPy array                                                          | `data.frame` (if `format == 'data.frame'`) or matrix<sup>&dagger;</sup>                                   |
| &ge; 3D NumPy array                                                     | array<sup>&dagger;</sup>                                                                                  |
| 0D NumPy array (e.g. `np.array(1)`), NumPy generic (e.g. `np.int32(1)`) | length-1 vector                                                                                           |
| `csr_array`, `csr_matrix`                                               | `dgRMatrix` (if int or float), `lgRMatrix` (if boolean), -- (if complex)                                  | 
| `csc_array`, `csc_matrix`                                               | `dgCMatrix` (if int or float), `lgCMatrix` (if boolean), -- (if complex)                                  |
| `coo_array`, `coo_matrix`                                               | `dgTMatrix` (if int or float), `lgTMatrix` (if boolean), -- (if complex)                                  |

#### NumPy data types

| Python                                      | R                                                              |
|---------------------------------------------|----------------------------------------------------------------|
| `bool`                                      | `logical`                                                      |
| `int8`, `uint8`, `int16`, `uint16`, `int32` | `integer`                                                      |
| `uint32`, `uint64`                          | `integer` (if `x <= 2_147_483_647`) or `numeric`               |
| `int64`                                     | `integer` (if `abs(x) <= 2_147_483_647`) or `bit64::integer64` |
| `float16`, `float32`, `float64`, `float128` | `numeric` (note: `float128` loses precision)                   |
| `complex64`, `complex128`                   | `complex`                                                      |
| `bytes` (e.g. `'S1'`)                       | --                                                             |
| `str`/`unicode` (e.g. `'U1'`)               | `character`                                                    |
| `datetime64`                                | `POSIXct`                                                      | 
| `timedelta64`                               | `difftime(units='secs')`                                       |
| `void` (unstructured)                       | `raw`                                                          |
| `void` (structured)                         | --                                                             |
| `object`                                    | depends on the contents<sup>&Dagger;</sup>                     |
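The integer rows of this table (and the matching rows in the pandas, Arrow and
polars tables below) can be expressed as a short function. `r_type_for_ints` is
hypothetical, and it assumes the range check is applied to all values of the
array at once, since an R vector has a single type.

```python
from typing import Sequence

R_INT_MAX = 2_147_483_647  # R's largest integer; -2_147_483_648 is reserved for NA


def r_type_for_ints(dtype: str, values: Sequence[int]) -> str:
    """Sketch of the integer rows of the NumPy table; not ryp's implementation."""
    if dtype in ('int8', 'uint8', 'int16', 'uint16', 'int32'):
        return 'integer'  # these always fit in R's 32-bit integer type
    if dtype in ('uint32', 'uint64'):
        # Unsigned values can only overflow upward
        return 'integer' if max(values, default=0) <= R_INT_MAX else 'numeric'
    if dtype == 'int64':
        # abs() because the most negative 32-bit value is R's integer NA
        fits = all(abs(v) <= R_INT_MAX for v in values)
        return 'integer' if fits else 'bit64::integer64'
    raise ValueError(f'not an integer dtype: {dtype}')
```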

#### pandas-specific data types

| Python                                                               | R                                                              |
|----------------------------------------------------------------------|----------------------------------------------------------------|
| `BooleanDtype`                                                       | `logical`                                                      |
| `Int8Dtype`, `UInt8Dtype`, `Int16Dtype`, `UInt16Dtype`, `Int32Dtype` | `integer`                                                      |
| `UInt32Dtype`, `UInt64Dtype`                                         | `integer` (if `x <= 2_147_483_647`) or `numeric`               |
| `Int64Dtype`                                                         | `integer` (if `abs(x) <= 2_147_483_647`) or `bit64::integer64` |  
| `Float32Dtype`, `Float64Dtype`                                       | `numeric`                                                      |
| `StringDtype`                                                        | `character`                                                    |
| `CategoricalDtype(ordered=False)`                                    | unordered `factor`                                             |
| `CategoricalDtype(ordered=True)`                                     | ordered `factor`                                               |
| `DatetimeTZDtype`, `PeriodDtype`                                     | `POSIXct`                                                      |
| `IntervalDtype`, `SparseDtype`                                       | --                                                             |

#### pandas Arrow data types (`pd.ArrowDtype`)

| Python                                                     | R                                                              |
|------------------------------------------------------------|----------------------------------------------------------------|
| `pa.bool_`                                                 | `logical`                                                      |
| `pa.int8`, `pa.uint8`, `pa.int16`, `pa.uint16`, `pa.int32` | `integer`                                                      |
| `pa.uint32`, `pa.uint64`                                   | `integer` (if `x <= 2_147_483_647`) or `numeric`               |
| `pa.int64`                                                 | `integer` (if `abs(x) <= 2_147_483_647`) or `bit64::integer64` |
| `pa.float32`, `pa.float64`                                 | `numeric`                                                      |
| `pa.string`, `pa.large_string`                             | `character`                                                    |
| `pa.date32`                                                | `Date`                                                         |
| `pa.date64`, `pa.timestamp`                                | `POSIXct`                                                      |
| `pa.duration`                                              | `difftime(units='secs')`                                       |
| `pa.time32`, `pa.time64`                                   | `hms::hms`                                                     |
| `pa.dictionary(any integer type, pa.string(), ordered=0)`  | unordered `factor`                                             |
| `pa.dictionary(any integer type, pa.string(), ordered=1)`  | ordered `factor`                                               |
| `pa.null()`                                                | `vctrs::unspecified`                                           |

#### Polars data types

| Python                                      | R                                                              |
|---------------------------------------------|----------------------------------------------------------------|
| `Boolean`                                   | `logical`                                                      |
| `Int8`, `UInt8`, `Int16`, `UInt16`, `Int32` | `integer`                                                      |
| `UInt32`, `UInt64`                          | `integer` (if `x <= 2_147_483_647`) or `numeric`               |
| `Int64`                                     | `integer` (if `abs(x) <= 2_147_483_647`) or `bit64::integer64` |
| `Float32`, `Float64`                        | `numeric`                                                      |
| `Date`                                      | `Date`                                                         |
| `Datetime`                                  | `POSIXct`                                                      |
| `Duration`                                  | `difftime(units='secs')`                                       |
| `Time`                                      | `hms::hms`                                                     |
| `String`                                    | `character`                                                    |
| `Categorical`                               | unordered `factor`                                             |
| `Enum`                                      | ordered `factor`                                               |
| `Object`                                    | depends on the contents<sup>&Dagger;</sup>                     |
| `Null`                                      | `vctrs::unspecified`                                           | 
| `Binary`, `Decimal`, `List`, `Array`        | --                                                             |

#### Notes

<sup>&ast;</sup> For pandas Series and DataFrames, string indexes (and 
categorical indexes where the categories are strings) will be automatically
converted to `names`/`rownames`. The default index
(`pd.RangeIndex(len(python_object))`) will be ignored. All other indexes are
disallowed. 

<sup>&dagger;</sup> Because R does not support `POSIXct` and `Date` matrices or
arrays, dates and datetimes cannot be converted to R matrices or arrays.

<sup>&Dagger;</sup> For `dtype=object` and `dtype=pl.Object`, the output R type
depends on the contents, e.g. `'character'` if all elements are strings. Some
additional notes on ryp's handling of object data types:
- `None`, `np.nan`, `pd.NA`, `pd.NaT`, `np.datetime64('NaT')`, and 
  `np.timedelta64('NaT')` are all treated as missing values &ndash; even for 
  polars, where `np.nan` is ordinarily treated as a floating-point number 
  rather than a missing value. 
- Length-0 and all-missing data will be converted to the `vctrs::unspecified` R
  type (`vctrs` is part of the tidyverse). 
- If the elements are objects with a mix of types (or datetimes with a mix of
  time zones), Arrow will generally cause the conversion to fail, though mixes
  of related types (e.g. int and float) will be automatically cast to the
  common supertype and succeed. 
- Conversion will also fail if the contents are objects that are not 
  representable as R vector elements. This includes `bytes`/`bytearray` (which
  are only representable in R when scalar, as a `raw` vector) and Python
  containers (`list`, `tuple`, and `dict`). 
- pandas `Timedelta` objects will be rounded down to the nearest microsecond,
  following the behavior of Arrow.

### R to Python (`to_py()`)

| R                                                                            | Python                                                                                                                                                                                        |
|------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `NULL`                                                                       | `None`                                                                                                                                                                                        |
| `NA`                                                                         | `None` (if scalar or `format='polars'`), `None`/`nan`/`pd.NA`/`pd.NaT`/`np.datetime64('NaT', 'us')`/`np.timedelta64('NaT', 'ns')`/etc. (if `format` is `'numpy'`, `'pandas'`, or `'pandas-pyarrow'`) |
| `NaN`                                                                        | `nan`                                                                                                                                                                                         |
| length-1 vector, matrix or array, `squeeze == True`                          | scalar                                                                                                                                                                                        |
| vector or 1D array, `format == 'numpy'`                                      | 1D NumPy array                                                                                                                                                                                |
| vector or 1D array, `format == 'pandas'` or `format == 'pandas-pyarrow'`     | pandas Series                                                                                                                                                                                 |
| vector or 1D array, `format == 'polars'`                                     | polars Series (if `index=False`) or two-column DataFrame                                                                                                                                      |
| matrix or `data.frame`, `format == 'numpy'`                                  | 2D NumPy array                                                                                                                                                                                |
| matrix or `data.frame`, `format == 'pandas'` or `format == 'pandas-pyarrow'` | pandas DataFrame                                                                                                                                                                              |
| matrix or `data.frame`, `format == 'polars'`                                 | polars DataFrame                                                                                                                                                                              |
| &ge; 3D array                                                                | NumPy array                                                                                                                                                                                   |  
| unnamed list                                                                 | `list`                                                                                                                                                                                        |
| named list, S3 object, S4 object, environment, R6 object                     | `dict`                                                                                                                                                                                        |
| `dgRMatrix`                                                                  | `csr_array(dtype='float64')`                                                                                                                                                                  |
| `dgCMatrix`                                                                  | `csc_array(dtype='float64')`                                                                                                                                                                  |
| `dgTMatrix`                                                                  | `coo_array(dtype='float64')`                                                                                                                                                                  |
| `lgRMatrix`, `ngRMatrix`                                                     | `csr_array(dtype=bool)`                                                                                                                                                                       |
| `lgCMatrix`, `ngCMatrix`                                                     | `csc_array(dtype=bool)`                                                                                                                                                                       |
| `lgTMatrix`, `ngTMatrix`                                                     | `coo_array(dtype=bool)`                                                                                                                                                                       |
| formula (`~`)                                                                | --                                                                                                                                                                                            |
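
In the Matrix package's class names, the third letter gives the storage scheme (`R` for compressed rows, `C` for compressed columns, `T` for triplets), which is why those classes pair with SciPy's `csr_array`, `csc_array` and `coo_array`. As a rough illustration, here is one small matrix in all three layouts. This is a pure-Python sketch, not ryp's or SciPy's code, and `to_coo`, `to_csr` and `to_csc` are hypothetical helper names:

```python
def to_coo(dense):
    """Triplet (COO) layout: parallel lists of row indices, column indices
    and values, one entry per nonzero element."""
    rows, cols, values = [], [], []
    for r, row in enumerate(dense):
        for c, value in enumerate(row):
            if value != 0:
                rows.append(r)
                cols.append(c)
                values.append(value)
    return rows, cols, values


def to_csr(dense):
    """Compressed sparse row layout: nonzero values and their column indices,
    stored row by row, plus `indptr`, where row r occupies the half-open
    slice indptr[r]:indptr[r + 1] of `data` and `indices`."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        for c, value in enumerate(row):
            if value != 0:
                indices.append(c)
                data.append(value)
        indptr.append(len(data))
    return data, indices, indptr


def to_csc(dense):
    """Compressed sparse column layout: the CSR layout of the transpose, so
    `indices` holds row indices and `indptr` delimits columns."""
    n_cols = len(dense[0]) if dense else 0
    transposed = [[row[c] for row in dense] for c in range(n_cols)]
    return to_csr(transposed)


dense = [[1, 0, 2],
         [0, 0, 3],
         [4, 5, 0]]

print(to_coo(dense))  # ([0, 0, 1, 2, 2], [0, 2, 2, 0, 1], [1, 2, 3, 4, 5])
print(to_csr(dense))  # ([1, 2, 3, 4, 5], [0, 2, 2, 0, 1], [0, 2, 3, 5])
print(to_csc(dense))  # ([1, 4, 5, 2, 3], [0, 2, 2, 0, 1], [0, 2, 3, 5])
```

With SciPy installed, `scipy.sparse.csr_array(dense)` should expose the same three sequences (as NumPy arrays) through its `.data`, `.indices` and `.indptr` attributes.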

#### Data types

| R                           | Python scalar                        | NumPy                                                    | pandas                                                   | pandas-pyarrow                                                 | polars                                     |
|-----------------------------|--------------------------------------|----------------------------------------------------------|----------------------------------------------------------|----------------------------------------------------------------|--------------------------------------------|
| `logical`                   | `bool`                               | `bool`                                                   | `bool`                                                   | `ArrowDtype(pa.bool_())`                                       | `Boolean`                                  |
| `integer`                   | `int`                                | `int32`                                                  | `int32`                                                  | `ArrowDtype(pa.int32())`                                       | `Int32`                                    |
| `bit64::integer64`          | `int`                                | `int64`                                                  | `int64`                                                  | `ArrowDtype(pa.int64())`                                       | `Int64`                                    |
| `numeric`                   | `float`                              | `float64`                                                | `float64`                                                | `ArrowDtype(pa.float64())`                                     | `Float64`                                  |
| `character`                 | `str`                                | `object` (with `str` elements)                           | `object` (with `str` elements)                           | `ArrowDtype(pa.string())`                                      | `String`                                   |
| `complex`                   | `complex`                            | `complex128`                                             | `complex128`                                             | `complex128`                                                   | --                                         |
| `raw`                       | `bytearray`                          | --                                                       | --                                                       | --                                                             | --                                         |
| unordered `factor`          | `str`                                | `object` (with `str` elements)                           | `CategoricalDtype(ordered=False)`                        | `ArrowDtype(pa.dictionary(pa.int8(), pa.string(), ordered=0))` | `Categorical`                              |
| ordered `factor`            | `str`                                | `object` (with `str` elements)                           | `CategoricalDtype(ordered=True)`                         | `ArrowDtype(pa.dictionary(pa.int8(), pa.string(), ordered=1))` | `Enum`                                     |
| `POSIXct` without time zone | `datetime.datetime`<sup>&ast;</sup>  | `datetime64[us]`<sup>&ast;</sup>                         | `datetime64[us]`<sup>&ast;</sup>                         | `ArrowDtype(pa.timestamp('us'))`<sup>&ast;</sup>               | `Datetime('us')`<sup>&ast;</sup>           |
| `POSIXct` with time zone    | `datetime.datetime`<sup>&ast;</sup>  | `datetime64[us]`<sup>&ast;</sup> (time zone discarded)   | `DatetimeTZDtype('us', time_zone)`<sup>&ast;</sup>       | `ArrowDtype(pa.timestamp('us', time_zone))`<sup>&ast;</sup>    | `Datetime('us', time_zone)`<sup>&ast;</sup> |
| `POSIXlt`                   | `dict` of scalars                    | `dict` of NumPy arrays                                   | `dict` of pandas Series                                  | `dict` of pandas Series                                        | `dict` of polars Series                    |
| `Date`                      | `datetime.date`                      | `datetime64[D]`                                          | `datetime64[ms]`                                         | `ArrowDtype(pa.date32())`                                      | `Date`                                     |
| `difftime`                  | `datetime.timedelta`<sup>&ast;</sup> | `timedelta64[ns]`                                        | `timedelta64[ns]`                                        | `ArrowDtype(pa.duration('ns'))`                                | `Duration(time_unit='ns')`                 |
| `hms::hms`                  | `datetime.time`<sup>&ast;</sup>      | `object` (with `datetime.time` elements)<sup>&ast;</sup> | `object` (with `datetime.time` elements)<sup>&ast;</sup> | `ArrowDtype(pa.time64('ns'))`<sup>&ast;</sup>                  | `Time`                                     |
| `vctrs::unspecified`        | `None`                               | `object` (with `None` elements)                          | `object` (with `None` elements)                          | `ArrowDtype(pa.null())`                                        | `Null`                                     |

<sup>&ast;</sup> Because of limitations in Arrow-based conversion, `POSIXct` and
`hms::hms` values are rounded down to the nearest microsecond when converted
to Python, except for `hms::hms` values converted to polars. `difftime` values
are also rounded down to the nearest microsecond, but only when converted to
scalar `datetime.timedelta` values (which cannot represent nanoseconds).
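
Python's `datetime.timedelta`, `datetime.datetime` and `datetime.time` all have a resolution of one microsecond, so any sub-microsecond remainder has to be dropped when converting to those scalar types. A minimal stdlib sketch of rounding a nanosecond count down to whole microseconds; `ns_to_timedelta` is a hypothetical helper, not part of ryp, and it assumes "rounded down" means toward negative infinity:

```python
import datetime


def ns_to_timedelta(ns: int) -> datetime.timedelta:
    """Convert an integer nanosecond count to a timedelta, rounding down.

    Floor division rounds toward negative infinity, so negative durations
    are rounded down too, rather than toward zero.
    """
    return datetime.timedelta(microseconds=ns // 1_000)


# datetime.timedelta cannot represent anything finer than 1 microsecond
print(datetime.timedelta.resolution)  # 0:00:00.000001

# 1.999 microseconds becomes 1 microsecond; the 999 ns remainder is lost
print(ns_to_timedelta(1_999))  # 0:00:00.000001
```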

## Examples

1. Apply R's `scale()` function to a pandas DataFrame:

```python
>>> import pandas as pd
>>> from ryp import r, to_py, to_r, set_config
>>> set_config(to_py_format='pandas')
>>> data = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 3, 4]})
>>> to_r(data, 'data')
>>> r('data')
  a b
1 1 1
2 2 3
3 3 4
>>> r('data <- scale(data)')  # scale each column; scale() returns a matrix
>>> scaled_data = to_py('data')  # convert the scaled R matrix to Python
>>> scaled_data
     a         b
0 -1.0 -1.091089
1  0.0  0.218218
2  1.0  0.872872
```
Note: we could have just written `to_py('scale(data)')` instead of
`r('data <- scale(data)')` followed by `to_py('data')`.
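
For reference, `scale()` with its default arguments centers each column on its mean and divides it by its sample standard deviation (the `n - 1` denominator that R's `sd()` uses). This stdlib-only sketch, which does not call R, reproduces the scaled values printed above; `scale_column` is a hypothetical helper:

```python
import statistics


def scale_column(values):
    """Center on the mean, then divide by the sample standard deviation,
    mirroring what R's scale() does to each column by default."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # n - 1 denominator, like R's sd()
    return [(v - mean) / sd for v in values]


data = {'a': [1, 2, 3], 'b': [1, 3, 4]}
scaled = {name: scale_column(col) for name, col in data.items()}
for name, col in scaled.items():
    print(name, [round(v, 6) for v in col])
# a [-1.0, 0.0, 1.0]
# b [-1.091089, 0.218218, 0.872872]
```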

2. Run a linear model on a polars DataFrame:

```python
>>> import polars as pl
>>> from ryp import r, to_py, to_r
>>> data = pl.DataFrame({'y': [7, 1, 2, 3, 6], 'x': [5, 2, 3, 2, 5]})
>>> to_r(data, 'data')
>>> r('model <- lm(y ~ x, data=data)')
>>> coef = to_py('summary(model)$coefficients', index='variable')
>>> p_value = coef.filter(variable='x').select('Pr(>|t|)')[0, 0]
>>> p_value
0.02831035772841884
```
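
The `Pr(>|t|)` value is the two-sided p-value of a t-test on the slope of `x`. As a cross-check that does not involve R, this stdlib-only sketch fits the same least-squares line and recomputes that p-value. It relies on the closed-form CDF of Student's t distribution for exactly 3 degrees of freedom (5 observations minus 2 coefficients), so it is specific to this small example and not a general substitute for `lm()`:

```python
import math

# Same data as the polars example above
x = [5, 2, 3, 2, 5]
y = [7, 1, 2, 3, 6]
n = len(x)

x_mean = sum(x) / n
y_mean = sum(y) / n
sxx = sum((xi - x_mean) ** 2 for xi in x)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

# Ordinary least-squares fit of y ~ x
slope = sxy / sxx
intercept = y_mean - slope * x_mean

# Residual variance uses n - 2 degrees of freedom (two fitted coefficients)
df = n - 2
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
sigma2 = sum(res ** 2 for res in residuals) / df
t_stat = slope / math.sqrt(sigma2 / sxx)

# Two-sided p-value. This closed form of Student's t CDF holds ONLY for
# df == 3: P(|T| <= t) = (2 / pi) * (theta + sin(theta) * cos(theta)),
# where theta = atan(t / sqrt(3)).
if df != 3:
    raise ValueError('closed form below assumes exactly 3 degrees of freedom')
theta = math.atan(abs(t_stat) / math.sqrt(3))
p_value = 1 - (2 / math.pi) * (theta + math.sin(theta) * math.cos(theta))

print(slope, t_stat, p_value)
```

With these inputs, `p_value` should agree closely with the value returned by `to_py()` above, up to floating-point rounding.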

3. Recursive conversion, showcasing the `format`, `rownames`, `colnames` and
   `index` keyword arguments to `to_r()` and `to_py()`:

```python
>>> import numpy as np
>>> from ryp import r, to_py, to_r
>>> arrays = {'ints': np.array([[1, 2], [3, 4]]),
...           'floats': np.array([[0.5, 1.5], [2.5, 3.5]])}
>>> to_r(arrays, 'arrays', format='data.frame',
...      rownames=['row1', 'row2'], colnames=['col1', 'col2'])
>>> r('arrays')
$ints
     col1 col2
row1    1    2
row2    3    4

$floats
     col1 col2
row1  0.5  1.5
row2  2.5  3.5
>>> arrays = to_py('arrays', format='pandas', index='foo')
>>> arrays['ints']
      col1  col2
foo
row1     1     2
row2     3     4
>>> arrays['floats']
      col1  col2
foo
row1   0.5   1.5
row2   2.5   3.5
```
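
As this example shows, a Python `dict` becomes an R named list whose elements are converted one by one, and the reverse happens in `to_py()`. The traversal itself is simple. Here is a minimal, hypothetical sketch in plain Python (`convert_recursively` and its `leaf` callback are illustrative names, not ryp functions, and the type-name callback merely stands in for a real conversion):

```python
from typing import Any, Callable


def convert_recursively(obj: Any, leaf: Callable[[Any], Any]) -> Any:
    """Apply `leaf` to every non-container value, preserving nesting.

    dicts keep their (string) keys, like R named lists; lists and tuples
    both become lists, like R unnamed lists.
    """
    if isinstance(obj, dict):
        if not all(isinstance(key, str) for key in obj):
            raise TypeError('dict keys must be strings to become list names')
        return {key: convert_recursively(value, leaf)
                for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [convert_recursively(value, leaf) for value in obj]
    return leaf(obj)


# Stand-in for a real conversion: report the type name of each leaf
nested = {'ints': (1, 2), 'more': {'floats': [0.5, 1.5], 'name': 'x'}}
print(convert_recursively(nested, lambda value: type(value).__name__))
# {'ints': ['int', 'int'], 'more': {'floats': ['float', 'float'], 'name': 'str'}}
```

Per the `to_r()` documentation, `format`, `rownames` and `colnames` only apply to the elements that support them, which in this sketch would correspond to the `leaf` step.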

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "ryp",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": "R, rpy2, reticulate, tidyverse, ggplot, ggplot2, arrow",
    "author": null,
    "author_email": "Michael Wainberg <m.wainberg@utoronto.ca>",
    "download_url": "https://files.pythonhosted.org/packages/d8/1b/abfd1da98e46ea28389243fc36504ef853e82cbd219fbfd04b21bf094735/ryp-0.1.0.tar.gz",
    "platform": null,
    "description": "# ryp: R inside Python\r\n\r\nryp is a minimalist, powerful Python library for:\r\n- running R code inside Python\r\n- quickly transferring huge datasets between Python (NumPy/pandas/polars) and R\r\n  without writing to disk\r\n- interactively working in both languages at the same time\r\n\r\nryp is an alternative to the widely used [rpy2](https://github.com/rpy2/rpy2) \r\nlibrary. Compared to rpy2, ryp provides:\r\n- increased stability\r\n- a much simpler API, with less of a learning curve\r\n- interactive printouts of R variables that match what you'd see in R\r\n- a full-featured R terminal inside Python for interactive work\r\n- inline plotting in Jupyter notebooks (requires the `svglite` R package)\r\n- much faster data conversion with [Arrow](https://arrow.apache.org) (also\r\n  provided by [rpy2-arrow](https://github.com/rpy2/rpy2-arrow))\r\n- support for *every* NumPy, pandas and polars data type representable in base\r\n  R, no matter how obscure\r\n- support for sparse arrays/matrices\r\n- recursive conversion of containers like R lists, Python tuples/lists/dicts, \r\n  and S3/S4/R6 objects\r\n- full Windows support\r\n\r\nryp does the opposite of the \r\n[reticulate](https://rstudio.github.io/reticulate) R library, which runs Python\r\ninside R.\r\n\r\n## Installation\r\n\r\nInstall ryp via pip:\r\n\r\n```bash\r\npip install ryp\r\n```\r\n\r\nconda:\r\n\r\n```bash\r\nconda install ryp\r\n```\r\n\r\nor mamba:\r\n\r\n```bash\r\nmamba install ryp\r\n```\r\n\r\nOr, install the development version via pip:\r\n\r\n```bash\r\npip install git+https://github.com/Wainberg/ryp\r\n```\r\n\r\nryp's only mandatory dependencies are:\r\n- Python 3.7+\r\n- R\r\n- the [cffi](https://cffi.readthedocs.io/en/stable) Python package\r\n- the [pyarrow](https://arrow.apache.org/docs/python) Python package, which \r\n  includes [NumPy](https://numpy.org) as a dependency\r\n- the [arrow](https://arrow.apache.org/docs/r) R library\r\n\r\nR and the arrow R 
library are automatically installed when installing ryp via \r\nconda or mamba, but not via pip. ryp uses the R installation pointed to by the\r\nenvironment variable `R_HOME`, or if `R_HOME` is not defined or not a \r\ndirectory, by running `R RHOME` through `subprocess.run()`.\r\n\r\nryp also has several optional dependencies, which are not installed \r\nautomatically with pip, conda or mamba. These are:\r\n- [pandas](https://pandas.pydata.org), for `format='pandas'`\r\n- [polars](https://pola.rs), for `format='polars'`\r\n- [SciPy](https://scipy.org) and the\r\n  [Matrix](https://cran.r-project.org/web/packages/Matrix) R library, for sparse\r\n  matrices\r\n- the [svglite](https://cran.r-project.org/web/packages/svglite) R library, for\r\n  inline plotting in Jupyter notebooks\r\n\r\n## Functionality\r\n\r\nryp consists of just three core functions:\r\n\r\n1. `r(R_code)` runs a string of R code. `r()` with no arguments opens up an R \r\n   terminal inside your Python terminal for interactive work.\r\n2. `to_r(python_object, R_variable_name)` converts a Python object into an R \r\n   object named `R_variable_name`. \r\n3. `to_py(R_statement)` converts the R object produced by evaluating \r\n   `R_statement` to Python. `R_statement` may be a single variable name, or a \r\n   more complex code snippet that evaluates to the R object you'd like to \r\n   convert.\r\n\r\nand two more functions, `get_config()` and `set_config()`, for getting and \r\nsetting ryp's global configuration options.\r\n\r\n### `r()`\r\n\r\n```python\r\nr(R_code: str = ...) -> None\r\n```\r\n\r\n`r(R_code)` runs a string of R code inside ryp's R interpreter, which is \r\nembedded inside Python. It can contain multiple statements separated by\r\nsemicolons or newlines (e.g. within a triple-quoted Python string). 
It returns\r\n`None`; use `to_py()` instead if you would like to convert the result back to \r\nPython.\r\n\r\n`r()` with no arguments opens up an R terminal inside your Python terminal \r\nfor interactive debugging. Press `Ctrl + D` to exit back to the Python \r\nterminal. R variables defined from Python will be available in the R terminal,\r\nand variables defined in the R terminal will be available from Python once you\r\nexit:\r\n\r\n```python\r\n>> > from ryp.ryp.ryp import r\r\n>> > r('a = 1')\r\n>> > r()\r\n> a\r\n[1]\r\n1\r\n> b < - 2\r\n>\r\n>> > r('b')\r\n[1]\r\n2\r\n```\r\n\r\nNote that the default value for `R_code` is the special sentinel value `...` \r\n(`Ellipsis`) rather than `None`. This stops users from inadvertently opening \r\nthe terminal when passing a variable that is supposed to be a string but is \r\nunexpectedly `None`.\r\n\r\n### `to_r()`\r\n\r\n```python\r\nto_r(python_object: object, R_variable_name: str, *, \r\n     format: Literal['keep', 'matrix', 'data.frame'] | None = None,\r\n     rownames: object = None, colnames: object = None) -> None\r\n```\r\n\r\n`to_r(python_object, R_variable_name)` converts `python_object` to R, adding it\r\nto R's global namespace (`globalenv`) as a variable named `R_variable_name`. \r\n\r\nIf `python_object` is a container (`list`, `tuple`, or `dict`), `to_r()`\r\nrecursively converts each element and returns an R named list (if\r\n`python_object` is a `dict`) or unnamed list (if `python_object` is a `list` or\r\n`tuple`).\r\n\r\n#### The `format` argument\r\n\r\nBy default (`format='keep'`), ryp converts polars and pandas DataFrames (and \r\npandas MultiIndexes) into R data frames, and 2D NumPy arrays into R matrices. 
\r\nSpecify `format='matrix'` to convert everything (even DataFrames) to R matrices\r\n(in which case all DataFrame columns must have the same type), and \r\n`format='data.frame'` to convert everything (even 2D NumPy arrays) to R \r\ndata frames.\r\n\r\n`format` must be `None` unless `python_object` is a DataFrame, MultiIndex or 2D\r\nNumPy array \u2013 or unless `python_object` is a `list`, `tuple`, or `dict`, in \r\nwhich case the `format` will apply recursively to any DataFrames, MultiIndexes\r\nor 2D NumPy arrays it contains.\r\n\r\n#### The `rownames` and `colnames` arguments\r\n\r\nSince NumPy arrays, polars DataFrames and Series, and scipy sparse arrays and \r\nmatrices lack row and column names, you can specify these separately via the \r\n`rownames` and/or `colnames` arguments, and they will be added to the converted\r\nR object. `rownames` and `colnames` can be lists, tuples, string Series, or \r\ncategorical Series with string categories, and will be automatically converted\r\nto R character vectors. \r\n\r\n`rownames` and `colnames` must match the length or `shape[1]`, respectively, of\r\nthe object being converted. The one exception is that rownames of any length\r\nmay be added to a 0 &times; 0 polars DataFrame, since polars does not have the \r\nconcept of an `N` &times; 0 DataFrame for nonzero `N`. (Dropping all the \r\ncolumns of a polars DataFrame always results in a 0 &times; 0 DataFrame, even \r\nif the original DataFrame had more than 0 rows.)\r\n\r\nBecause Python `bool`, `int`, `float`, and `str` convert to length-1 R vectors\r\nthat support names, you can pass length-1 `rownames` when converting objects of\r\nthese types. You can also pass `rownames` and/or `colnames` when \r\n`python_object` is a `list`, `tuple`, or `dict`, in which case row and column \r\nnames will only be added to elements that support them. All elements that \r\nsupport `rownames` must have the same length as the `rownames`, and similarly \r\nfor the `colnames`. 
\r\n\r\n`rownames` cannot be specified if `python_object` is a pandas Series or \r\nDataFrame (since they already have rownames, i.e. an index), or \r\n`bytes`/`bytearray` (since these convert to `raw` vectors, which lack \r\nrownames). `colnames` cannot be specified unless `python_object` is a \r\nmultidimensional NumPy array or scipy sparse array or matrix, or something that\r\nmight contain one (`list`, `tuple`, or `dict`).\r\n\r\n### `to_py()`\r\n\r\n```python\r\nto_py(R_statement: str, *,\r\n      format: Literal['polars', 'pandas', 'pandas-pyarrow', 'numpy'] |\r\n              dict[Literal['vector', 'matrix', 'data.frame'],\r\n                   Literal['polars', 'pandas', 'pandas-pyarrow',\r\n                           'numpy']] | None = None,\r\n      index: str | Literal[False] | None = None,\r\n      squeeze: bool | None = None) -> Any\r\n```\r\n\r\n`to_py(R_statement)` runs a single statement of R code (which can be as simple \r\nas a single variable name) and converts the resulting R object to Python. \r\n\r\nIf the object is a list/S3 object, S4 object, or environment/R6 object, it\r\nrecursively converts each attribute/slot/field and returns a Python `dict` (or \r\n`list`, if the object is an unnamed list). For R6 objects, only public fields\r\nwill be converted.\r\n\r\n#### The `format` argument\r\n\r\nBy default, or when `format='polars'`, R vectors will be converted to polars \r\nSeries, and R data frames and matrices will be converted to polars DataFrames. \r\nYou can change this by setting the `format` argument to `'pandas'`, \r\n`'pandas-pyarrow'` (like `'pandas'`, but converting to pyarrow dtypes wherever \r\npossible) or `'numpy'`. (You can also change the default format, e.g. 
with \r\n`set_config(to_py_format='pandas')`.)\r\n\r\nFor finer-grained control, you can set `format` for only certain R variable \r\ntypes by specifying a dictionary with `'vector'`, `'matrix'`, and/or\r\n`'data.frame'` as keys and `'polars'`, `'pandas'`, `'pandas-pyarrow'` and/or \r\n`'numpy'` as values. \r\n\r\n`format` must be `None` when `R_statement` evaluates to `NULL`, when it \r\nevaluates to an array of 3 or more dimensions (these are always converted to \r\nNumPy arrays), or when the final result would be a Python scalar (see `squeeze`\r\nbelow).\r\n\r\n#### The `index` argument\r\n\r\nBy default, the R object's `names` or `rownames` will become the index (for \r\npandas) or the first column (for polars) of the output Python object, named \r\n`'index'`. Set the `index` argument to a different string to change this name, \r\nor set `index=False` to not convert the `names`/`rownames`. \r\n\r\nNote that for polars, the output will be a two-column DataFrame (not a Series!)\r\nwhen the input is an R vector, unless `index=False`. \r\n\r\nWhen the output is a NumPy array, `names` and `rownames` will always be \r\ndiscarded, since numeric NumPy arrays cannot store string indexes except with \r\nthe inefficient `dtype=object`. \r\n\r\n`index` must be `None` when `format='numpy'`, or when the final result would be\r\na Python scalar (see `squeeze` below).\r\n\r\n#### The `squeeze` argument\r\n\r\nBy default, length-1 R vectors, matrices and arrays will be converted to Python\r\nscalars instead of Python arrays, Series or DataFrames. Set `squeeze=False` to\r\ndisable this special case. (R data frames are never converted to Python scalars\r\neven if `squeeze=True`.) 
\r\n\r\n`squeeze` must be `None` unless the R object is a vector, matrix or array\r\n(`raw` vectors don't count, because they always convert to Python scalars).\r\n\r\n### `set_config()` and `get_config()`\r\n\r\n```python\r\nset_config(*, to_r_format=None, to_py_format=None, index=None, squeeze=None, \r\n           plot_width: int | float | None = None, \r\n           plot_height: int | float | None = None) -> None\r\n```\r\n\r\n`set_config` sets ryp's configuration settings. Arguments that are `None` are \r\nleft unchanged.\r\n    \r\nFor instance, to set pandas as the default format in `to_py()`, run \r\n`set_config(to_py_format='pandas')`.\r\n\r\n- `to_r_format`: the default value for the `format` parameter in `to_r()`; \r\n  must be `'keep'` (the default), `'matrix'`, or `'data.frame'`.\r\n- `to_py_format`: the default value for the `format` parameter in `to_py()`; \r\n  must be `'polars'` (the default), `'pandas'`, `'pandas-pyarrow'`, `'numpy'`,\r\n  or a dictionary with one of those four Python formats and/or `None` as values\r\n  and `'vector'`, `'matrix'` and/or `'data.frame'` as keys. If certain keys are \r\n  missing or have `None` as the format, leave their format unchanged.\r\n- `index`: the default value for the `index` parameter in to_py(); must be a \r\n  string (default: `'index'`) or `False`. \r\n- `squeeze`: the default value for the `squeeze` parameter in `to_py()`; must  \r\n  be `True` (the default) or `False`.\r\n- `plot_width`: the width, in inches, of inline plots in Jupyter notebooks;\r\n  must be a positive number. Defaults to 6.4 inches, to match Matplotlib's \r\n  default.\r\n- `plot_height`: the height, in inches, of inline plots in Jupyter notebooks;\r\n  must be a positive number. 
Defaults to 4.8 inches, to match Matplotlib's \r\n  default.\r\n\r\n```python\r\nget_config() -> dict[str, dict[str, str] | str | bool | int]\r\n```\r\n\r\n`get_config` returns the current configuration options as a dictionary, with \r\nkeys `to_r_format`, `to_py_format`, `index`, `squeeze`, `plot_width`, and \r\n`plot_height`.\r\n\r\n## Conversion rules\r\n\r\n### Python to R (`to_r()`)\r\n\r\n| Python                                                                  | R                                                                                                         |\r\n|-------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|\r\n| `None`                                                                  | `NULL` (if scalar) or `NA` (if inside NumPy, pandas or polars)                                            |\r\n| `nan`                                                                   | `NaN` (if scalar or inside polars) or `NA` (if inside NumPy or pandas)                                    |\r\n| `pd.NA`                                                                 | `NA`                                                                                                      |\r\n| `pd.NaT`, `np.datetime64('NaT')`, `np.timedelta64('NaT')`               | `NA`                                                                                                      |   \r\n| `bool`                                                                  | length-1 `logical` vector                                                                                 |\r\n| `int`                                                                   | length-1 `integer` (if `abs(x) <= 2_147_483_647`) or `bit64::integer64` vector                            |\r\n| `float`                                                                 | length-1 `numeric` vector  
                                                                               |\r\n| `str`                                                                   | length-1 `character` vector                                                                               |\r\n| `complex`                                                               | length-1 `complex` vector                                                                                 |\r\n| `datetime.date`                                                         | length-1 `Date` vector                                                                                    |\r\n| `datetime.datetime`                                                     | length-1 `POSIXct` vector                                                                                 |\r\n| `datetime.timedelta`                                                    | length-1 `difftime(units='secs')` vector                                                                  |\r\n| `datetime.time` (`tzinfo` must be `None`)                               | length-1 `hms::hms` vector                                                                                |\r\n| `bytes`, `bytearray`                                                    | `raw` vector                                                                                              |\r\n| `list`, `tuple`                                                         | unnamed list                                                                                              |\r\n| `dict` (all keys must be strings)                                       | named list                                                                                                |\r\n| polars Series, pandas Series<sup>&ast;</sup>, pandas `Index`            | vector                                                                                                    |\r\n| polars DataFrame, pandas 
DataFrame<sup>&ast;</sup>, pandas `MultiIndex` | matrix<sup>&dagger;</sup> (if `format == 'matrix'`; all columns must have same data type) or `data.frame` |\r\n| 1D NumPy array                                                          | vector                                                                                                    |\r\n| 2D NumPy array                                                          | `data.frame` (if `format == 'data.frame'`) or matrix<sup>&dagger;</sup>                                   |\r\n| &ge; 3D NumPy array                                                     | array<sup>&dagger;</sup>                                                                                  |\r\n| 0D NumPy array (e.g. `np.array(1)`), NumPy generic (e.g. `np.int32(1)`) | length-1 vector                                                                                           |\r\n| `csr_array`, `csr_matrix`                                               | `dgRMatrix` (if int or float), `lgRMatrix` (if boolean), -- (if complex)                                  | \r\n| `csc_array`, `csc_matrix`                                               | `dgCMatrix` (if int or float), `lgCMatrix` (if boolean), -- (if complex)                                  |\r\n| `coo_array`, `coo_matrix`                                               | `dgTMatrix` (if int or float), `lgTMatrix` (if boolean), -- (if complex)                                  |\r\n\r\n#### NumPy data types\r\n\r\n| Python                                      | R                                                              |\r\n|---------------------------------------------|----------------------------------------------------------------|\r\n| `bool`                                      | `logical`                                                      |\r\n| `int8`, `uint8`, `int16`, `uint16`, `int32` | `integer`                                                      |\r\n| `uint32`, `uint64`             
             | `integer` (if `x <= 2_147_483_647`) or `numeric`               |\r\n| `int64`                                     | `integer` (if `abs(x) <= 2_147_483_647`) or `bit64::integer64` |\r\n| `float16`, `float32`, `float64`, `float128` | `numeric` (note: `float128` loses precision)                   |\r\n| `complex64`, `complex128`                   | `complex`                                                      |\r\n| `bytes` (e.g. `'S1'`)                       | --                                                             |\r\n| `str`/`unicode` (e.g. `'U1'`)               | `character`                                                    |\r\n| `datetime64`                                | `POSIXct`                                                      | \r\n| `timedelta64`                               | `difftime(units='secs')`                                       |\r\n| `void` (unstructured)                       | `raw`                                                          |\r\n| `void` (structured)                         | --                                                             |\r\n| `object`                                    | depends on the contents<sup>&Dagger;</sup>                     |\r\n\r\n#### pandas-specific data types\r\n\r\n| Python                                                               | R                                                              |\r\n|----------------------------------------------------------------------|----------------------------------------------------------------|\r\n| `BooleanDtype`                                                       | `logical`                                                      |\r\n| `Int8Dtype`, `UInt8Dtype`, `Int16Dtype`, `UInt16Dtype`, `Int32Dtype` | `integer`                                                      |\r\n| `UInt32Dtype`, `UInt64Dtype`                                         | `integer` (if `x <= 2_147_483_647`) or `numeric`               |\r\n| 
`Int64Dtype`                                                         | `integer` (if `abs(x) <= 2_147_483_647`) or `bit64::integer64` |  \r\n| `Float32Dtype`, `Float64Dtype`                                       | `numeric`                                                      |\r\n| `StringDtype`                                                        | `character`                                                    |\r\n| `CategoricalDtype(ordered=False)`                                    | unordered `factor`                                             |\r\n| `CategoricalDtype(ordered=True)`                                     | ordered `factor`                                               |\r\n| `DatetimeTZDtype`, `PeriodDtype`                                     | `POSIXct`                                                      |\r\n| `IntervalDtype`, `SparseDtype`                                       | --                                                             |\r\n\r\n#### pandas Arrow data types (`pd.ArrowDtype`)\r\n\r\n| Python                                                     | R                                                              |\r\n|------------------------------------------------------------|----------------------------------------------------------------|\r\n| `pa.bool_`                                                 | `logical`                                                      |\r\n| `pa.int8`, `pa.uint8`, `pa.int16`, `pa.uint16`, `pa.int32` | `integer`                                                      |\r\n| `pa.uint32`, `pa.uint64`                                   | `integer` (if `x <= 2_147_483_647`) or `numeric`               |\r\n| `pa.int64`                                                 | `integer` (if `abs(x) <= 2_147_483_647`) or `bit64::integer64` |\r\n| `pa.float32`, `pa.float64`                                 | `numeric`                                                      |\r\n| `pa.string`, `pa.large_string`      
                       | `character`                                                    |\r\n| `pa.date32`                                                | `Date`                                                         |\r\n| `pa.date64`, `pa.timestamp`                                | `POSIXct`                                                      |\r\n| `pa.duration`                                              | `difftime(units='secs')`                                       |\r\n| `pa.time32`, `pa.time64`                                   | `hms::hms`                                                     |\r\n| `pa.dictionary(any integer type, pa.string(), ordered=0)`  | unordered `factor`                                             |\r\n| `pa.dictionary(any integer type, pa.string(), ordered=1)`  | ordered `factor`                                               |\r\n| `pa.null()`                                                | `vctrs::unspecified`                                           |\r\n\r\n#### Polars data types\r\n\r\n| Python                                      | R                                                              |\r\n|---------------------------------------------|----------------------------------------------------------------|\r\n| `Boolean`                                   | `logical`                                                      |\r\n| `Int8`, `UInt8`, `Int16`, `UInt16`, `Int32` | `integer`                                                      |\r\n| `UInt32`, `UInt64`                          | `integer` (if `x <= 2_147_483_647`) or `numeric`               |\r\n| `Int64`                                     | `integer` (if `abs(x) <= 2_147_483_647`) or `bit64::integer64` |\r\n| `Float32`, `Float64`                        | `numeric`                                                      |\r\n| `Date`                                      | `Date`                                                         |\r\n| `Datetime`                
                  | `POSIXct`                                                      |\r\n| `Duration`                                  | `difftime(units='secs')`                                       |\r\n| `Time`                                      | `hms::hms`                                                     |\r\n| `String`                                    | `character`                                                    |\r\n| `Categorical`                               | unordered `factor`                                             |\r\n| `Enum`                                      | ordered `factor`                                               |\r\n| `Object`                                    | depends on the contents<sup>&Dagger;</sup>                     |\r\n| `Null`                                      | `vctrs::unspecified`                                           | \r\n| `Binary`, `Decimal`, `List`, `Array`        | --                                                             |\r\n\r\n#### Notes\r\n\r\n<sup>&ast;</sup> For pandas Series and DataFrames, string indexes (and \r\ncategorical indexes where the categories are strings) will be automatically\r\nconverted to `names`/`rownames`. The default index\r\n(`pd.RangeIndex(len(python_object))`) will be ignored. All other indexes are\r\ndisallowed. \r\n\r\n<sup>&dagger;</sup> Because R does not support `POSIXct` and `Date` matrices or\r\narrays, dates and datetimes cannot be converted to R matrices or arrays.\r\n\r\n<sup>&Dagger;</sup> For `dtype=object` and `dtype=pl.Object`, the output R type\r\ndepends on the contents, e.g. `'character'` if all elements are strings. Some\r\nadditional notes on ryp's handling of object data types:\r\n- `None`, `np.nan`, `pd.NA`, `pd.NaT`, `np.datetime64('NaT')`, and \r\n  `np.timedelta64('NaT')` are all treated as missing values &ndash; even for \r\n  polars, where `np.nan` is ordinarily treated as a floating-point number \r\n  rather than a missing value. 
\r\n- Length-0 and all-missing data will be converted to the `vctrs::unspecified` R\r\n  type (`vctrs` is part of the tidyverse). \r\n- If the elements are objects with a mix of types (or datetimes with a mix of\r\n  time zones), Arrow will generally cause the conversion to fail, though mixes\r\n  of related types (e.g. int and float) will be automatically cast to the\r\n  common supertype and succeed. \r\n- Conversion will also fail if the contents are objects that are not \r\n  representable as R vector elements. This includes `bytes`/`bytearray` (which\r\n  are only representable in R when scalar, as a `raw` vector) and Python\r\n  containers (`list`, `tuple`, and `dict`). \r\n- pandas `Timedelta` objects will be rounded down to the nearest microsecond,\r\n  following the behavior of Arrow.\r\n\r\n### R to Python (`to_py()`)\r\n\r\n| R                                                                            | Python                                                                                                                                                                                        |\r\n|------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\r\n| `NULL`                                                                       | `None`                                                                                                                                                                                        |\r\n| `NA`                                                                         | `None` (if scalar or `format='polars'`), `None`/`nan`/`pd.NA`/`pd.NaT`/`np.datetime64('NaT', 'us')`/`np.timedelta64('NaT', 'ns')`/etc. 
(if `format='numpy'`, `'pandas'` or `'pandas-pyarrow'`) |\r\n| `NaN`                                                                        | `nan`                                                                                                                                                                                         |\r\n| length-1 vector, matrix or array, `squeeze == True`                          | scalar                                                                                                                                                                                        | \r\n| vector or 1D array, `format == 'numpy'`                                      | 1D NumPy array                                                                                                                                                                                |\r\n| vector or 1D array, `format == 'pandas'` or `format == 'pandas-pyarrow'`     | pandas Series                                                                                                                                                                                 |\r\n| vector or 1D array, `format == 'polars'`                                     | polars Series (if `index=False`) or two-column DataFrame                                                                                                                                      |\r\n| matrix or `data.frame`, `format == 'numpy'`                                  | 2D NumPy array                                                                                                                                                                                |\r\n| matrix or `data.frame`, `format == 'pandas'` or `format == 'pandas-pyarrow'` | pandas DataFrame                                                                                                                                                                              |\r\n| 
matrix or `data.frame`, `format == 'polars'`                                 | polars DataFrame                                                                                                                                                                              |\r\n| &ge; 3D array                                                                | NumPy array                                                                                                                                                                                   |  \r\n| unnamed list                                                                 | `list`                                                                                                                                                                                        |\r\n| named list, S3 object, S4 object, environment, R6 object                     | `dict`                                                                                                                                                                                        |\r\n| `dgRMatrix`                                                                  | `csr_array(dtype='float64')`                                                                                                                                                                  |\r\n| `dgCMatrix`                                                                  | `csc_array(dtype='float64')`                                                                                                                                                                  |\r\n| `dgTMatrix`                                                                  | `coo_array(dtype='float64')`                                                                                                                                                                  |\r\n| `lgRMatrix`, `ngRMatrix`                                          
           | `csr_array(dtype=bool)`                                                                                                                                                                       |\r\n| `lgCMatrix`, `ngCMatrix`                                                     | `csc_array(dtype=bool)`                                                                                                                                                                       |\r\n| `lgTMatrix`, `ngTMatrix`                                                     | `coo_array(dtype=bool)`                                                                                                                                                                       |\r\n| formula (`~`)                                                                | --                                                                                                                                                                                            |\r\n\r\n#### Data types\r\n\r\n| R                           | Python scalar                        | NumPy                                                    | pandas                                                   | pandas-pyarrow                                                 | polars                                     |\r\n|-----------------------------|--------------------------------------|----------------------------------------------------------|----------------------------------------------------------|----------------------------------------------------------------|--------------------------------------------|\r\n| `logical`                   | `bool`                               | `bool`                                                   | `bool`                                                   | `ArrowDtype(pa.bool_())`                                       | `Boolean`                                  |\r\n| `integer`                   
| `int`                                | `int32`                                                  | `int32`                                                  | `ArrowDtype(pa.int32())`                                       | `Int32`                                    |\r\n| `bit64::integer64`          | `int`                                | `int64`                                                  | `int64`                                                  | `ArrowDtype(pa.int64())`                                       | `Int64`                                    |\r\n| `numeric`                   | `float`                              | `float`                                                  | `float`                                                  | `ArrowDtype(pa.float64())`                                     | `Float64`                                  |\r\n| `character`                 | `str`                                | `object` (with `str` elements)                           | `object` (with `str` elements)                           | `ArrowDtype(pa.string())`                                      | `String`                                   |\r\n| `complex`                   | `complex`                            | `complex128`                                             | `complex128`                                             | `complex128`                                                   | --                                         |\r\n| `raw`                       | `bytearray`                          | --                                                       | --                                                       | --                                                             | --                                         |\r\n| unordered `factor`          | `str`                                | `object` (with `str` elements)                           | `CategoricalDtype(ordered=False)`                        | 
`ArrowDtype(pa.dictionary(pa.int8(), pa.string(), ordered=0))` | `Categorical`                              |\r\n| ordered `factor`            | `str`                                | `object` (with `str` elements)                           | `CategoricalDtype(ordered=True)`                         | `ArrowDtype(pa.dictionary(pa.int8(), pa.string(), ordered=1))` | `Enum`                                     |\r\n| `POSIXct` without time zone | `datetime.datetime`<sup>&ast;</sup>  | `datetime64[us]`<sup>&ast;</sup>                         | `datetime64[us]`<sup>&ast;</sup>                         | `ArrowDtype(pa.timestamp('us'))`<sup>&ast;</sup>               | `Datetime('us')`<sup>&ast;</sup>           |\r\n| `POSIXct` with time zone    | `datetime.datetime`<sup>&ast;</sup>  | `datetime64[us]`<sup>&ast;</sup> (time zone discarded)   | `DatetimeTZDtype('us', time_zone)`<sup>&ast;</sup>       | `ArrowDtype(pa.timestamp('us', time_zone))`<sup>&ast;</sup>    | `Datetime('us', time_zone)`<sup>&ast;</sup> | \r\n| `POSIXlt`                   | `dict` of scalars                    | `dict` of NumPy arrays                                   | `dict` of pandas Series                                  | `dict` of pandas Series                                        | `dict` of polars Series                    |\r\n| `Date`                      | `datetime.date`                      | `datetime64[D]`                                          | `datetime64[ms]`                                         | `ArrowDtype(pa.date32())`                                      | `Date`                                     |\r\n| `difftime`                  | `datetime.timedelta`<sup>&ast;</sup> | `timedelta64[ns]`                                        | `timedelta64[ns]`                                        | `ArrowDtype(pa.duration('ns'))`                                | `Duration(time_unit='ns')`                 |\r\n| `hms::hms`                  | `datetime.time`<sup>&ast;</sup>      | 
`object` (with `datetime.time` elements)<sup>&ast;</sup> | `object` (with `datetime.time` elements)<sup>&ast;</sup> | `ArrowDtype(pa.time64('ns'))`<sup>&ast;</sup>                  | `Time`                                     |\r\n| `vctrs::unspecified`        | `None`                               | `object` (with `None` elements)                          | `object` (with `None` elements)                          | `ArrowDtype(pa.null())`                                        | `Null`                                     |\r\n\r\n<sup>&ast;</sup> Due to the limitations of conversion with Arrow, `POSIXct` and\r\n`hms::hms` values are rounded down to the nearest microsecond when converting\r\nto Python, except for `hms::hms` when converting to polars. `difftime` values\r\nare also rounded down to the nearest microsecond, but only when converting to\r\nscalar `datetime.timedelta` values (which cannot represent nanoseconds).\r\n\r\n## Examples\r\n\r\n1. Apply R's `scale()` function to a pandas DataFrame:\r\n\r\n```python\r\n>>> import pandas as pd\r\n>>> from ryp import r, to_py, to_r, set_config\r\n>>> set_config(to_py_format='pandas')\r\n>>> data = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 3, 4]})\r\n>>> to_r(data, 'data')\r\n>>> r('data')\r\n  a b\r\n1 1 1\r\n2 2 3\r\n3 3 4\r\n>>> r('data <- scale(data)')  # scale the R data.frame\r\n>>> scaled_data = to_py('data')  # convert the R data.frame to Python\r\n>>> scaled_data\r\n     a         b\r\n0 -1.0 -1.091089\r\n1  0.0  0.218218\r\n2  1.0  0.872872\r\n```\r\nNote: we could have just written `to_py('scale(data)')` instead of\r\n`r('data <- scale(data)')` followed by `to_py('data')`.\r\n\r\n2. 
Run a linear model on a polars DataFrame:\r\n\r\n```python\r\n>>> import polars as pl\r\n>>> from ryp import r, to_py, to_r\r\n>>> data = pl.DataFrame({'y': [7, 1, 2, 3, 6], 'x': [5, 2, 3, 2, 5]})\r\n>>> to_r(data, 'data')\r\n>>> r('model <- lm(y ~ x, data=data)')\r\n>>> coef = to_py('summary(model)$coefficients', index='variable')\r\n>>> p_value = coef.filter(variable='x').select('Pr(>|t|)')[0, 0]\r\n>>> p_value\r\n0.02831035772841884\r\n```\r\n\r\n3. Recursive conversion, showcasing all the keyword arguments to `to_r()` and\r\n   `to_py()`:\r\n\r\n```python\r\n>>> import numpy as np\r\n>>> from ryp import r, to_py, to_r\r\n>>> arrays = {'ints': np.array([[1, 2], [3, 4]]),\r\n...           'floats': np.array([[0.5, 1.5], [2.5, 3.5]])}\r\n>>> to_r(arrays, 'arrays', format='data.frame',\r\n...      rownames=['row1', 'row2'], colnames=['col1', 'col2'])\r\n>>> r('arrays')\r\n$ints\r\n     col1 col2\r\nrow1    1    2\r\nrow2    3    4\r\n\r\n$floats\r\n     col1 col2\r\nrow1  0.5  1.5\r\nrow2  2.5  3.5\r\n>>> arrays = to_py('arrays', format='pandas', index='foo')\r\n>>> arrays['ints']\r\n      col1  col2\r\nfoo\r\nrow1     1     2\r\nrow2     3     4\r\n>>> arrays['floats']\r\n      col1  col2\r\nfoo\r\nrow1   0.5   1.5\r\nrow2   2.5   3.5\r\n```\r\n",
    "bugtrack_url": null,
    "license": "MIT License",
    "summary": "R inside Python",
    "version": "0.1.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/Wainberg/ryp/issues",
        "Homepage": "https://github.com/Wainberg/ryp"
    },
    "split_keywords": [
        "r",
        " rpy2",
        " reticulate",
        " tidyverse",
        " ggplot",
        " ggplot2",
        " arrow"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f40d891db84a0a1bc0edad614d316ad0c306dea76599ac1b534fb576181b143f",
                "md5": "19fb32abdd059a5296da12363302899e",
                "sha256": "3b8f02b851af2915485b74a22fb5323d4837811aea21fdfb65d9af2c6307c0ba"
            },
            "downloads": -1,
            "filename": "ryp-0.1.0-1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "19fb32abdd059a5296da12363302899e",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 48712,
            "upload_time": "2024-09-29T03:11:09",
            "upload_time_iso_8601": "2024-09-29T03:11:09.020453Z",
            "url": "https://files.pythonhosted.org/packages/f4/0d/891db84a0a1bc0edad614d316ad0c306dea76599ac1b534fb576181b143f/ryp-0.1.0-1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f9a2a3ac6252d48e55a3ff65f378d0aa5d208f591649127420867e463f908ab5",
                "md5": "03071925bab7fd2825433a81b8bd6dbb",
                "sha256": "da9a3c8eda43a5b3ace0431b85da51f3cc67dd0121a0084b400f97a113de122a"
            },
            "downloads": -1,
            "filename": "ryp-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "03071925bab7fd2825433a81b8bd6dbb",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 48718,
            "upload_time": "2024-09-29T03:04:01",
            "upload_time_iso_8601": "2024-09-29T03:04:01.930106Z",
            "url": "https://files.pythonhosted.org/packages/f9/a2/a3ac6252d48e55a3ff65f378d0aa5d208f591649127420867e463f908ab5/ryp-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d81babfd1da98e46ea28389243fc36504ef853e82cbd219fbfd04b21bf094735",
                "md5": "22e09029cd18baac1f21bae184b59737",
                "sha256": "d120624eee2074e6b6df4b51cbd40394861c921000c0e43d973fd62c81436ebf"
            },
            "downloads": -1,
            "filename": "ryp-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "22e09029cd18baac1f21bae184b59737",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 64250,
            "upload_time": "2024-09-29T03:04:03",
            "upload_time_iso_8601": "2024-09-29T03:04:03.701603Z",
            "url": "https://files.pythonhosted.org/packages/d8/1b/abfd1da98e46ea28389243fc36504ef853e82cbd219fbfd04b21bf094735/ryp-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-29 03:04:03",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Wainberg",
    "github_project": "ryp",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "ryp"
}
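
The polars data-type table in the README maps `Int64` to R `integer` only when `abs(x) <= 2_147_483_647`, and `UInt32`/`UInt64` to `integer` only when `x <= 2_147_483_647`, falling back to `bit64::integer64` or `numeric` otherwise. The bound is 2^31 - 1 rather than 2^31 because R reserves the 32-bit value -2^31 for `NA_integer_`. Below is a minimal plain-Python sketch of that dispatch rule, not ryp's actual implementation: the helpers `fits_r_integer` and `r_type_for_polars_int` are hypothetical, and the sketch assumes the range check is applied to the column as a whole, so a single out-of-range value promotes every element.

```python
from typing import Sequence

# R integers are 32-bit, but -2**31 is reserved for NA_integer_, so the
# largest magnitude R can store as a plain integer is 2**31 - 1.
R_INT_MAX = 2_147_483_647

# polars integer types that the README table maps to `integer` unconditionally
ALWAYS_INTEGER = {'Int8', 'UInt8', 'Int16', 'UInt16', 'Int32'}


def fits_r_integer(values: Sequence[int]) -> bool:
    """Hypothetical helper: True if every value lies in [-R_INT_MAX, R_INT_MAX]."""
    if len(values) == 0:
        return True
    # Only the extremes matter; Python ints cannot overflow here
    return -R_INT_MAX <= min(values) and max(values) <= R_INT_MAX


def r_type_for_polars_int(dtype_name: str, values: Sequence[int]) -> str:
    """Hypothetical helper mirroring the README's polars table.

    Assumes the range check applies to the whole column, not element by element.
    """
    if dtype_name in ALWAYS_INTEGER:
        return 'integer'
    if dtype_name in ('UInt32', 'UInt64'):
        # Unsigned values are never negative, so only the upper bound can fail
        return 'integer' if fits_r_integer(values) else 'numeric'
    if dtype_name == 'Int64':
        return 'integer' if fits_r_integer(values) else 'bit64::integer64'
    raise ValueError(f'not a polars integer type: {dtype_name!r}')


# Usage
print(r_type_for_polars_int('Int64', [1, -2_147_483_647]))  # integer
print(r_type_for_polars_int('Int64', [2_147_483_648]))      # bit64::integer64
print(r_type_for_polars_int('UInt64', [2**40]))             # numeric
```

Checking `min()` and `max()` once is enough because the rule depends only on the column's extremes; working with Python integers also sidesteps the overflow that taking `abs()` of a fixed-width `-2**63` would hit in NumPy.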
        