numpycythonpermutations

Name	numpycythonpermutations
Version	0.10
home_page	https://github.com/hansalemaos/numpycythonpermutations
Summary	Permutations, Combinations and Product for Numpy - written in Cython - 20x faster than itertools
upload_time	2024-02-03 15:06:07
author	Johannes Fischer
license	MIT
keywords	permutations numpy combinations product
requirements	cycompi flatten_any_dict_iterable_or_whatsoever numpy

# Efficient NumPy Permutations, Combinations, and Product using Cython/C++/OpenMP


Generate permutations, combinations, and cartesian products with NumPy efficiently.
The `generate_product` function is designed to outperform the standard `itertools`
module: in the benchmarks below it is more than 20x faster on large inputs and about 2x faster on tiny ones.




- Uses a "yellow-line-free" Cython backend (no lines flagged as Python interaction in Cython's annotation report) for high performance.
- Uses OpenMP multithreading for parallel processing.
- Compiles on the first run (requires a C/C++ compiler installed on your PC).
- Uses about 90% less memory than itertools.
- The speedup grows with data size, which makes it well suited for large datasets.
- Builds a lookup NumPy array with a lightweight index dtype (typically `np.uint8`, unless you combine more distinct elements than `np.uint8` can index).
- Uses NumPy indexing to save memory: depending on the dtype (and your luck :-) ), the result only holds references to the original elements instead of copies (e.g. with object arrays), which can save a lot of memory.
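A minimal pure-NumPy sketch of the lookup idea from the last two bullets, written under our own assumptions and not taken from the library's Cython code: the product is stored as a compact `np.uint8` index array, and values are only materialized by indexing into the input.

```python
import itertools

import numpy as np

values = np.array([[10, 20, 30], [1, 2, 3], [7, 8, 9]])  # 3 columns, 3 candidates each
n_cols, n_choices = values.shape

# Every index combination as uint8: one row per product tuple, last column varying fastest
idx = np.indices((n_choices,) * n_cols, dtype=np.uint8).reshape(n_cols, -1).T

# Materialize values only when needed: result[i, j] = values[j, idx[i, j]]
result = values[np.arange(n_cols), idx]

print(idx.nbytes, result.nbytes)  # compact index array vs. materialized values
assert (result == np.array(list(itertools.product(*values)))).all()
```

The index array costs one byte per cell regardless of how large the original elements are, which is where the memory savings come from.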

## Supported Functionality


<table><thead><tr><th><p>Iterator</p></th><th><p>Arguments</p></th><th><p>Results</p></th></tr></thead><tbody><tr><td><p><a href="https://docs.python.org/3/library/itertools.html#itertools.product" title="itertools.product"><code><span>product()</span></code></a></p></td><td><p>p, q, … [repeat=1]</p></td><td><p>cartesian product, equivalent to a nested for-loop</p></td></tr><tr><td><p><a href="https://docs.python.org/3/library/itertools.html#itertools.permutations" title="itertools.permutations"><code><span>permutations()</span></code></a></p></td><td><p>p[, r]</p></td><td><p>r-length tuples, all possible orderings, no repeated elements</p></td></tr><tr><td><p><a href="https://docs.python.org/3/library/itertools.html#itertools.combinations" title="itertools.combinations"><code><span>combinations()</span></code></a></p></td><td><p>p, r</p></td><td><p>r-length tuples, in sorted order, no repeated elements</p></td></tr><tr><td><p><a href="https://docs.python.org/3/library/itertools.html#itertools.combinations_with_replacement" title="itertools.combinations_with_replacement"><code><span>combinations_with_replacement()</span></code></a></p></td><td><p>p, r</p></td><td><p>r-length tuples, in sorted order, with repeated elements</p></td></tr></tbody></table>
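For reference, the four iterators in the table are closely related: permutations, combinations and combinations with replacement are all filtered subsets of `product`. A short check with the standard library (the pool and tuple length are chosen arbitrarily):

```python
import itertools

pool = "ABC"  # sorted and without duplicates, so comparing values works below
prod = list(itertools.product(pool, repeat=2))  # 9 tuples

# permutations: no element repeated within a tuple
perms = [t for t in prod if len(set(t)) == len(t)]
# combinations: strictly increasing (sorted, no repeats)
combs = [t for t in prod if all(a < b for a, b in zip(t, t[1:]))]
# combinations_with_replacement: non-decreasing (sorted, repeats allowed)
combs_wr = [t for t in prod if all(a <= b for a, b in zip(t, t[1:]))]

assert perms == list(itertools.permutations(pool, 2))
assert combs == list(itertools.combinations(pool, 2))
assert combs_wr == list(itertools.combinations_with_replacement(pool, 2))
print(len(prod), len(perms), len(combs), len(combs_wr))  # 9 6 3 6
```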


## Getting Started

### Only tested on Windows 10 / Python 3.11

- Make sure you have Python and a C/C++ compiler installed.
- Install it with `pip install numpycythonpermutations` or download it from GitHub.


## Some examples

## Generating all RGB colors in about 230 ms

#### More than 25 times faster than itertools at generating all RGB colors


```python
import numpy as np
import itertools
from numpycythonpermutations import generate_product

# RGB COLORS:

args = np.asarray(  # The input must always be two-dimensional (a list of lists or a 2-D NumPy array)
    [
        list(range(256)),
        list(range(256)),
        list(range(256)),
    ],
    dtype=np.uint8,
)

# In IPython/Jupyter:
%timeit resus = np.array(list(itertools.product(*args)))
# 5.88 s ± 78.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit resus = generate_product(args, remove_duplicates=False, str_format_function=repr, multicpu=True, return_index_only=False, max_reps_rows=-1, r=-1, dummyval="DUMMYVAL")
# 232 ms ± 31.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

```
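`%timeit` only works in IPython/Jupyter. To reproduce the itertools side of this comparison in a plain script, the standard-library `timeit` module can be used. A sketch (the helper name `itertools_baseline` is ours):

```python
import itertools
import timeit

import numpy as np


def itertools_baseline(n_values: int, n_columns: int = 3) -> np.ndarray:
    """The itertools side of the comparison above, as a plain function."""
    args = np.asarray([list(range(n_values))] * n_columns, dtype=np.uint8)
    return np.array(list(itertools.product(*args)))


# n_values=256 reproduces the full RGB case (16.7 million rows, several seconds);
# n_values=5 matches the tiny-input example. 32 keeps this run quick.
best = min(timeit.repeat(lambda: itertools_baseline(32), number=1, repeat=3))
print(f"itertools baseline: {best * 1000:.1f} ms")
```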

### Still about 2x faster on tiny inputs


```python
# About 2x faster even with little data (39.3 µs vs. 19.2 µs)
args = np.asarray(
    [
        list(range(5)),
        list(range(5)),
        list(range(5)),
    ],
    dtype=np.uint8,
)

%timeit np.array(list(itertools.product(*args)))
# 39.3 µs ± 113 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit resus = generate_product(args, remove_duplicates=False, str_format_function=repr, multicpu=True, return_index_only=False, max_reps_rows=-1, r=-1, dummyval="DUMMYVAL")
# 19.2 µs ± 176 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
```

### Attention! The output order differs from itertools

`itertools.product` varies the last column fastest (C order), while `generate_product` varies the first column fastest (Fortran-style order):

#### Itertools

```python
array([[  0,   0,   0],
       [  0,   0,   1],
       [  0,   0,   2],
       ...,
       [255, 255, 253],
       [255, 255, 254],
       [255, 255, 255]])
```

#### numpycythonpermutations

```python
array([[  0,   0,   0],
       [  1,   0,   0],
       [  2,   0,   0],
       ...,
       [253, 255, 255],
       [254, 255, 255],
       [255, 255, 255]], dtype=uint8)
```
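If you need itertools' order, one option in plain NumPy is to sort the rows with the first column as the primary key. This reproduces itertools' order only when each input row is sorted ascending without duplicates (as in the RGB example); it also gives filtered, multicore output a deterministic order. A sketch on a small stand-in input, where the Fortran-style array is built with itertools purely for illustration:

```python
import itertools

import numpy as np

args = [list(range(3))] * 3  # small stand-in for the RGB input

# itertools order: last column varies fastest (C order)
c_order = np.array(list(itertools.product(*args)))
# Fortran-style order: first column varies fastest (built by reversing columns twice)
f_order = np.array(list(itertools.product(*args[::-1])))[:, ::-1]
print(f_order[:3].tolist())  # [[0, 0, 0], [1, 0, 0], [2, 0, 0]]

# np.lexsort uses its LAST key as the primary key, so pass the columns reversed
# to sort by column 0 first, then column 1, and so on.
restored = f_order[np.lexsort(f_order.T[::-1])]
assert (restored == c_order).all()
```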

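## Removing duplicates

With `remove_duplicates=True`, duplicated rows are dropped. In this example each column then effectively contributes only its distinct values, so the result has 4 × 2 × 3 × 4 = 96 rows instead of 4⁴ = 256.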

```python
args = [
    [1, 2, 3, 4],
    [2, 0, 0, 2],
    [2, 1, 6, 2],
    [1, 2, 3, 4],
]
resus1 = generate_product(
    args,
    remove_duplicates=True,
)

print(repr(resus1))
# array([[1, 2, 2, 1],
#        [2, 0, 2, 3],
#        [2, 2, 2, 1],
#        [3, 2, 2, 1],
#        ...
#        [4, 0, 1, 4],
#        [1, 2, 6, 4],
#        [3, 2, 6, 4],
#        [4, 2, 6, 4],
#        [1, 0, 6, 4],
#        [3, 0, 6, 4],
#        [4, 0, 6, 4]])
print(resus1.shape)
# (96, 4)

# Without removing duplicates

args = [
    [1, 2, 3, 4],
    [2, 0, 0, 2],
    [2, 1, 6, 2],
    [1, 2, 3, 4],
]
resus2 = generate_product(
    args,
    remove_duplicates=False,
)
print(repr(resus2))
# array([[1, 2, 2, 1],
#        [2, 2, 2, 1],
#        [3, 2, 2, 1],
#        ...,
#        [2, 2, 2, 4],
#        [3, 2, 2, 4],
#        [4, 2, 2, 4]])
print(resus2.shape)
# (256, 4)
```
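The row counts above can be cross-checked with the standard library and NumPy alone, independently of the library: `np.unique(..., axis=0)` removes duplicate rows, and the deduplicated count equals the product of the number of distinct values per column.

```python
import itertools

import numpy as np

args = [
    [1, 2, 3, 4],
    [2, 0, 0, 2],
    [2, 1, 6, 2],
    [1, 2, 3, 4],
]

all_rows = np.array(list(itertools.product(*args)))
unique_rows = np.unique(all_rows, axis=0)  # drops duplicate rows (result comes back sorted)
print(all_rows.shape, unique_rows.shape)  # (256, 4) (96, 4)

# Same count without materializing anything: distinct values per column, multiplied
print(np.prod([len(set(row)) for row in args]))  # 96
```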

## Filtering Data

### Getting all colors whose R, G and B values are pairwise distinct

#### The order of any filtered output may vary between runs due to multicore processing.

```python

args = [
    list(range(256)),
    list(range(256)),
    list(range(256)),
]

generate_product(args, max_reps_rows=1)  # each value may occur at most once per row
# array([[119, 158, 238],
#        [ 50,   2,   0],
#        [226, 251,  90],
#        ...,
#        [244, 254, 255],
#        [245, 254, 255],
#        [246, 254, 255]])

# Filtering all 16.7 million colors takes a while:

%timeit generate_product(args, max_reps_rows=1)
# 11.7 s ± 437 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# Passing a NumPy array is a little faster

args = np.asarray(
    [
        list(range(256)),
        list(range(256)),
        list(range(256)),
    ],
    dtype=np.uint8,
)

%timeit generate_product(args, max_reps_rows=1)
# 9.94 s ± 209 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# Another example
args = [
    [2, 1, 3, 4],
    [4, 4, 3, 4],
]
resus = generate_product(
    args,
    remove_duplicates=True,  # removes all duplicated rows
    r=len(args[0]),  # number of repetitions, similar to the `repeat` argument of itertools.product
    max_reps_rows=2,  # allows at most 2 occurrences of the same element per row
)
print(resus)
# [[1 1 2 2 4 3 3 4]
#  [1 1 2 2 4 3 4 3]
#  [1 2 2 1 3 4 3 4]
#  [2 1 2 1 3 4 4 3]
#  [1 1 2 2 3 3 4 4]
#  [2 1 1 2 3 4 3 4]
#  [2 1 2 1 3 4 3 4]
#  [1 1 2 2 3 4 3 4]


# Another example

args = [
    [1, 2, 3, 4],
]

resus = generate_product(args, remove_duplicates=False, r=len(args[0]))
print(resus)
print(resus.shape)

# [[1 2 3 4]
#  [2 2 3 4]
#  [3 2 3 4]
#  ...
#  [2 1 2 3]
#  [3 1 2 3]
#  [4 1 2 3]]
# (256, 4)

```
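Judging from the examples above, `max_reps_rows=k` keeps only rows in which no value occurs more than `k` times. A hypothetical pure-Python reference filter (`max_reps_filter` is our name, not the library's) makes that reading concrete and shows that `k=1` over a repeated pool yields exactly `itertools.permutations`:

```python
import itertools
from collections import Counter


def max_reps_filter(rows, max_reps):
    """Keep rows in which no value occurs more than `max_reps` times.

    Hypothetical reference for what `max_reps_rows` appears to do,
    inferred from the examples above.
    """
    return [row for row in rows if max(Counter(row).values()) <= max_reps]


rows = list(itertools.product(range(4), repeat=3))
distinct = max_reps_filter(rows, 1)
print(len(rows), len(distinct))  # 64 24

# With max_reps=1 the filter yields exactly the permutations, in the same order
assert distinct == list(itertools.permutations(range(4), 3))

# For the RGB case: number of colors with pairwise-distinct channels
print(256 * 255 * 254)  # 16581120
```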

## You can mix data types


```python
args = [
    [[1, 2], 3, 4],
    [3, "xxxxx", 3, 6],
    [2, 0, 0, 2],
    [2, 0, [0, 2]],
    [8, 2, 8, 2],
    [4, 5, 4, 5],
    [[3, 3], 3, 6],
    [4, 5, 4, 5],
    [0, {2, 3, 4}, 8, 7],
    [1, 2, b"xxx3", 4],
]

q = generate_product(args, remove_duplicates=False)

print(repr(q))
# array([[list([1, 2]), 3, 2, ..., 4, 0, 1],
#        [3, 3, 2, ..., 4, 0, 1],
#        [4, 3, 2, ..., 4, 0, 1],
#        ...,
#        [list([1, 2]), 6, 2, ..., 5, 7, 4],
#        [3, 6, 2, ..., 5, 7, 4],
#        [4, 6, 2, ..., 5, 7, 4]], dtype=object)



# By default, str_format_function=repr is used to turn data that NumPy cannot
# handle natively into strings. These strings are only used for indexing.
# This can cause problems with objects whose repr is incomplete, e.g. pandas
# DataFrames, which are usually truncated when __repr__ is called.
# In such cases, you can pass a custom function to str_format_function
# (but to be honest: who the hell puts a pandas DataFrame inside a NumPy array?)

import numpy as np
import pandas as pd

# Example of a custom function (the string is only used for indexing)
str_format_function = (
    lambda x: x.to_string() if isinstance(x, pd.DataFrame) else repr(x)
)

args = [
    [2, 1, 3, 4],
    [4, 4, 3, 4],
    [
        # Note: this URL points to GitHub's HTML page, not the raw CSV, which is why
        # the DataFrame printed below contains HTML lines. Use
        # https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv
        # to load the actual data.
        pd.read_csv(
            "https://github.com/datasciencedojo/datasets/blob/master/titanic.csv",
            on_bad_lines="skip",
        ),
        np.array([222, 3]),
        dict(baba=333, bibi=444),
    ],
]

resus = generate_product(
    args,
    remove_duplicates=True,
    r=len(args[0]),
    max_reps_rows=-1,
    str_format_function=str_format_function,
)
print(resus)
print(resus.shape)

# Ain't it pretty? hahaha

# [[4 3 2 ... {'baba': 333, 'bibi': 444}
# <!DOCTYPE html>
# 0                                                 <html
# 1                                             lang="en"
# 2       data-color-mode="auto" data-light-theme="lig...
# 3       data-a11y-animated-images="system" data-a11y...
# 4                                                     >
# ...                                                 ...
# 1062                                             </div>
# 1063      <div id="js-global-screen-reader-notice" c...
# 1064      <div id="js-global-screen-reader-notice-as...
# 1065                                            </body>
# 1066                                            </html>
#
# [1067 rows x 1 columns]
# array([222,   3])]
# [2 1 1 ... {'baba': 333, 'bibi': 444} {'baba': 333, 'bibi': 444}
# array([222,   3])]
# [1 1 3 ...                                         <!DOCTYPE html>
# 0                                                 <html
# 1                                             lang="en"
# 2       data-color-mode="auto" data-light-theme="lig...
# 3       data-a11y-animated-images="system" data-a11y...
# 4                                                     >
# ...                                                 ...
# 1062                                             </div>

```
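As we read the description above, `str_format_function` turns each element into a string key, and elements with the same key are treated as the same element. The sketch below mimics that idea in plain Python (the helper `unique_by_key` and the class `Table` are hypothetical, not part of the library) and shows why a truncated `repr` can silently merge different objects:

```python
def unique_by_key(items, key=repr):
    """Keep the first item seen for every distinct key string."""
    first_seen = {}
    for item in items:
        first_seen.setdefault(key(item), item)
    return list(first_seen.values())


class Table:
    """Stand-in for an object whose repr hides its contents (like a large DataFrame)."""

    def __init__(self, rows):
        self.rows = rows

    def __repr__(self):
        return f"Table(<{len(self.rows)} rows>)"


def table_aware_key(x):
    # Same idea as the DataFrame-aware lambda above: full content for tables, repr otherwise
    return repr(x.rows) if isinstance(x, Table) else repr(x)


t1 = Table([[1, 2], [3, 4]])
t2 = Table([[5, 6], [7, 8]])  # different content, same repr as t1
items = [[1, 2], 3, [1, 2], {2, 3, 4}, b"xxx3", t1, t2]

print(len(unique_by_key(items)))  # 5: the second [1, 2] and t2 are merged away
print(len(unique_by_key(items, key=table_aware_key)))  # 6: only the second [1, 2] is merged
```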

## Inhomogeneous Shapes? No problem!

```python 
# Inhomogeneous shapes (rows of different lengths) are no problem either.
# dummyval serves as a placeholder for the missing entries, so make sure that the
# default dummyval="DUMMYVAL" does not occur in your data (not very likely, I guess).

a = [1, 2]
b = [3, 4]
c = [5, 6, 7]
d = [8, 9, 10]
total = [a, b, c, d]

resus = generate_product(total, remove_duplicates=True, dummyval="DUMMYVAL")
print(resus)

# [[2 3 6 9]
#  [1 3 5 8]
#  [1 3 6 9]
#  [1 4 6 8]
#  [1 4 5 8]
#  [2 3 5 8]
#  [1 4 7 9]
#  [1 3 7 9]
#  ...
#  [2 3 7 9]
#  [2 4 7 9]
#  [2 3 5 10]
#  [2 4 5 10]
#  [1 3 6 10]
#  [2 3 6 10]
#  [1 4 6 10]
#  [2 4 6 10]
#  [2 4 7 10]]

a = [1, 2, 3]
b = [3, 4, 4]
c = [5, 6]
d = [8, 9, 10]
total = [a, b, c, d]

resus = generate_product(total, remove_duplicates=True, dummyval="DUMMYVAL")
print(resus)
# [[1 3 5 8]
#  [3 4 6 10]
#  [1 3 5 10]
#  [2 4 5 9]
#  [2 4 6 8]
#  [3 3 5 8]
#  ...
#  [3 3 6 9]
#  [2 3 6 10]
#  [2 4 6 9]
#  [1 4 6 10]
#  [3 4 6 9]
#  [3 3 5 10]
#  [1 3 6 10]
#  [1 4 5 10]
#  [2 4 5 10]
#  [3 4 5 10]
#  [3 3 6 10]]

``` 
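Padding ragged rows with a placeholder is a common way to fit them into a rectangular array, and it also explains why `dummyval` must not occur in your data: anything equal to the placeholder is dropped again. A sketch of that idea in plain NumPy/itertools, assuming (without having checked the library's source) that `dummyval` is used this way; `pad_rows` and `product_from_padded` are hypothetical helpers:

```python
import itertools

import numpy as np

DUMMY = "DUMMYVAL"  # placeholder, mirrors the dummyval default


def pad_rows(rows, dummy=DUMMY):
    """Pad ragged rows into a rectangular object array."""
    width = max(len(row) for row in rows)
    padded = np.full((len(rows), width), dummy, dtype=object)
    for i, row in enumerate(rows):
        padded[i, : len(row)] = row
    return padded


def product_from_padded(padded, dummy=DUMMY):
    """Drop placeholders, deduplicate each row, then take the cartesian product."""
    cleaned = [list(dict.fromkeys(v for v in row if v != dummy)) for row in padded]
    return list(itertools.product(*cleaned))


padded = pad_rows([[1, 2], [3, 4], [5, 6, 7], [8, 9, 10]])
print(padded.shape)  # (4, 3)
print(len(product_from_padded(padded)))  # 36 = 2 * 2 * 3 * 3
print(len(product_from_padded(pad_rows([[1, 2, 3], [3, 4, 4], [5, 6], [8, 9, 10]]))))  # 36
```

A real value equal to the placeholder would be removed together with the padding, e.g. `[[1, "DUMMYVAL"], [2]]` yields only `(1, 2)`.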


## How to get the index
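With `return_index_only=True`, the result contains small integers instead of values. Judging from the printed output in the example below (not from the library's documentation), index `k` appears to refer to the `k`-th distinct input value, numbered in order of first appearance (100 → 0, 200 → 1, 300 → 2, 400 → 3, 600 → 4, 0 → 5, ...). Under that assumption, a lookup array and one fancy-indexing call turn indices back into values; `encode` and `decode` are hypothetical helpers:

```python
import numpy as np

# Same input as the example below (000 written as 0)
args = [
    [100, 200, 300, 400],
    [300, 300, 300, 600],
    [200, 0, 0, 200],
    [200, 0, 0, 200],
    [800, 200, 800, 200],
    [400, 500, 400, 500],
    [300, 300, 300, 600],
    [400, 500, 400, 500],
    [0, 900, 800, 700],
    [100, 200, 300, 400],
]

# Assumption: distinct values numbered in order of first appearance
lookup = np.array(list(dict.fromkeys(v for row in args for v in row)))
position = {v: i for i, v in enumerate(lookup.tolist())}


def encode(rows):
    """Values -> uint8 indices into `lookup`."""
    return np.array([[position[v] for v in row] for row in rows], dtype=np.uint8)


def decode(index_rows):
    """Indices -> values with a single fancy-indexing call."""
    return lookup[index_rows]


first_row = [row[0] for row in args]  # first product row: each column's first element
idx = encode([first_row])
print(idx)  # [[0 2 1 1 6 3 2 3 5 0]]
assert decode(idx).tolist() == [first_row]
```

The visible parts of this row, `0 2 1 ... 3 5 0`, match the first row printed in the example.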

```python 

# To save memory, the function can return only the index array instead of the values.
# Each element can then be looked up from the input data when it is needed.

args = [
    [100, 200, 300, 400],
    [300, 300, 300, 600],
    [200, 000, 000, 200],
    [200, 000, 000, 200],
    [800, 200, 800, 200],
    [400, 500, 400, 500],
    [300, 300, 300, 600],
    [400, 500, 400, 500],
    [000, 900, 800, 700],
    [100, 200, 300, 400],
]

resus = generate_product(
    args,
    remove_duplicates=False,
    return_index_only=True,
)
print(resus)
print(resus.shape)

# The function returns:
# [[0 2 1 ... 3 5 0]
#  [1 2 1 ... 3 5 0]
#  [2 2 1 ... 3 5 0]
#  ...
#  [1 4 1 ... 7 9 3]
#  [2 4 1 ... 7 9 3]
#  [3 4 1 ... 7 9 3]]
# (1048576, 10)
```