# Efficient NumPy Permutations, Combinations, and Product using Cython/C++/OpenMP
Generate permutations, combinations, and Cartesian products as NumPy arrays efficiently.
The provided `generate_product` function is designed to outperform the standard
`itertools` library; in the benchmarks below it is more than 20x faster on large inputs.
- Uses a "yellow-line-free" Cython backend (no Python-interaction lines in Cython's annotated output) for high performance.
- Uses OpenMP multithreading for parallel processing.
- Compiles on the first run (requires a C/C++ compiler installed on your PC).
- Uses 90% less memory than itertools.
- Performance scales with data size, making it well suited to large datasets.
- Builds a lookup NumPy array with a lightweight dtype (typically `np.uint8`, unless you are combining more than 255 different elements).
- Uses NumPy indexing to save memory: depending on the dtype (and your luck :-) ), NumPy only references the existing elements instead of copying them, which saves a lot of memory.
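The last two points can be illustrated with plain NumPy. The following is only a minimal sketch of the general idea (a small `uint8` index array plus one lookup array), not the package's internal code; the names `values`, `index`, and `rows` are made up for the illustration.

```python
import numpy as np

# Hypothetical lookup table holding each distinct element once.
values = np.array(["red", "green", "blue"], dtype=object)

n, width = len(values), 2

# All index combinations for `width` positions, stored as tiny uint8 numbers.
grids = np.indices((n,) * width, dtype=np.uint8)  # shape (width, n, n)
index = grids.reshape(width, -1).T                # shape (n**width, width)

# Fancy indexing turns the index rows into rows of elements.
rows = values[index]

print(index.dtype, index.nbytes)  # one byte per cell
print(rows[1])                    # the elements behind index row [0, 1]

# With an object array, the result refers to the same Python objects
# as the lookup table; the elements themselves are not duplicated.
print(rows[0, 0] is values[0])
```

Keeping only `index` around and resolving it against `values` when needed is what makes the index-only approach cheap in memory.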
## Supported Functionality
<table><thead><tr><th><p>Iterator</p></th><th><p>Arguments</p></th><th><p>Results</p></th></tr></thead><tbody><tr><td><p><a href="https://docs.python.org/3/library/itertools.html#itertools.product" title="itertools.product"><code><span>product()</span></code></a></p></td><td><p>p, q, … [repeat=1]</p></td><td><p>cartesian product, equivalent to a nested for-loop</p></td></tr><tr><td><p><a href="https://docs.python.org/3/library/itertools.html#itertools.permutations" title="itertools.permutations"><code><span>permutations()</span></code></a></p></td><td><p>p[, r]</p></td><td><p>r-length tuples, all possible orderings, no repeated elements</p></td></tr><tr><td><p><a href="https://docs.python.org/3/library/itertools.html#itertools.combinations" title="itertools.combinations"><code><span>combinations()</span></code></a></p></td><td><p>p, r</p></td><td><p>r-length tuples, in sorted order, no repeated elements</p></td></tr><tr><td><p><a href="https://docs.python.org/3/library/itertools.html#itertools.combinations_with_replacement" title="itertools.combinations_with_replacement"><code><span>combinations_with_replacement()</span></code></a></p></td><td><p>p, r</p></td><td><p>r-length tuples, in sorted order, with repeated elements</p></td></tr></tbody></table>
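For reference, this is how the four standard-library iterators from the table behave on a small input. The sketch uses only `itertools`; it does not call `generate_product`, and the variable names are chosen for the illustration.

```python
import itertools

items = [1, 2, 3, 4]
r = 2

product_rows = list(itertools.product(items, repeat=r))             # 4**2 rows
permutation_rows = list(itertools.permutations(items, r))           # 4*3 rows
combination_rows = list(itertools.combinations(items, r))           # C(4, 2) rows
cwr_rows = list(itertools.combinations_with_replacement(items, r))  # C(5, 2) rows

print(len(product_rows), len(permutation_rows),
      len(combination_rows), len(cwr_rows))

# Permutations are the product rows in which no element is repeated.
no_repeats = [row for row in product_rows if len(set(row)) == r]
print(set(no_repeats) == set(permutation_rows))
```

The last check shows why a product generator combined with a row filter can cover the other iterators as well.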
## Getting Started
### Only tested on Windows 10 / Python 3.11
- Make sure you have Python and a C/C++ compiler installed.
- Run `pip install numpycythonpermutations` or download it from GitHub.
## Some examples
## Generating all RGB colors in 200 ms.
#### More than 25 times faster than itertools when generating all RGB colors
```python
import numpy as np
import itertools
from numpycythonpermutations import generate_product
# RGB COLORS:
args = np.asarray(  # The input must always be two-dimensional (list or NumPy array)
[
list(range(256)),
list(range(256)),
list(range(256)),
],
dtype=np.uint8,
)
In [17]: %timeit resus = np.array(list(itertools.product(*args)))
5.88 s ± 78.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
...: %timeit resus = generate_product(args, remove_duplicates=False, str_format_function=repr, multicpu=True, return_index_only=False, max_reps_rows=-1, r=-1, dummyval="DUMMYVAL")
232 ms ± 31.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
### Still 2.5x faster on a tiny dataset
```python
# About 2.5x faster with very little data
args = np.asarray(
[
list(range(5)),
list(range(5)),
list(range(5)),
],
dtype=np.uint8,
)
In [23]: %timeit np.array(list(itertools.product(*args)))
39.3 µs ± 113 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [25]: %timeit resus = generate_product(args, remove_duplicates=False, str_format_function=repr, multicpu=True, return_index_only=False, max_reps_rows=-1, r=-1, dummyval="DUMMYVAL")
19.2 µs ± 176 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
```
### Attention! The output order differs from itertools (Fortran-style order: the first column varies fastest):
#### Itertools
```python
array(
[[ 0, 0, 0],
[ 0, 0, 1],
[ 0, 0, 2],
...,
[255, 255, 253],
[255, 255, 254],
[255, 255, 255]])
```
#### numpycythonpermutations
```python
array([
[ 0, 0, 0],
[ 1, 0, 0],
[ 2, 0, 0],
...,
[253, 255, 255],
[254, 255, 255],
[255, 255, 255]], dtype=uint8)
```
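If you need to move between the two row orders, the relationship can be sketched with `itertools` and NumPy alone. This is an illustration on a small made-up input, not something the package does for you; the `lexsort` step assumes every input list is sorted and free of duplicates.

```python
import itertools
import numpy as np

args = [[0, 1, 2], [10, 11], [20, 21]]

# itertools order: the last column varies fastest.
c_order = np.array(list(itertools.product(*args)))

# First-column-fastest order: take the product of the reversed inputs,
# then flip the columns back into their original positions.
f_order = np.array(list(itertools.product(*args[::-1])))[:, ::-1]

print(c_order[:3].tolist())
print(f_order[:3].tolist())

# Back to itertools order: sort by column 0, then column 1, then column 2.
# np.lexsort uses the LAST key as the primary one, hence the reversal.
restored = f_order[np.lexsort(f_order.T[::-1])]
print(np.array_equal(restored, c_order))
```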
## Deleting duplicates
```python
args = [
[1, 2, 3, 4],
[2, 0, 0, 2],
[2, 1, 6, 2],
[1, 2, 3, 4],
]
resus1 = generate_product(
args,
remove_duplicates=True,
)
print(resus1)
print(resus1.shape)
In [15]: resus1
Out[15]:
array([[1, 2, 2, 1],
[2, 0, 2, 3],
[2, 2, 2, 1],
[3, 2, 2, 1],
...
[4, 0, 1, 4],
[1, 2, 6, 4],
[3, 2, 6, 4],
[4, 2, 6, 4],
[1, 0, 6, 4],
[3, 0, 6, 4],
[4, 0, 6, 4]])
In [18]: resus1.shape
Out[18]: (96, 4)
# Without removing duplicates
args = [
[1, 2, 3, 4],
[2, 0, 0, 2],
[2, 1, 6, 2],
[1, 2, 3, 4],
]
resus2 = generate_product(
args,
remove_duplicates=False,
)
print(resus2.shape)
In [16]: resus2
Out[16]:
array([[1, 2, 2, 1],
[2, 2, 2, 1],
[3, 2, 2, 1],
...,
[2, 2, 2, 4],
[3, 2, 2, 4],
[4, 2, 2, 4]])
In [17]: resus2.shape
Out[17]: (256, 4)
```
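The two row counts above can be cross-checked without the package. The sketch below uses `itertools.product` and `np.unique`; note that `np.unique` sorts the rows, so only the shapes are comparable with the output shown above, not the row order.

```python
import itertools
import numpy as np

args = [
    [1, 2, 3, 4],
    [2, 0, 0, 2],
    [2, 1, 6, 2],
    [1, 2, 3, 4],
]

# Every combination, including rows that repeat because the inputs
# contain duplicate values: 4 * 4 * 4 * 4 rows.
all_rows = np.array(list(itertools.product(*args)))

# Unique rows only: 4 * 2 * 3 * 4 distinct combinations.
unique_rows = np.unique(all_rows, axis=0)

print(all_rows.shape)
print(unique_rows.shape)
```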
## Filtering Data
### To get all colors whose R, G, and B values are all different
#### The order of any filtered output may vary between runs because of multicore processing.
```python
args = [
list(range(256)),
list(range(256)),
list(range(256)),
]
generate_product(args, max_reps_rows=1)
array([[119, 158, 238],
[ 50, 2, 0],
[226, 251, 90],
...,
[244, 254, 255],
[245, 254, 255],
[246, 254, 255]])
# But it takes some time to filter 16.7 million colors:
In [38]: %timeit generate_product(args, max_reps_rows=1)
11.7 s ± 437 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# Passing a NumPy array is a little faster
args = np.asarray(
[
list(range(256)),
list(range(256)),
list(range(256)),
],
dtype=np.uint8,
)
In [2]: %timeit generate_product(args, max_reps_rows=1)
9.94 s ± 209 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# Another example
args = [
[2, 1, 3, 4],
[4, 4, 3, 4],
]
resus = generate_product(args,
remove_duplicates=True, # removes all duplicated rows
r=len(args[0]), # similar to itertools
max_reps_rows=2) # allows only 2 occurrences of the same element in the same row
[[1 1 2 2 4 3 3 4]
[1 1 2 2 4 3 4 3]
[1 2 2 1 3 4 3 4]
[2 1 2 1 3 4 4 3]
[1 1 2 2 3 3 4 4]
[2 1 1 2 3 4 3 4]
[2 1 2 1 3 4 3 4]
[1 1 2 2 3 4 3 4]
# Another example
args = [
[1, 2, 3, 4],
]
resus = generate_product(args, remove_duplicates=False, r=len(args[0]))
print(resus)
print(resus.shape)
[[1 2 3 4]
[2 2 3 4]
[3 2 3 4]
...
[2 1 2 3]
[3 1 2 3]
[4 1 2 3]]
(256, 4)
```
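The filter can be pictured in pure Python as follows. This sketch assumes that `max_reps_rows=n` keeps only rows in which no element occurs more than `n` times and that `r` acts like the `repeat` argument of `itertools.product`; both readings are inferred from the comments and outputs above, and `filter_max_reps` is a hypothetical helper, not part of the package.

```python
import itertools
from collections import Counter


def filter_max_reps(rows, max_reps):
    """Keep rows in which no element occurs more than max_reps times."""
    return [row for row in rows if max(Counter(row).values()) <= max_reps]


# A small stand-in for the RGB example: 4 values per channel instead of 256.
channels = [range(4)] * 3
rows = list(itertools.product(*channels))
distinct = filter_max_reps(rows, max_reps=1)
print(len(rows), len(distinct))  # 4**3 rows before, 4*3*2 rows after

# One input list repeated r times, then filtered down to rows without
# repeated elements: these are exactly the permutations of the list.
repeated = list(itertools.product([1, 2, 3, 4], repeat=4))
perms = filter_max_reps(repeated, max_reps=1)
print(len(repeated), len(perms))
```

Because the filter has to inspect every generated row, it is much slower than plain product generation, which matches the timings above.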
## You can mix data types
```python
args = [
[[1, 2], 3, 4],
[3, "xxxxx", 3, 6],
[2, 0, 0, 2],
[2, 0, [0, 2]],
[8, 2, 8, 2],
[4, 5, 4, 5],
[[3, 3], 3, 6],
[4, 5, 4, 5],
[0, {2, 3, 4}, 8, 7],
[1, 2, b"xxx3", 4],
]
q = generate_product(args, remove_duplicates=False)
Out[6]:
array([[list([1, 2]), 3, 2, ..., 4, 0, 1],
[3, 3, 2, ..., 4, 0, 1],
[4, 3, 2, ..., 4, 0, 1],
...,
[list([1, 2]), 6, 2, ..., 5, 7, 4],
[3, 6, 2, ..., 5, 7, 4],
[4, 6, 2, ..., 5, 7, 4]], dtype=object)
# The function repr is normally used to handle data that is not NumPy-friendly.
# This can cause problems, e.g. with pandas DataFrames, which are usually not
# fully shown when calling __repr__.
# In these cases, you can pass a custom function to str_format_function
# (but to be honest: who puts a pandas DataFrame inside a NumPy array?)
import pandas as pd

# Example of such a function (the string is only used for indexing)
str_format_function = (
    lambda x: x.to_string() if isinstance(x, pd.DataFrame) else repr(x)
)
args = [
[2, 1, 3, 4],
[4, 4, 3, 4],
[
pd.read_csv(
"https://github.com/datasciencedojo/datasets/blob/master/titanic.csv",
on_bad_lines="skip",
),
np.array([222, 3]),
dict(baba=333, bibi=444),
],
]
resus = generate_product(
args,
remove_duplicates=True,
r=len(args[0]),
max_reps_rows=-1,
str_format_function=str_format_function,
)
print(resus)
print(resus.shape)
# Ain't it pretty? hahaha
[[4 3 2 ... {'baba': 333, 'bibi': 444}
<!DOCTYPE html>
0 <html
1 lang="en"
2 data-color-mode="auto" data-light-theme="lig...
3 data-a11y-animated-images="system" data-a11y...
4 >
... ...
1062 </div>
1063 <div id="js-global-screen-reader-notice" c...
1064 <div id="js-global-screen-reader-notice-as...
1065 </body>
1066 </html>
[1067 rows x 1 columns]
array([222, 3])]
[2 1 1 ... {'baba': 333, 'bibi': 444} {'baba': 333, 'bibi': 444}
array([222, 3])]
[1 1 3 ... <!DOCTYPE html>
0 <html
1 lang="en"
2 data-color-mode="auto" data-light-theme="lig...
3 data-a11y-animated-images="system" data-a11y...
4 >
... ...
1062 </div>
```
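Why a string function helps with mixed data can be sketched in a few lines. Lists, sets, and dicts are not hashable, so they cannot be used directly as dictionary keys, but their string form can. The helper below is a hypothetical illustration of that idea (assuming the string is used as a lookup key, as the comment "only used for indexing" suggests); it is not the package's code.

```python
def build_lookup(items, key=repr):
    """Map every item to the position of its first occurrence.

    `key` plays the role of str_format_function: it turns any object,
    hashable or not, into a string that identifies it.
    """
    positions = {}  # key string -> position in `uniques`
    uniques = []    # each distinct item, stored once
    index = []      # one small integer per input item
    for item in items:
        k = key(item)
        if k not in positions:
            positions[k] = len(uniques)
            uniques.append(item)
        index.append(positions[k])
    return uniques, index


items = [[1, 2], 3, {2, 3, 4}, [1, 2], b"xxx3", 3]
uniques, index = build_lookup(items)
print(uniques)
print(index)
```

Two objects that produce the same string are treated as the same element, which is why a truncated `__repr__` (as with large DataFrames) calls for a custom function.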
## Inhomogeneous Shapes? No problem!
```python
# An inhomogeneous shape is no problem either.
# Just make sure that the dummy value (default: dummyval="DUMMYVAL") does not occur in your data (not very likely, I guess)
a = [1, 2]
b = [3, 4]
c = [5, 6, 7]
d = [8, 9, 10]
total = [a, b, c, d]
resus = generate_product(total, remove_duplicates=True, dummyval="DUMMYVAL")
print(resus)
[[2 3 6 9]
[1 3 5 8]
[1 3 6 9]
[1 4 6 8]
[1 4 5 8]
[2 3 5 8]
[1 4 7 9]
[1 3 7 9]
...
[2 3 7 9]
[2 4 7 9]
[2 3 5 10]
[2 4 5 10]
[1 3 6 10]
[2 3 6 10]
[1 4 6 10]
[2 4 6 10]
[2 4 7 10]]
a = [1, 2, 3]
b = [3, 4, 4]
c = [5, 6]
d = [8, 9, 10]
total = [a, b, c, d]
resus = generate_product(total, remove_duplicates=True, dummyval="DUMMYVAL")
print(resus)
[[1 3 5 8]
[3 4 6 10]
[1 3 5 10]
[2 4 5 9]
[2 4 6 8]
[3 3 5 8]
...
[3 3 6 9]
[2 3 6 10]
[2 4 6 9]
[1 4 6 10]
[3 4 6 9]
[3 3 5 10]
[1 3 6 10]
[1 4 5 10]
[2 4 5 10]
[3 4 5 10]
[3 3 6 10]]
```
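One plausible reading of the dummy value is that shorter lists are padded with it to obtain a rectangular array, and rows containing it are dropped afterwards. The sketch below illustrates that idea with `itertools` and NumPy; it is an assumption about the purpose of `dummyval`, not a description of the package's implementation, and `DUMMY`, `padded`, and `rows` are names made up for the example.

```python
import itertools
import numpy as np

DUMMY = "DUMMYVAL"  # sentinel; must not occur in the real data

ragged = [[1, 2], [3, 4], [5, 6, 7], [8, 9, 10]]

# Pad every list to the same length so the input becomes rectangular.
width = max(len(row) for row in ragged)
padded = np.array(
    [row + [DUMMY] * (width - len(row)) for row in ragged],
    dtype=object,
)
print(padded.shape)

# Build the product over the padded rows and drop every combination
# that contains the sentinel.
rows = [row for row in itertools.product(*padded) if DUMMY not in row]
print(len(rows))  # 2 * 2 * 3 * 3 combinations remain
```

If the sentinel also appeared as a real value, those legitimate rows would be dropped too, which is why the dummy value must be unique.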
## How to get the index
```python
# To save memory, the function can return only the index array instead of the values.
# You can then access each element by using the index to look up the input elements.
args = [
[100, 200, 300, 400],
[300, 300, 300, 600],
[200, 000, 000, 200],
[200, 000, 000, 200],
[800, 200, 800, 200],
[400, 500, 400, 500],
[300, 300, 300, 600],
[400, 500, 400, 500],
[000, 900, 800, 700],
[100, 200, 300, 400],
]
resus = generate_product(
args,
remove_duplicates=False,
return_index_only=True,
)
print(resus)
print(resus.shape)
# The function returns:
[[0 2 1 ... 3 5 0]
[1 2 1 ... 3 5 0]
[2 2 1 ... 3 5 0]
...
[1 4 1 ... 7 9 3]
[2 4 1 ... 7 9 3]
[3 4 1 ... 7 9 3]]
(1048576, 10)
```
Raw data
{
"_id": null,
"home_page": "https://github.com/hansalemaos/numpycythonpermutations",
"name": "numpycythonpermutations",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "Permutations,numpy,Combinations,Product",
"author": "Johannes Fischer",
"author_email": "aulasparticularesdealemaosp@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/06/70/12908607d901eae1621940813a9877a97b37e7a187c9b92d76fb78e49a73/numpycythonpermutations-0.10.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Permutations, Combinations and Product for Numpy - written in Cython - 20x faster than itertools",
"version": "0.10",
"project_urls": {
"Homepage": "https://github.com/hansalemaos/numpycythonpermutations"
},
"split_keywords": [
"permutations",
"numpy",
"combinations",
"product"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "8277c15869c198f1bc9713d07d9dd28142d87ad3979d955d32fafa79f92924fb",
"md5": "fa7a445ba937cfd2c2c7931454e1366c",
"sha256": "10b7ddf840e3391d092a8d6297df851969cc2be3645d2dea7507d75523dd9fe7"
},
"downloads": -1,
"filename": "numpycythonpermutations-0.10-py3-none-any.whl",
"has_sig": false,
"md5_digest": "fa7a445ba937cfd2c2c7931454e1366c",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 94559,
"upload_time": "2024-02-03T15:06:05",
"upload_time_iso_8601": "2024-02-03T15:06:05.453319Z",
"url": "https://files.pythonhosted.org/packages/82/77/c15869c198f1bc9713d07d9dd28142d87ad3979d955d32fafa79f92924fb/numpycythonpermutations-0.10-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "067012908607d901eae1621940813a9877a97b37e7a187c9b92d76fb78e49a73",
"md5": "e1f479f70c68a61896a2a7f6ed098f5c",
"sha256": "36dce26eacbb15c22b604793bc07790165ad12f4a4fe21545b798ff4d181a0c1"
},
"downloads": -1,
"filename": "numpycythonpermutations-0.10.tar.gz",
"has_sig": false,
"md5_digest": "e1f479f70c68a61896a2a7f6ed098f5c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 92326,
"upload_time": "2024-02-03T15:06:07",
"upload_time_iso_8601": "2024-02-03T15:06:07.729849Z",
"url": "https://files.pythonhosted.org/packages/06/70/12908607d901eae1621940813a9877a97b37e7a187c9b92d76fb78e49a73/numpycythonpermutations-0.10.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-02-03 15:06:07",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "hansalemaos",
"github_project": "numpycythonpermutations",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "cycompi",
"specs": []
},
{
"name": "flatten_any_dict_iterable_or_whatsoever",
"specs": []
},
{
"name": "numpy",
"specs": []
}
],
"lcname": "numpycythonpermutations"
}