distrel

Name: distrel
Version: 0.2.1
Summary: Calculate the relationship between parameters with known distributions.
Upload time: 2023-12-14 19:48:32
Author: Alex
Requires Python: >=3.11,<4.0
License: AGPL-3.0-or-later
Requirements: none recorded

# Distrel

Distrel (distribution relations) is a package used to approximate the relationship between dependent distributions when only the properties of the distributions are known.

## Example use case

Suppose we have three parameters with known distributions, $`a \sim \mathcal{N}(\mu_a, \sigma_a)`$, $`b \sim \mathcal{N}(\mu_b, \sigma_b)`$ and $`c \sim \mathcal{N}(\mu_c, \sigma_c)`$, related by $c = a \times b$.

Distrel attempts to answer the question: how do we pseudo-randomly sample $a$ and $b$ such that $c$ follows the correct distribution?

### How it works

Distrel is a wrapper around a simple neural network written in [pytorch](https://pytorch.org/) that finds the functions $`f_{\mu}(a)`$ and $`f_{\sigma}(a)`$ in:

$`b \sim \mathcal{N}(f_{\mu}(a), f_{\sigma}(a))`$

such that:

$`a \times b \sim \mathcal{N}(\mu_c, \sigma_c)`$

That is, Distrel assumes that $a$ and $b$ are weakly related and hence samples $b$ from a normal distribution whose parameters depend on the sampled value of $a$.
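
Under this model the moments of $c$ that the network has to match follow from the law of total expectation. A short derivation, assuming the second argument of $`\mathcal{N}`$ is the standard deviation (the notation above does not say whether it is the standard deviation or the variance):

```math
\begin{aligned}
\mathbb{E}[c] &= \mathbb{E}\big[a\,\mathbb{E}[b \mid a]\big] = \mathbb{E}\big[a\, f_{\mu}(a)\big] \\
\mathbb{E}[c^2] &= \mathbb{E}\big[a^2\,\mathbb{E}[b^2 \mid a]\big] = \mathbb{E}\big[a^2 \big(f_{\mu}(a)^2 + f_{\sigma}(a)^2\big)\big] \\
\operatorname{Var}[c] &= \mathbb{E}[c^2] - \mathbb{E}[c]^2 \\
\operatorname{Var}[b] &= \mathbb{E}\big[f_{\sigma}(a)^2\big] + \operatorname{Var}\big[f_{\mu}(a)\big]
\end{aligned}
```

If $`f_{\mu}`$ and $`f_{\sigma}`$ were constant ($b$ independent of $a$), these reduce to $`\mathbb{E}[c] = \mu_a \mu_b`$ and $`\operatorname{Var}[c] = (\sigma_a^2 + \mu_a^2)(\sigma_b^2 + \mu_b^2) - \mu_a^2 \mu_b^2`$, which are fixed by the distributions of $a$ and $b$ and so cannot in general match arbitrary targets $`\mu_c, \sigma_c`$. Letting the parameters of $b$ depend on $a$ provides the extra freedom.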

Below is a diagram of the distrel neural network architecture:

![A diagram of the network architectures](drn.png) 

## Installation

With [poetry](https://python-poetry.org/)
```bash
poetry add distrel
```

or with [pip](https://pypi.org/)

```bash
pip install distrel
```

## Usage

To make things interesting, let's consider the relationship $a \times b = c$ where each of $a$, $b$ and $c$ follows its own non-central t-distribution.
The key thing to note is that a non-central t-distribution is defined by its degrees of freedom and a non-centrality parameter.
Any distribution could be used, but currently distributions can only be characterised by their mean, variance, skewness and kurtosis; if you need more properties, please open an issue.

```python
from scipy.stats import nct
import distrel

# Defines the degrees of freedom (df) and non-centrality parameter (nc) for each distribution
a_df, a_nc = 9, 52
b_df, b_nc = 4, 30
c_df, c_nc = 143, 90


```
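
The four properties can also be estimated from samples when no closed form is available (for the non-central t-distribution, scipy's `nct.stats` provides them in closed form). A minimal stdlib sketch, assuming population (biased) estimators; `four_moments` is a hypothetical helper, not part of distrel, and is not necessarily how distrel measures its own samples internally. Kurtosis is returned as Fisher's excess kurtosis, the same convention as `nct.stats(..., moments='mvsk')`.

```python
from statistics import fmean


def four_moments(samples):
    """Return (mean, variance, skewness, excess kurtosis) of a sample.

    Hypothetical helper: uses population (biased) central moments, and
    Fisher's excess kurtosis (a normal distribution gives 0).
    """
    n = len(samples)
    mean = fmean(samples)
    # Central moments of order 2, 3 and 4
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2 - 3.0
    return mean, m2, skew, kurt
```

For example, `four_moments(gen_a(10_000))` gives sample estimates that can be compared against the closed-form values from `nct.stats`.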

The distribution relation network (drn) needs three things:

- A generator function to generate the input data.
- A calculator function to obtain the 3rd distribution from the output of the generator and the output of the neural network.
- The properties of each of the 3 distributions.

```python
# Define a generator function that returns N samples of a.
# It must take N (the number of samples) as its only argument.
# gen_a does not need to return a torch tensor; this is accounted for internally.
def gen_a(N):
    return nct.rvs(a_df, a_nc, size=N)

# Define a function that calculates c from a and b.
# The neural network takes a as input, generates b, and then calculates c from a and b.
def calc_c(a, b):
    return a * b

# Create the distribution relation network
drn = distrel.drn(gen=gen_a, calc=calc_c, seed=42)

# We need to define the distribution properties with the following:
# nct.stats returns mean, var, skew, kurt
a_mean, a_var, a_skew, a_kurt = nct.stats(a_df, a_nc, moments='mvsk')
b_mean, b_var, b_skew, b_kurt = nct.stats(b_df, b_nc, moments='mvsk')
c_mean, c_var, c_skew, c_kurt = nct.stats(c_df, c_nc, moments='mvsk')

drn.set_properties(
    {"mean": a_mean, "var": a_var, "skew": a_skew, "kurt": a_kurt}, 
    {"mean": b_mean, "var": b_var, "skew": b_skew, "kurt": b_kurt}, 
    {"mean": c_mean, "var": c_var, "skew": c_skew, "kurt": c_kurt},
)
# We could have alternatively initialised the drn with the following
# drn = distrel.drn(
#    gen=gen_a, 
#    calc=calc_c, 
#    properties=[
#        {"mean": a_mean, "var": a_var},
#        {"mean": b_mean, "var": b_var},
#        {"mean": c_mean, "var": c_var},
#    ],
#)
# This also shows that you don't have to specify all four properties.
# Currently, only mean, variance, skewness and kurtosis are supported.
```


The drn is now ready to train. A tolerance `tol` can be provided to stop training early once ALL of the distribution properties are within a percentage tolerance of their targets.
The tolerance defaults to zero, which means no early stopping occurs.
The maximum number of epochs, `max_epochs`, can also be set (it defaults to 100); training stops upon reaching it regardless of performance.
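
The early-stopping rule can be pictured as an "all properties pass" check. A minimal sketch, assuming the percentage tolerance is a relative error $`|m - t| / |t|`$ compared with `tol`, with absolute error used when a target is zero (e.g. the skewness of a symmetric distribution); `within_tolerance` is a hypothetical helper, not distrel's API. The strict `<` comparison makes `tol=0` never trigger, matching the described default.

```python
def within_tolerance(measured, target, tol):
    """Return True only if every target property is within tol of its measured value.

    Hypothetical helper: relative error is assumed, falling back to absolute
    error for zero targets. Strict '<' means tol=0 never stops early.
    """
    for name, t in target.items():
        m = measured[name]
        err = abs(m - t) / abs(t) if t != 0 else abs(m - t)
        if not err < tol:
            return False  # a single failing property blocks early stopping
    return True
```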

Training starts with gradient-free optimisation before switching to a gradient-based optimiser.
To disable the gradient-free optimiser, set `budget` to 1.
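
The two-phase idea can be illustrated on a one-dimensional toy loss. This sketch uses plain random search as a stand-in for the gradient-free phase (the README does not name distrel's gradient-free optimiser) followed by plain gradient descent; `two_phase_minimise` and its parameters are hypothetical. With `budget=1` only the starting point is evaluated, so the random search is effectively skipped, mirroring the note above.

```python
import random


def two_phase_minimise(loss, grad, x0, budget=100, lr=0.1, steps=200, seed=0):
    """Minimise a 1-D loss: random search first, then gradient descent.

    Hypothetical sketch; 'budget' counts loss evaluations in the
    gradient-free phase, including the starting point.
    """
    rng = random.Random(seed)
    # Phase 1 (gradient-free): keep the best of random perturbations.
    best_x, best_loss = x0, loss(x0)
    for _ in range(budget - 1):
        cand = best_x + rng.gauss(0.0, 1.0)
        cand_loss = loss(cand)
        if cand_loss < best_loss:
            best_x, best_loss = cand, cand_loss
    # Phase 2 (gradient-based): refine from the best point found.
    x = best_x
    for _ in range(steps):
        x -= lr * grad(x)
    return x
```

The gradient-free phase helps escape a poor starting point cheaply; the gradient phase then converges precisely.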

You can set the number of samples drawn from the generator with `N`.

For reproducible results, you can also set `seed`, which seeds PyTorch, NumPy and Python's `random`.
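
Seeding all three generators usually looks like the following. A minimal sketch, assuming `random.seed`, `numpy.random.seed` and `torch.manual_seed` are the calls involved; `seed_everything` is a hypothetical name, and NumPy and PyTorch are skipped if not installed.

```python
import random


def seed_everything(seed):
    """Seed Python's random, plus NumPy and PyTorch when they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not available
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass  # PyTorch not available
```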

`drn.train` returns a boolean: True if training exited early due to convergence, and False otherwise.

```python
converge = drn.train(
    max_epochs=1000,
    tol=1e-2,  # early-stopping tolerance
    progress_bar=False,
    lr=1e-3,
    budget=1000,  # budget for the gradient-free phase
    N=1000,  # number of samples drawn from gen_a
    seed=42,
)
```
```

If you want finer-grained control over the optimiser, you can supply your own:

```python
from torch.optim import LBFGS

# Example settings; choose values suited to your problem
max_epochs, tol, lr = 1000, 1e-2, 1e-3

converge = drn.train(
    max_epochs=max_epochs,
    tol=tol,
    lr=lr,
    progress_bar=True,
    require_closure=True,
    optim=LBFGS,
    optim_kwargs={'line_search_fn': 'strong_wolfe'},
)

print(f"Converged?:\t{converge}")
```
The `optim_kwargs` are unpacked as keyword arguments when the optimiser is called.
Set `require_closure` to True if the optimiser must be stepped with a closure (LBFGS, for example, requires one), so that internally distrel calls:
```python
optim.step(closure)
```
or set `require_closure` to False to call:
```python
optim.step()
```
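
The dispatch described above can be sketched as follows. This is an assumption about the shape of the internal training loop, not distrel's code; `optimiser_step` is a hypothetical name. The closure is expected to zero the gradients, recompute the loss, call `backward()` and return the loss, because optimisers such as LBFGS may re-evaluate it several times per step.

```python
def optimiser_step(optim, closure, require_closure):
    """Step an optimiser according to a require_closure flag (hypothetical sketch)."""
    if require_closure:
        # The optimiser calls closure() itself, possibly several times.
        return optim.step(closure)
    # Otherwise evaluate the loss (and its gradients) once, then step.
    closure()
    return optim.step()
```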

```python
# To view the network weights, simply print the drn
print(drn)
```
which prints something like:
```example
        _l2_ mu
in _l1_/
       \_l3_ sigma
out ~ N(mu, sigma)
---

l1:
Linear(in_features=1, out_features=1, bias=True)
Bias: -3.3627309799194336
Weight: 0.027783172205090523

l2:
Linear(in_features=1, out_features=1, bias=True)
Bias: 0.14515426754951477
Weight: 3.0655031204223633

l3:
Linear(in_features=1, out_features=1, bias=True)
Bias: -13.486244201660156
Weight: 2.5820298194885254
```
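
The printed structure can be read as a forward pass: the input passes through `l1`, whose output feeds both `l2` (giving mu) and `l3` (giving sigma), and $b$ is then drawn from a normal distribution with those parameters. A minimal stdlib sketch using the example weights above, assuming no activation between layers (the printout lists only the three `Linear` layers) and a softplus to keep sigma positive; the softplus and the helper names are hypothetical, not distrel's documented internals.

```python
import math
import random

# (weight, bias) pairs copied from the example printout; trained values will differ.
L1 = (0.027783172205090523, -3.3627309799194336)
L2 = (3.0655031204223633, 0.14515426754951477)
L3 = (2.5820298194885254, -13.486244201660156)


def linear(layer, x):
    """Linear(in_features=1, out_features=1): weight * x + bias."""
    weight, bias = layer
    return weight * x + bias


def softplus(x):
    """Numerically stable log(1 + exp(x)); a hypothetical positivity constraint."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def forward(a):
    """Return (mu, sigma) for input a via in -> l1 -> {l2: mu, l3: sigma}."""
    h = linear(L1, a)  # shared first layer
    mu = linear(L2, h)  # mean head
    sigma = softplus(linear(L3, h))  # standard-deviation head, forced positive
    return mu, sigma


def sample_b(a, rng=random):
    """Draw b ~ N(mu(a), sigma(a))."""
    mu, sigma = forward(a)
    return rng.gauss(mu, sigma)
```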

```python
# To use the drn as a predictor, call the predict method; a seed is optional but
# recommended for reproducible results.
new_a = 37
new_b = drn.predict(new_a, seed=42)
new_c = drn.calc(new_a, new_b)  # Wraps the calc_c function
```

## Contributing

Please feel free to submit an issue for something you'd like to see fixed or implemented.
Alternatively merge requests are always welcome!

## Copying

This project is licensed under the GNU GPL version 3; see [LICENSE](LICENSE) for the full text.

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "distrel",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.11,<4.0",
    "maintainer_email": "",
    "keywords": "",
    "author": "Alex",
    "author_email": "adrysdale@protonmail.com",
    "download_url": "https://files.pythonhosted.org/packages/5a/f8/486dd86cd4f5d6fbba63fe2810b6ce802cfb71acab892fd00581d910bbac/distrel-0.2.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "AGPL-3.0-or-later",
    "summary": "Calculate the relationship between parameters with known distributions.",
    "version": "0.2.1",
    "project_urls": null,
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "bafe7f5eeafd82743825ea3bcff35addcc2a9b86ff6f15f9487a319cf4ac3769",
                "md5": "9214ab6e8dae9c0049b33410f189469f",
                "sha256": "e7b601cb5012591e9cb642cf050c92168777ff02f890d6f6b54a71cc7cc04427"
            },
            "downloads": -1,
            "filename": "distrel-0.2.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "9214ab6e8dae9c0049b33410f189469f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11,<4.0",
            "size": 18987,
            "upload_time": "2023-12-14T19:48:31",
            "upload_time_iso_8601": "2023-12-14T19:48:31.195625Z",
            "url": "https://files.pythonhosted.org/packages/ba/fe/7f5eeafd82743825ea3bcff35addcc2a9b86ff6f15f9487a319cf4ac3769/distrel-0.2.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5af8486dd86cd4f5d6fbba63fe2810b6ce802cfb71acab892fd00581d910bbac",
                "md5": "6a731e84be0e978098f93fdf9c1b821d",
                "sha256": "8f65ef0502ffd32b16d49abffe2510a01f7eb7ebbaf2d5b12838327d17a4d5a5"
            },
            "downloads": -1,
            "filename": "distrel-0.2.1.tar.gz",
            "has_sig": false,
            "md5_digest": "6a731e84be0e978098f93fdf9c1b821d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11,<4.0",
            "size": 18181,
            "upload_time": "2023-12-14T19:48:32",
            "upload_time_iso_8601": "2023-12-14T19:48:32.962389Z",
            "url": "https://files.pythonhosted.org/packages/5a/f8/486dd86cd4f5d6fbba63fe2810b6ce802cfb71acab892fd00581d910bbac/distrel-0.2.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-12-14 19:48:32",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "distrel"
}
        