tno.fl.protocols.logistic-regression

- Name: tno.fl.protocols.logistic-regression
- Version: 0.2.2
- Summary: Generic utilities for implementing encryption schemes
- Upload time: 2023-08-03 10:35:55
- Requires Python: >=3.8
- License: Apache License, Version 2.0
- Keywords: tno, pet, machine learning, federated learning, logistic regression
- Requirements: none recorded
- Author: TNO PET Lab <petlab@tno.nl>
- Homepage: https://pet.tno.nl/
- Source: https://github.com/TNO-FL/protocols.logistic_regression

# TNO PET Lab - Federated Learning (FL) - Protocols - Logistic Regression

The TNO PET Lab consists of generic software components, procedures, and
functionalities developed and maintained on a regular basis to facilitate and
aid in the development of PET solutions. The lab is a cross-project initiative
allowing us to integrate and reuse previously developed PET functionalities to
boost the development of new protocols and solutions.

The package `tno.fl.protocols.logistic_regression` is part of the TNO Python
Toolbox.

Implementation of a Federated Learning scheme for Logistic Regression. This
library was designed to serve both developers who are new to cryptography and
developers who are more familiar with it.

Supports:

- Any number of clients and one server.
- Horizontally partitioned data (each client holds different rows with the same columns).
- Binary classification (multi-class is not yet supported).
- Both a fixed learning rate and second-order methods (Hessian).

This software implementation was financed via EUREKA ITEA labeling under Project
reference number 20050.

_Limitations in (end-)use: the content of this software package may solely be
used for applications that comply with international export control laws.  
This implementation of cryptographic software has not been audited. Use at your
own risk._

## Documentation

Documentation of the `tno.fl.protocols.logistic_regression` package can be found
[here](https://docs.pet.tno.nl/fl/protocols/logistic_regression/0.2.2).

## Install

Easily install the tno.fl.protocols.logistic_regression package using pip:

```console
$ python -m pip install tno.fl.protocols.logistic_regression
```

If you wish to run the tests, install the package with the `tests` extra:

```console
$ python -m pip install 'tno.fl.protocols.logistic_regression[tests]'
```

## Usage

This package uses federated learning to train a logistic regression model on
datasets that are distributed amongst several clients. Below, we first give a
short overview of federated learning in general and of how it has been
implemented in this package. After that, a minimal working example is provided.

### Federated Learning

In Federated Learning, several clients, each with their own data, wish to fit a
model on their combined data. Each client computes a local update of the model
on its own data and sends this update to a central server. The server combines
these updates, updates the global model with the aggregated update, and sends
the new model back to the clients. Then the process repeats: the clients compute
local updates on the new model and send them to the server, which combines them,
and so on. This continues until the server notices that the model has converged
or the maximum number of epochs has been reached.
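
The round structure described above can be simulated in a single process. This is a minimal sketch, not the package's implementation. It assumes plain gradient descent with a fixed learning rate, aggregation by summing the clients' gradients, and a simple step-size convergence test. The names `local_gradient` and `federated_training` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A client's local data: feature rows (intercept column of ones first) and binary labels.
Dataset = Tuple[List[List[float]], List[int]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def local_gradient(weights: Sequence[float], data: Dataset) -> List[float]:
    """Client step: gradient of the log-loss on this client's rows only."""
    rows, labels = data
    grad = [0.0] * len(weights)
    for x, y in zip(rows, labels):
        error = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
        for j, xj in enumerate(x):
            grad[j] += error * xj
    return grad


def federated_training(clients: List[Dataset], n_features: int,
                       learning_rate: float = 0.1, max_epochs: int = 100,
                       tol: float = 1e-6) -> List[float]:
    """Server loop: send the model, collect local updates, aggregate, update, repeat."""
    weights = [0.0] * n_features
    for _ in range(max_epochs):
        # Every client computes an update on the current global model.
        updates = [local_gradient(weights, data) for data in clients]
        # The server aggregates the updates (here: a coordinate-wise sum).
        total = [sum(parts) for parts in zip(*updates)]
        step = [learning_rate * g for g in total]
        weights = [w - s for w, s in zip(weights, step)]
        # Simple convergence check: stop once the step is negligibly small.
        if max(abs(s) for s in step) < tol:
            break
    return weights


# Two toy clients, each holding different rows of [intercept, feature].
alice = ([[1.0, 0.5], [1.0, 2.0]], [0, 1])
bob = ([[1.0, 1.0], [1.0, 2.5]], [0, 1])
print(federated_training([alice, bob], n_features=2))
```

The summed gradients equal the gradient on the pooled data. As a result, this loop produces the same model as training on all rows in one place, up to floating-point rounding, while the rows themselves never leave the clients.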

This package implements binary logistic regression. Each client holds a dataset
in which every row contains a number of features and a binary label. The goal is
to predict the label for new data. For example, the data could be images of cats
and dogs, with the label indicating whether an image shows a cat or a dog. The
logistic regression model then predicts, for a new image, whether it contains a
cat or a dog. More information on logistic regression is widely available.
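
The following small sketch recaps what a fitted binary logistic regression model computes. The predicted probability is the sigmoid of a linear combination of the features, and the model is fit by minimizing the log-loss. The helper names are hypothetical, and the sketch assumes the intercept is stored as the first weight.

```python
import math
from typing import List, Sequence


def predict_proba(weights: Sequence[float], features: Sequence[float]) -> float:
    """P(label = 1 | features) = sigmoid(w0 + w1*x1 + ... + wn*xn).

    weights[0] is the intercept; weights[1:] pair up with the features.
    """
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: Sequence[float], features: Sequence[float]) -> int:
    """Binary prediction: 1 if the estimated probability is at least 0.5."""
    return int(predict_proba(weights, features) >= 0.5)


def log_loss(weights: Sequence[float], rows: List[List[float]], labels: List[int]) -> float:
    """Average negative log-likelihood, the objective logistic regression minimizes."""
    total = 0.0
    for x, y in zip(rows, labels):
        p = predict_proba(weights, x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(rows)


weights = [-1.0, 2.0]  # intercept -1, one feature with weight 2
print(predict(weights, [1.0]), predict(weights, [0.0]))  # prints 1 0
```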

For logistic regression, the update each client computes is the gradient of the
loss on its local data. The package also supports a second-order method
(Newton's method), which additionally uses the second-order derivative (the
Hessian) of the loss.
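
The sketch below makes the second-order option concrete. It shows one way a server could sum the clients' gradients and Hessians of the log-loss and then take a Newton step, using the inverse Hessian in place of a fixed learning rate. It illustrates that assumption and is not necessarily how this package combines the updates. All function names are hypothetical, and a tiny Gaussian-elimination solver stands in for a linear-algebra library.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Dataset = Tuple[List[List[float]], List[int]]  # rows (intercept column first), labels


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def local_gradient_and_hessian(weights: Sequence[float], rows: List[List[float]],
                               labels: List[int]) -> Tuple[List[float], Matrix]:
    """Client step: gradient X^T (p - y) and Hessian X^T diag(p (1 - p)) X of the log-loss."""
    n = len(weights)
    grad = [0.0] * n
    hess = [[0.0] * n for _ in range(n)]
    for x, y in zip(rows, labels):
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        for i in range(n):
            grad[i] += (p - y) * x[i]
            for j in range(n):
                hess[i][j] += p * (1 - p) * x[i] * x[j]
    return grad, hess


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ d = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy of [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    d = [0.0] * n
    for r in range(n - 1, -1, -1):
        d[r] = (m[r][n] - sum(m[r][c] * d[c] for c in range(r + 1, n))) / m[r][r]
    return d


def newton_round(weights: List[float], clients: List[Dataset]) -> List[float]:
    """Server step: sum all local gradients and Hessians, then take one Newton step."""
    n = len(weights)
    grad_total = [0.0] * n
    hess_total = [[0.0] * n for _ in range(n)]
    for rows, labels in clients:
        g, h = local_gradient_and_hessian(weights, rows, labels)
        for i in range(n):
            grad_total[i] += g[i]
            for j in range(n):
                hess_total[i][j] += h[i][j]
    step = solve(hess_total, grad_total)  # H^{-1} g instead of learning_rate * g
    return [w - s for w, s in zip(weights, step)]


alice = ([[1.0, 0.5], [1.0, 2.0], [1.0, 1.5]], [0, 1, 0])
bob = ([[1.0, 1.0], [1.0, 2.5], [1.0, 1.2]], [0, 1, 1])
weights = [0.0, 0.0]
for _ in range(25):
    weights = newton_round(weights, [alice, bob])
print(weights)
```

Newton steps need far fewer rounds than fixed-rate gradient descent near the optimum. The trade-off is that each client sends an n-by-n matrix instead of a vector.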

### Implementation

The implementation of federated logistic regression consists of two classes with
the suggestive names `Client` and `Server`. Each client is an instance of
`Client` and the server is an instance of `Server`. Both classes are passed a
configuration object; a `Client` is additionally passed a name, which uniquely
identifies that client. Awaiting the `.run()` method of an object performs the
federated learning and returns the resulting logistic regression model (a numpy
array).

All settings are defined in an `.ini` configuration file. A template is given in
`config.ini` in the repository, and an example is shown in the minimal example
below. The configuration must contain the following.

The file contains a `Parties` section that lists the names of all clients and
the name of the server. Next, there is a separate section for each client and
for the server, containing the IP address and port on which that party can be
reached. Each client section also contains the path to the `.csv` file with that
client's data (`train_data`).

The `Experiment` section contains the experiment configuration. Most of its
fields are self-explanatory:

- **data_columns**: the comma-separated columns in the csv that should be used for training.
- **target_column**: the column in the csv containing the binary label to be predicted.
- **intercept**: whether an intercept column (a constant column of ones) should be added.
- **n_epochs**: the maximum number of epochs.
- **learning_rate**: the learning rate (a float) or `hessian`. If this value is
  `hessian`, the second-order derivative (the Hessian) of the loss is used in
  place of a fixed learning rate (Newton's method).
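
The `Experiment` fields map naturally onto Python's standard `configparser`. The sketch below is a hypothetical parser, not the package's `Config` class. It turns these fields into typed values and treats `learning_rate` as either the string `hessian` or a float.

```python
import configparser
from typing import Any, Dict, Union

EXAMPLE = """
[Experiment]
data_columns=sepal_length,sepal_width,petal_length,petal_width
target_column=is_setosa
intercept=True
n_epochs=10
learning_rate=hessian
"""


def parse_experiment(text: str) -> Dict[str, Any]:
    """Hypothetical parser for the [Experiment] section of an .ini string."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["Experiment"]
    raw_rate = section["learning_rate"].strip()
    # Either the literal 'hessian' (Newton's method) or a numeric learning rate.
    learning_rate: Union[float, str] = (
        "hessian" if raw_rate.lower() == "hessian" else float(raw_rate)
    )
    return {
        "data_columns": [c.strip() for c in section["data_columns"].split(",")],
        "target_column": section["target_column"].strip(),
        "intercept": section.getboolean("intercept"),
        "n_epochs": section.getint("n_epochs"),
        "learning_rate": learning_rate,
    }


print(parse_experiment(EXAMPLE)["learning_rate"])  # prints hessian
```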

_Note: currently, only csv files are supported as input. To use other file types
or databases, override the `load_data()` method of the `Client` class._
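
When the csv input is replaced, a loader essentially has to produce feature rows and labels from the configured columns. The helper below is hypothetical and uses the standard `csv` module. Its name and signature are assumptions and do not describe the `load_data()` method itself. It places the optional intercept column first, matching the coefficient order in the example output further down.

```python
import csv
import io
from typing import List, Sequence, TextIO, Tuple


def read_rows(handle: TextIO, data_columns: Sequence[str], target_column: str,
              intercept: bool = True) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical loader: features from data_columns, labels from target_column."""
    rows: List[List[float]] = []
    labels: List[int] = []
    for record in csv.DictReader(handle):
        features = [float(record[column]) for column in data_columns]
        if intercept:
            features.insert(0, 1.0)  # constant intercept column, placed first
        rows.append(features)
        labels.append(int(record[target_column]))
    return rows, labels


excerpt = io.StringIO(
    "sepal_length,sepal_width,petal_length,petal_width,is_setosa\n"
    "5.8,2.7,5.1,1.9,0\n"
    "6.9,3.1,5.4,2.1,0\n"
    "5,3.4,1.5,0.2,1\n"
)
rows, labels = read_rows(
    excerpt, ["sepal_length", "sepal_width", "petal_length", "petal_width"], "is_setosa"
)
print(rows[2], labels)  # prints [1.0, 5.0, 3.4, 1.5, 0.2] [0, 0, 1]
```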

#### Communication

This package relies on the `tno.mpc.communication` package, which is also part
of the PET Lab. It is used for the communication between the server and the
clients. Since `tno.mpc.communication` uses `asyncio` for asynchronous message
handling, this federated learning package is built on `asyncio` as well:
`run()` has to be awaited inside an event loop. For more information, we refer
to the
[tno.mpc.communication documentation](https://docs.pet.tno.nl/mpc/communication/).
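
The toy sketch below illustrates why an asynchronous design fits this protocol. The server and the clients run as concurrent coroutines and exchange messages through `asyncio.Queue` objects, which stand in for the HTTP messages that `tno.mpc.communication` sends. Several details are assumptions chosen for brevity: the message layout, the function names, and the one-parameter model. In that model, each client pulls the model toward a local target value, so the optimum is the mean of the targets.

```python
import asyncio
from typing import Dict


async def toy_server(inboxes: Dict[str, asyncio.Queue], server_inbox: asyncio.Queue,
                     n_epochs: int) -> float:
    """Send the model to every client, wait for all updates, aggregate, repeat."""
    model = 0.0
    for _ in range(n_epochs):
        for inbox in inboxes.values():
            await inbox.put(model)
        # Wait until every client has answered (arrival order does not matter).
        updates = [await server_inbox.get() for _ in inboxes]
        # Average the gradients; for this quadratic toy that lands on the optimum.
        model -= sum(updates) / len(updates)
    for inbox in inboxes.values():
        await inbox.put(None)  # tell the clients that training has finished
    return model


async def toy_client(target: float, inbox: asyncio.Queue, server_inbox: asyncio.Queue) -> None:
    """Reply to each received model with the gradient of 0.5 * (model - target) ** 2."""
    while True:
        model = await inbox.get()
        if model is None:
            return
        await server_inbox.put(model - target)


async def simulate(targets: Dict[str, float], n_epochs: int = 3) -> float:
    inboxes = {name: asyncio.Queue() for name in targets}
    server_inbox: asyncio.Queue = asyncio.Queue()
    clients = [toy_client(t, inboxes[name], server_inbox) for name, t in targets.items()]
    # The server and all clients run concurrently on one event loop.
    results = await asyncio.gather(toy_server(inboxes, server_inbox, n_epochs), *clients)
    return results[0]


print(asyncio.run(simulate({"Alice": 1.0, "Bob": 3.0})))  # prints 2.0
```

In the real package, each party is a separate process that serves on its own port, but the waiting pattern is similar. A party awaits incoming messages without blocking the other work on its event loop.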

### Example code

Below is a minimal example of how to use the library. It involves two clients,
Alice and Bob, who want to fit a model that recognizes the setosa iris flower.
Below is an excerpt from their datasets:

`data_alice.csv`

```csv
sepal_length,sepal_width,petal_length,petal_width,is_setosa
5.8,2.7,5.1,1.9,0
6.9,3.1,5.4,2.1,0
5,3.4,1.5,0.2,1
5.2,4.1,1.5,0.1,1
6.7,3.1,5.6,2.4,0
6.3,2.9,5.6,1.8,0
5.6,2.5,3.9,1.1,0
5.7,3.8,1.7,0.3,1
5.8,2.6,4,1.2,0
```

`data_bob.csv`

```csv
sepal_length,sepal_width,petal_length,petal_width,is_setosa
7.2,3,5.8,1.6,0
6.7,2.5,5.8,1.8,0
6,3.4,4.5,1.6,0
4.8,3.4,1.6,0.2,1
7.7,3.8,6.7,2.2,0
5.4,3.9,1.3,0.4,1
7.7,3,6.1,2.3,0
7.1,3,5.9,2.1,0
6.1,2.9,4.7,1.4,0
```

Next, we create a configuration file for this experiment.

`iris.ini`

```ini
[Experiment]
data_columns=sepal_length,sepal_width,petal_length,petal_width
target_column=is_setosa
intercept=True
n_epochs=10
learning_rate=hessian

[Parties]
clients=Alice,Bob
server=Server

[Server]
address=localhost
port=8000

[Alice]
address=localhost
port=8001
train_data=data_alice.csv

[Bob]
address=localhost
port=8002
train_data=data_bob.csv
```
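
Every party reads the same file, so a quick consistency check can catch typos before any process is started. The helper below is hypothetical and not part of the package. It uses `configparser` to verify three things: each party listed under `[Parties]` has its own section with an address and port, each client has `train_data`, and no two parties share an address and port.

```python
import configparser
from typing import Dict, List, Tuple

IRIS_INI = """
[Experiment]
data_columns=sepal_length,sepal_width,petal_length,petal_width
target_column=is_setosa
intercept=True
n_epochs=10
learning_rate=hessian

[Parties]
clients=Alice,Bob
server=Server

[Server]
address=localhost
port=8000

[Alice]
address=localhost
port=8001
train_data=data_alice.csv

[Bob]
address=localhost
port=8002
train_data=data_bob.csv
"""


def check_parties(text: str) -> List[str]:
    """Return a list of problems found in the party sections (empty if none)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    clients = [name.strip() for name in parser["Parties"]["clients"].split(",")]
    server = parser["Parties"]["server"].strip()
    problems: List[str] = []
    endpoints: Dict[Tuple[str, str], str] = {}
    for name in clients + [server]:
        if not parser.has_section(name):
            problems.append(f"missing section [{name}]")
            continue
        section = parser[name]
        for key in ("address", "port"):
            if key not in section:
                problems.append(f"[{name}] has no {key}")
        if name in clients and "train_data" not in section:
            problems.append(f"[{name}] has no train_data")
        endpoint = (section.get("address", ""), section.get("port", ""))
        if endpoint in endpoints:
            problems.append(f"[{name}] reuses the address and port of [{endpoints[endpoint]}]")
        else:
            endpoints[endpoint] = name
    return problems


print(check_parties(IRIS_INI))  # prints []
```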

Finally, we create the code to run the federated learning algorithm:

`main.py`

```python
import asyncio
import sys
from pathlib import Path

from tno.fl.protocols.logistic_regression.client import Client
from tno.fl.protocols.logistic_regression.config import Config
from tno.fl.protocols.logistic_regression.server import Server


async def async_main(player: str) -> None:
    # Load the experiment and party settings shared by all parties.
    config = Config.from_file(Path("iris.ini"))
    if player == "server":
        server = Server(config)
        # run() performs the federated learning and returns the final model.
        print(await server.run())
    elif player == "alice":
        client = Client(config, "Alice")
        print(await client.run())
    elif player == "bob":
        client = Client(config, "Bob")
        print(await client.run())
    else:
        raise ValueError(
            "This player has not been implemented. Possible values are: server, alice, bob"
        )


if __name__ == "__main__":
    # Fail with a usage message instead of an IndexError when no party is given.
    if len(sys.argv) != 2:
        sys.exit("Usage: python main.py [server|alice|bob]")
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(async_main(sys.argv[1].lower()))
```

To run this script, call `main.py` from the folder where the data files and the
config file are located. Pass the name of the party running the app as the
command-line argument: `alice`, `bob`, or `server` (case-insensitive). To run it
on a single computer, run the following three commands, each in a different
terminal. Note that if a client is started before the server, it will raise a
`ClientConnectorError`, because the client tries to send a message to the
server's port, which has not been opened yet. Once the server has been started,
the error disappears.

```console
python main.py alice
python main.py bob
python main.py server
```

The output for each client will look similar to the following (shown for Alice):

```console
$ python main.py alice
2023-07-31 14:21:21,765 - tno.mpc.communication.httphandlers - INFO - Serving on localhost:8001
2023-07-31 14:21:21,780 - tno.mpc.communication.httphandlers - INFO - Received message from 127.0.0.1:8000
2023-07-31 14:21:21,796 - tno.mpc.communication.httphandlers - INFO - Received message from 127.0.0.1:8000
2023-07-31 14:21:21,811 - tno.mpc.communication.httphandlers - INFO - Received message from 127.0.0.1:8000
2023-07-31 14:21:21,833 - tno.mpc.communication.httphandlers - INFO - Received message from 127.0.0.1:8000
2023-07-31 14:21:21,833 - tno.mpc.communication.httphandlers - INFO - Received message from 127.0.0.1:8000
2023-07-31 14:21:21,851 - tno.mpc.communication.httphandlers - INFO - Received message from 127.0.0.1:8000
2023-07-31 14:21:21,867 - tno.mpc.communication.httphandlers - INFO - Received message from 127.0.0.1:8000
2023-07-31 14:21:21,882 - tno.mpc.communication.httphandlers - INFO - Received message from 127.0.0.1:8000
2023-07-31 14:21:21,898 - tno.mpc.communication.httphandlers - INFO - Received message from 127.0.0.1:8000
2023-07-31 14:21:21,914 - tno.mpc.communication.httphandlers - INFO - Received message from 127.0.0.1:8000
[[-2.907941249994596], [-0.1876483927585601], [7.728309577725918], [-6.938238886471739], [0.5467650097181181]]
```

We first see the client start serving on its own address (`localhost:8001`).
Then we see ten rounds of training (one received message from the server each),
as set by `n_epochs=10` in the configuration file. Finally, the resulting model
is printed. We obtain the following coefficients for classifying setosa irises:

| Parameter    | Coefficient         |
| ------------ | ------------------- |
| intercept    | -2.907941249994596  |
| sepal_length | -0.1876483927585601 |
| sepal_width  | 7.728309577725918   |
| petal_length | -6.938238886471739  |
| petal_width  | 0.5467650097181181  |
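
The printed model can be turned into predictions directly. The sketch below assumes, as in the table, that the coefficients come in the order intercept, `sepal_length`, `sepal_width`, `petal_length`, `petal_width`. It applies them to the rows of Alice's excerpt. `is_setosa` and `setosa_probability` are hypothetical helper names.

```python
import math

# The model as printed by the run above: one coefficient per row.
raw_model = [[-2.907941249994596], [-0.1876483927585601], [7.728309577725918],
             [-6.938238886471739], [0.5467650097181181]]
# Assumed order, as in the table: intercept first, then the data_columns.
names = ["intercept", "sepal_length", "sepal_width", "petal_length", "petal_width"]
coefficients = {name: row[0] for name, row in zip(names, raw_model)}


def setosa_probability(sepal_length: float, sepal_width: float,
                       petal_length: float, petal_width: float) -> float:
    """Sigmoid of the intercept plus the weighted features."""
    z = (coefficients["intercept"]
         + coefficients["sepal_length"] * sepal_length
         + coefficients["sepal_width"] * sepal_width
         + coefficients["petal_length"] * petal_length
         + coefficients["petal_width"] * petal_width)
    return 1.0 / (1.0 + math.exp(-z))


def is_setosa(*features: float) -> bool:
    return setosa_probability(*features) >= 0.5


# Rows and labels from data_alice.csv above.
alice_rows = [(5.8, 2.7, 5.1, 1.9), (6.9, 3.1, 5.4, 2.1), (5.0, 3.4, 1.5, 0.2),
              (5.2, 4.1, 1.5, 0.1), (6.7, 3.1, 5.6, 2.4), (6.3, 2.9, 5.6, 1.8),
              (5.6, 2.5, 3.9, 1.1), (5.7, 3.8, 1.7, 0.3), (5.8, 2.6, 4.0, 1.2)]
alice_labels = [0, 0, 1, 1, 0, 0, 0, 1, 0]
predictions = [int(is_setosa(*row)) for row in alice_rows]
print(predictions == alice_labels)  # prints True
```

These rows were part of the training data, so this is a sanity check rather than an evaluation on unseen data.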

            
