# TNO PET Lab - Federated Learning (FL) - Protocols - Logistic Regression
Implementation of a Federated Learning scheme for Logistic Regression. This
library was designed to be accessible both to developers who are new to
cryptography and to developers who are more experienced with it.
Supports:
- Any number of clients and one server
- Horizontal fragmentation
- Binary classification (multi-class not yet supported)
- Both a fixed learning rate and second-order optimization (Newton's method, using the Hessian)
### PET Lab
The TNO PET Lab consists of generic software components, procedures, and functionalities developed and maintained on a regular basis to facilitate and aid in the development of PET solutions. The lab is a cross-project initiative allowing us to integrate and reuse previously developed PET functionalities to boost the development of new protocols and solutions.
The package `tno.fl.protocols.logistic_regression` is part of the [TNO Python Toolbox](https://github.com/TNO-PET).
_Limitations in (end-)use: the content of this software package may solely be used for applications that comply with international export control laws._
_This implementation of cryptographic software has not been audited. Use at your own risk._
## Documentation
Documentation of the `tno.fl.protocols.logistic_regression` package can be found
[here](https://docs.pet.tno.nl/fl/protocols/logistic_regression/1.1.0).
## Install
Easily install the `tno.fl.protocols.logistic_regression` package using `pip`:
```console
$ python -m pip install tno.fl.protocols.logistic_regression
```
_Note:_ If you are cloning the repository and wish to edit the source code, be
sure to install the package in editable mode:
```console
$ python -m pip install -e 'tno.fl.protocols.logistic_regression'
```
If you wish to run the tests you can use:
```console
$ python -m pip install 'tno.fl.protocols.logistic_regression[tests]'
```
## Usage
This package uses federated learning for training a logistic regression model on
datasets that are distributed amongst several clients. Below is first a short
overview of federated learning in general and how this has been implemented in
this package. In the next section, a minimal working example is provided. This
code is also available in the repository in the `examples` folder.
### Federated Learning
In Federated Learning, several clients, each holding their own data, wish to
fit a model on their combined data without sharing that data. Each client
computes a local update to the current model and sends this update to a central
server. The server combines these updates, updates the global model from the
aggregated update, and sends the new model back to the clients. Then the
process repeats: the clients compute local updates on the new model and send
them to the server, which combines them, and so on. This continues until the
server determines that the model has converged.
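The aggregation step above can be sketched as follows. This is a minimal numpy illustration, not this package's actual internals; the variable names and the weighting of updates by sample count are assumptions:

```python
import numpy as np

# Hypothetical local updates (gradients) computed by two clients,
# together with the number of samples each client holds.
update_alice = np.array([0.2, -0.1, 0.4])
update_bob = np.array([0.6, 0.3, -0.2])
n_alice, n_bob = 9, 9

# The server combines the updates, here weighted by sample count.
aggregated = (n_alice * update_alice + n_bob * update_bob) / (n_alice + n_bob)

# The global model is updated with the aggregated update
# (shown here with a fixed learning rate) and sent back to the clients.
model = np.zeros(3)
learning_rate = 0.1
model -= learning_rate * aggregated
```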
This package implements binary logistic regression: each client has a data
set containing feature data and, for each row, a binary label. The goal is to
predict this binary label for new data. For example, the data could be images
of cats and dogs, with the binary label indicating whether an image shows a cat
or a dog; the logistic regression model then predicts, for a new image, whether
it contains a cat or a dog. More information on logistic regression is widely
available.
In the case of logistic regression, the update each client computes is a
gradient. The package also supports updates based on the second-order
derivative (Newton's method).
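To make the local computation concrete, the gradient and the Newton step for logistic regression can be written out as follows. This is a self-contained numpy sketch of the underlying mathematics on toy data, not the package's actual code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy local data: 4 samples, 2 features (intercept column included).
X = np.array([[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, 0.1]])
y = np.array([1.0, 0.0, 1.0, 0.0])
w = np.zeros(2)  # current model

p = sigmoid(X @ w)                       # predicted probabilities
gradient = X.T @ (p - y)                 # first-order local update
H = X.T @ (X * (p * (1 - p))[:, None])   # Hessian (second-order information)

# Newton step: uses curvature to take a better-scaled step than
# a fixed learning rate would.
w_new = w - np.linalg.solve(H, gradient)
```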
### Implementation
The implementation of federated logistic regression consists of two classes with
the suggestive names `Client` and `Server`. Each client is an instance of
`Client` and the server is an instance of the `Server` class. These classes are
passed the required parameters and a communication pool.
Calling the `.run` method with the data performs the federated learning
and returns the resulting logistic regression model (as a numpy array).
#### Communication
The clients and the server must be given a communication pool during initialization.
This is a `Pool` object from the `tno.mpc.communication` package, which is also part
of the PET Lab. It handles the communication between the server and the clients;
we refer to that package for more information.
The example file also shows how to set up a simple communication pool.
Since the communication package uses `asyncio` for asynchronous handling, this
federated learning package depends on it as well. For more information, we
refer to the
[tno.mpc.communication documentation](https://docs.pet.tno.nl/mpc/communication/).
#### Passing the data
Once the clients and the server have been properly initialized,
the federated learning can be performed using the `.run()` method.
For clients, this method takes two arguments:
a numpy array containing the covariates (the training data),
and a numpy array of booleans containing the target data.
In other words, the first array contains the sample data and the second contains
the category each sample belongs to.
Currently, only binary classification is supported.
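For example, with four-feature data like the iris data used later in this README, the two arguments would have the following shapes. This is a sketch of the expected input layout, not a call into the package itself:

```python
import numpy as np

# Covariates: one row per sample, one column per feature.
data = np.array([
    [5.8, 2.7, 5.1, 1.9],
    [5.0, 3.4, 1.5, 0.2],
])

# Targets: one boolean per sample (binary classification).
target = np.array([False, True])

# data has shape (n_samples, n_features); target has shape (n_samples,).
```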
#### Other customization
All settings are passed as parameters to the client and the server.
This includes:
- **fit_intercept:** Whether an intercept column should be prepended to the data as the first column. Default: False.
- **max_iter:** The maximum number of iterations in the learning process. Default: 25.
- **server_name:** The name of the server handler in the pool object. Default: 'server'.
In addition, there are many possibilities for overriding client/server functions,
such as a preprocessing function, computing the client weights, or the initial model.
### Example code
Below is a very minimal example of how to use the library. It consists of two
clients, Alice and Bob, who want to fit a model for recognizing the setosa iris
flower. Below is an excerpt from their data sets:
`data_alice.csv`
```csv
sepal_length,sepal_width,petal_length,petal_width,is_setosa
5.8,2.7,5.1,1.9,0
6.9,3.1,5.4,2.1,0
5,3.4,1.5,0.2,1
5.2,4.1,1.5,0.1,1
6.7,3.1,5.6,2.4,0
6.3,2.9,5.6,1.8,0
5.6,2.5,3.9,1.1,0
5.7,3.8,1.7,0.3,1
5.8,2.6,4,1.2,0
```
`data_bob.csv`
```csv
sepal_length,sepal_width,petal_length,petal_width,is_setosa
7.2,3,5.8,1.6,0
6.7,2.5,5.8,1.8,0
6,3.4,4.5,1.6,0
4.8,3.4,1.6,0.2,1
7.7,3.8,6.7,2.2,0
5.4,3.9,1.3,0.4,1
7.7,3,6.1,2.3,0
7.1,3,5.9,2.1,0
6.1,2.9,4.7,1.4,0
```
We create the following code to run the federated learning algorithm:
`main.py`
```python
"""
This module runs the logistic regression protocol on an example data set.
By running the script three times with command line argument 'server', 'alice'
and 'bob' respectively, you can get a demonstration of how it works.
"""
import asyncio
import sys
import pandas as pd
from tno.mpc.communication import Pool
from tno.fl.protocols.logistic_regression.client import Client
from tno.fl.protocols.logistic_regression.server import Server
async def run_client(name: str, port: int) -> None:
# Create Pool
pool = Pool()
pool.add_http_server(addr="localhost", port=port)
pool.add_http_client(name="server", addr="localhost", port=8080)
# Get Data
csv_data = pd.read_csv("data_" + name + ".csv")
data = csv_data[
["sepal_length", "sepal_width", "petal_length", "petal_width"]
].to_numpy()
target = csv_data["is_setosa"].to_numpy()
# Create Client
client = Client(pool, fit_intercept=True, max_iter=10)
print(await client.run(data, target))
async def run_server() -> None:
# Create Pool
pool = Pool()
pool.add_http_server(addr="localhost", port=8080)
pool.add_http_client(name="alice", addr="localhost", port=8081)
pool.add_http_client(name="bob", addr="localhost", port=8082)
# Create Client
server = Server(pool, max_iter=10)
await server.run()
async def async_main() -> None:
if len(sys.argv) < 2:
raise ValueError("Player name must be provided.")
if sys.argv[1].lower() == "server":
await run_server()
elif sys.argv[1].lower() == "alice":
await run_client("alice", 8081)
elif sys.argv[1].lower() == "bob":
await run_client("bob", 8082)
else:
raise ValueError(
"This player has not been implemented. Possible values are: server, alice, bob"
)
if __name__ == "__main__":
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
loop.run_until_complete(async_main())
```
To run this script, call `main.py` from the folder where the data files are
located. As command line argument, pass the name of the party running the app:
'alice', 'bob', or 'server'. To run it on a single computer, run the following
three commands, each in a different terminal. Note that if a client is started
before the server, it will throw a `ClientConnectorError`: the client tries to
send a message to the server's port, which has not been opened yet. Once the
server has started, the error disappears.
```commandline
python main.py alice
python main.py bob
python main.py server
```
The output for the clients will be something similar to:
```commandline
>>> python main.py alice
2024-01-18 16:01:56,735 - tno.mpc.communication.httphandlers - INFO - Serving on localhost:8081
2024-01-18 16:01:58,655 - tno.mpc.communication.httphandlers - INFO - Received message from 127.0.0.1:8080
2024-01-18 16:01:58,655 - tno.mpc.communication.httphandlers - INFO - Received message from 127.0.0.1:8080
2024-01-18 16:01:58,671 - tno.mpc.communication.httphandlers - INFO - Received message from 127.0.0.1:8080
2024-01-18 16:01:58,693 - tno.mpc.communication.httphandlers - INFO - Received message from 127.0.0.1:8080
2024-01-18 16:01:58,709 - tno.mpc.communication.httphandlers - INFO - Received message from 127.0.0.1:8080
2024-01-18 16:01:58,709 - tno.mpc.communication.httphandlers - INFO - Received message from 127.0.0.1:8080
2024-01-18 16:01:58,724 - tno.mpc.communication.httphandlers - INFO - Received message from 127.0.0.1:8080
2024-01-18 16:01:58,740 - tno.mpc.communication.httphandlers - INFO - Received message from 127.0.0.1:8080
2024-01-18 16:01:58,756 - tno.mpc.communication.httphandlers - INFO - Received message from 127.0.0.1:8080
2024-01-18 16:01:58,771 - tno.mpc.communication.httphandlers - INFO - Received message from 127.0.0.1:8080
2024-01-18 16:01:58,793 - tno.mpc.communication.httphandlers - INFO - Received message from 127.0.0.1:8080
[[-7.63901840925708], [2.985418690990691], [4.688929649931743], [-6.397069834606601], [-6.008454039386442]]
```
We first see the client setting up the connection with the server. Then ten
rounds of training follow, as specified by the `max_iter` parameter. Finally,
the resulting model is printed. We obtain the following coefficients for
classifying setosa irises:
| Parameter | Coefficient |
| ------------ | ------------------ |
| intercept | -7.63901840925708 |
| sepal_length | 2.985418690990691 |
| sepal_width | 4.688929649931743 |
| petal_length | -6.397069834606601 |
| petal_width | -6.008454039386442 |
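As a sanity check, these coefficients can be applied to new samples with a standard logistic prediction. This sketch is independent of the package; the helper name `predict_setosa` and the 0.5 decision threshold are illustrative assumptions:

```python
import numpy as np

# Coefficients from the table above (intercept first).
coef = np.array([
    -7.63901840925708,   # intercept
    2.985418690990691,   # sepal_length
    4.688929649931743,   # sepal_width
    -6.397069834606601,  # petal_length
    -6.008454039386442,  # petal_width
])

def predict_setosa(features):
    # Prepend the intercept term, then apply the logistic function.
    z = coef @ np.concatenate(([1.0], features))
    return 1.0 / (1.0 + np.exp(-z))

print(predict_setosa([5.0, 3.4, 1.5, 0.2]))  # close to 1: setosa
print(predict_setosa([5.8, 2.7, 5.1, 1.9]))  # close to 0: not setosa
```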