[![PyPi version](https://img.shields.io/pypi/v/lomas_client.svg)](https://pypi.org/project/lomas_client/)
[![PyPi status](https://img.shields.io/pypi/status/lomas_client.svg)](https://pypi.org/project/lomas_client/)
[![Python versions](https://img.shields.io/pypi/pyversions/lomas_client.svg)](https://pypi.org/project/lomas_client/)
<h1 align="center">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://github.com/dscc-admin-ch/lomas/blob/wip_322_darkmode-logo/images/lomas_logo_darkmode_txt.png" width="300">
<source media="(prefers-color-scheme: light)" srcset="https://github.com/dscc-admin-ch/lomas/blob/wip_322_darkmode-logo/images/lomas_logo_txt.png" width="300">
<img alt="Lomas" src="https://github.com/dscc-admin-ch/lomas/blob/wip_322_darkmode-logo/images/lomas_logo_txt.png">
</picture>
</h1><br>
# Lomas Client
The `lomas_client` library is a client for interacting with the Lomas server.
Using this client library is strongly advised for querying and interacting with the server: it handles serialization, deserialization, and REST API calls, and it ensures that the other required libraries are correctly installed.
### Installation
It can be installed with the following command:
```bash
pip install lomas_client
```
### Simple introduction to client use
#### Create Client object
Once the library is installed, a `Client` object must be created. To create it, the user provides a few parameters:
- `url`: the root application endpoint of the remote secure server.
- `user_name`: the user's name as registered in the database (here, Emilie).
- `dataset_name`: the name of the dataset that she wants to query (here, PENGUIN).
```python
from lomas_client.client.client import Client
client = Client(url="http://lomas_server_dev:80", user_name = "Emilie", dataset_name = "PENGUIN")
```
Once `client` is initialized, it can be used to send requests to the different differential privacy (DP) frameworks.
#### Get metadata
Metadata about the dataset can be accessed in a format based on the SmartnoiseSQL dictionary format. Among other things, it describes all available columns, their types, and their bound values (see the Smartnoise page for more details). Any metadata required by Smartnoise-SQL is also required here, and additional information, such as the different categories in a string-type column, can be added.
```python
metadata = client.get_dataset_metadata()
```
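As a rough illustration of how such metadata might be inspected, the sketch below walks over a hand-written dictionary. The keys (`columns`, `type`, `lower`, `upper`, `categories`) and the values are assumptions loosely modelled on the SmartnoiseSQL dictionary format, and `describe_columns` is a hypothetical helper; the structure actually returned by the server may differ.

```python
# Hypothetical metadata, loosely modelled on the SmartnoiseSQL dictionary
# format. Keys and values are illustrative assumptions, not server output.
metadata = {
    "columns": {
        "species": {
            "type": "string",
            "categories": ["Adelie", "Chinstrap", "Gentoo"],
        },
        "bill_length_mm": {"type": "float", "lower": 30.0, "upper": 65.0},
    },
}


def describe_columns(metadata):
    """Return one summary line per column: name, type, bounds or categories."""
    lines = []
    for name, info in metadata["columns"].items():
        details = ""
        if "lower" in info and "upper" in info:
            details = f" in [{info['lower']}, {info['upper']}]"
        elif "categories" in info:
            details = f" with categories {info['categories']}"
        lines.append(f"{name}: {info['type']}{details}")
    return lines


for line in describe_columns(metadata):
    print(line)
```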
#### Get a dummy dataset
Based on the public metadata of the dataset, a random dataframe can be created. By default, it has 100 rows and the seed is set to 42 to ensure reproducibility, but both parameters can be changed to obtain different dummy datasets.
Getting a dummy dataset does not affect the budget, as no differential privacy is involved. It is not a synthetic dataset: it is created randomly on the fly from the metadata, so everything that could be learned from it is already present in the public metadata.
```python
df_dummy = client.get_dummy_dataset(nb_rows = 200, seed = 1)
```
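The idea can be sketched in plain Python: every value is drawn only from the declared categories or bounds, and a fixed seed makes the result reproducible. This is a simplified illustration, not the server's actual generator; `make_dummy_rows` and the column description are hypothetical.

```python
import random


def make_dummy_rows(columns, nb_rows=100, seed=42):
    """Draw random rows using only the declared categories or bounds."""
    rng = random.Random(seed)  # local generator: same seed, same rows
    rows = []
    for _ in range(nb_rows):
        row = {}
        for name, info in columns.items():
            if info["type"] == "string":
                row[name] = rng.choice(info["categories"])
            else:
                row[name] = rng.uniform(info["lower"], info["upper"])
        rows.append(row)
    return rows


# Hypothetical column description (illustrative values only).
columns = {
    "species": {"type": "string", "categories": ["Adelie", "Chinstrap", "Gentoo"]},
    "bill_length_mm": {"type": "float", "lower": 30.0, "upper": 65.0},
}

rows_a = make_dummy_rows(columns, nb_rows=200, seed=1)
rows_b = make_dummy_rows(columns, nb_rows=200, seed=1)
print(rows_a == rows_b)  # identical because the seed is the same
```

Because the rows depend only on the metadata and the seed, they carry no information about the sensitive records.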
#### Query smartnoise-sql
She can query the sensitive dataset, with the smartnoise-sql library running in the back end, using the following method:
```python
response = client.smartnoise_sql_query(
    query = "SELECT COUNT(*) AS nb_penguins FROM df",
    epsilon = 0.1,
    delta = 0.00001,
    dummy = False  # Optional
)
```
To query a dummy dataset for testing purposes, she can set the `dummy` flag to `True` (see the notebooks or white paper for further explanations).
NOTE: the `FROM` clause of the SQL query must be followed by `df` for the command to work.
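A quick local check can catch a wrong table name before a request is sent. The helper below is a hypothetical sketch, not part of `lomas_client`: it uses a simple regular expression, assumes the table name is case-sensitive, and does not handle every SQL construct (subqueries in parentheses are ignored, for example).

```python
import re


def queries_df_table(query):
    """Return True if the query has a FROM clause and each one names `df`."""
    # Capture the identifier that directly follows each FROM keyword.
    tables = re.findall(r"\bFROM\s+([A-Za-z_]\w*)", query, flags=re.IGNORECASE)
    return bool(tables) and all(table == "df" for table in tables)


print(queries_df_table("SELECT COUNT(*) AS nb_penguins FROM df"))
print(queries_df_table("SELECT COUNT(*) AS nb_penguins FROM penguins"))
```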
#### Get smartnoise-sql query cost
In SmartnoiseSQL, the budget that will be used by a query might differ from what the user requests. The cost estimation function returns the estimated real cost of a query.
```python
real_cost_epsilon, real_cost_delta = client.estimate_smartnoise_sql_cost(
    query = "SELECT COUNT(*) AS nb_penguins FROM df",
    epsilon = 0.1,
    delta = 0.000001
)
```
Usually `real_cost_epsilon` > `epsilon` and `real_cost_delta` > `delta`, where `epsilon` and `delta` are the values passed as input.
NOTE: the `FROM` clause of the SQL query must be followed by `df` for the command to work.
#### Query opendp
She can query the sensitive dataset, with the opendp library running in the back end, using the following method:
```python
import opendp as dp
import opendp.transformations as trans
import opendp.measurements as meas

# `columns`, `bill_length_min`, `bill_length_max`, `nb_penguins` and
# `avg_bill_length` are assumed to be defined beforehand, for example
# from the metadata and from earlier queries.
pipeline = (
    trans.make_split_dataframe(separator=",", col_names=columns) >>
    trans.make_select_column(key="bill_length_mm", TOA=str) >>
    trans.then_cast_default(TOA=float) >>
    trans.then_clamp(bounds=(bill_length_min, bill_length_max)) >>
    trans.then_resize(size=nb_penguins.tolist(), constant=avg_bill_length) >>
    trans.then_variance() >>
    meas.then_laplace(scale=5.0)
)
result = client.opendp_query(
    opendp_pipeline = pipeline,
)
```
As with Smartnoise-SQL, to query a dummy dataset for testing purposes, she can set the `dummy` flag to `True` (see the notebooks or white paper for further explanations).
#### Get opendp query cost
The budget that will be used by a query is usually not expressed in the (epsilon, delta) format used by the server. For instance, in the pipeline example above, the noise is expressed as `meas.then_laplace(scale=5.0)`. It can be converted into an epsilon and delta cost with the function below:
```python
real_cost_epsilon, real_cost_delta = client.estimate_opendp_cost(opendp_pipeline = pipeline)
```
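To give an intuition for this conversion, the standard Laplace mechanism with L1 sensitivity `s` and noise scale `b` satisfies pure differential privacy with epsilon = `s / b` (and delta = 0). The sketch below applies that textbook relation only; `laplace_epsilon` is a hypothetical helper, and the sensitivity of a real pipeline (which depends on the clamping bounds and dataset size) is derived by the server, not by this function.

```python
def laplace_epsilon(sensitivity, scale):
    """Epsilon of a Laplace mechanism: L1 sensitivity divided by noise scale."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return sensitivity / scale


# Illustrative value: a counting query has sensitivity 1, so a Laplace
# scale of 5.0 corresponds to epsilon = 1 / 5.
print(laplace_epsilon(sensitivity=1.0, scale=5.0))
```

A larger scale means more noise and therefore a smaller epsilon cost for the same sensitivity.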
#### Get budget information
There are various functions for the user to track her budget:
- `get_initial_budget()` retrieves the initial budget allocated to her by the platform administrator.
- `get_total_spent_budget()` provides the total amount spent from the budget (accumulated over all previous queries).
- `get_remaining_budget()` returns the remaining budget available for future queries. It is the difference between the initial budget and the total spent budget.

Each of these functions returns two values, one for epsilon and one for delta.
```python
initial_epsilon, initial_delta = client.get_initial_budget()
total_spent_epsilon, total_spent_delta = client.get_total_spent_budget()
remaining_epsilon, remaining_delta = client.get_remaining_budget()
```
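The relation between these three quantities, and a check that an estimated query cost still fits in what is left, can be sketched as follows. The helper names and all numeric values are hypothetical; in practice the (epsilon, delta) pairs would come from the client calls above and from a cost estimation call.

```python
def remaining_budget(initial, spent):
    """Remaining (epsilon, delta): initial budget minus total spent budget."""
    return (initial[0] - spent[0], initial[1] - spent[1])


def can_afford(cost, remaining):
    """True if both the epsilon and the delta cost fit in the remaining budget."""
    return cost[0] <= remaining[0] and cost[1] <= remaining[1]


# Illustrative (epsilon, delta) pairs, not real server values.
initial = (10.0, 0.005)
spent = (1.5, 0.001)
remaining = remaining_budget(initial, spent)

print(remaining)
print(can_afford((0.1, 0.00001), remaining))
```

Running such a check on an estimated cost before sending a query avoids requests that would exceed the budget.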
#### Get archives
All queries made on the sensitive data are kept in a secure database. With a single function call, she can see her queries, the budget spent, and the associated responses.
```python
previous_queries = client.get_previous_queries()
```
### Examples
Many notebooks with detailed examples of the library are available in the [client](https://github.com/dscc-admin-ch/lomas/tree/master/client/notebooks) folder. For instance, refer to [Demo_Client_Notebook.ipynb](https://github.com/dscc-admin-ch/lomas/blob/master/client/notebooks/Demo_Client_Notebook.ipynb).
### More detailed documentation
More detailed documentation is available on [this GitHub Page](https://dscc-admin-ch.github.io/lomas-docs/).
Raw data
{
"_id": null,
"home_page": "https://github.com/dscc-admin/lomas/",
"name": "lomas-client",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.13,>=3.11",
"maintainer_email": null,
"keywords": "differential privacy, DP, diffprivlib, logger, opendp, privacy, smartnoise-sql, smartnoise-synth",
"author": "Data Science Competence Center, Swiss Federal Statistical Office",
"author_email": "dscc@bfs.admin.ch",
"download_url": "https://files.pythonhosted.org/packages/ef/63/a83b5c15c707da054e6870000955cc2379d3aa0086a748e88a412956faab/lomas_client-0.3.5.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A client to interact with the Lomas server.",
"version": "0.3.5",
"project_urls": {
"Homepage": "https://github.com/dscc-admin/lomas/"
},
"split_keywords": [
"differential privacy",
" dp",
" diffprivlib",
" logger",
" opendp",
" privacy",
" smartnoise-sql",
" smartnoise-synth"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ef63a83b5c15c707da054e6870000955cc2379d3aa0086a748e88a412956faab",
"md5": "5b6dd6502e73898f8d594b3af041c66e",
"sha256": "bd4f3c439278977308beb39606cd483352e1192f19baae8dcabd3852447fa681"
},
"downloads": -1,
"filename": "lomas_client-0.3.5.tar.gz",
"has_sig": false,
"md5_digest": "5b6dd6502e73898f8d594b3af041c66e",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.13,>=3.11",
"size": 14641,
"upload_time": "2024-10-30T16:38:51",
"upload_time_iso_8601": "2024-10-30T16:38:51.310595Z",
"url": "https://files.pythonhosted.org/packages/ef/63/a83b5c15c707da054e6870000955cc2379d3aa0086a748e88a412956faab/lomas_client-0.3.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-30 16:38:51",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "dscc-admin",
"github_project": "lomas",
"github_not_found": true,
"lcname": "lomas-client"
}