qurix-dataframe-anonymizer


- Name: qurix-dataframe-anonymizer
- Version: 0.2.0
- Home page: https://github.com/qurixtechnology/qurix-dataframe-anonymizer.git
- Summary: qurix dataframe anonymizer for kafka
- Upload time: 2023-11-22 11:26:36
- Author: qurix Technology
- Requires Python: >=3.10, <4
- Keywords: python
- Requirements: No requirements were recorded.

# KafkaAnonymizer

## What is it?

The KafkaAnonymizer, powered by the Qurix Dataframe Anonymizer, is a Python package for anonymizing data within Kafka streams. It is intended to support data privacy compliance and to protect sensitive information in real-time data pipelines.

## Main Features

1. Descriptive Statistics:
Generate comprehensive descriptive statistics for each column, including mean, min, max, frequency, unique values, and data type.

2. Anonymization Techniques:
Anonymize diverse data types such as float, int, string, and date columns based on statistical properties like mean, standard deviation, count, min, and max values.

3. Configurability:
Customize the anonymization process using whitelists and blacklists, which give fine-grained control over which columns are included in or excluded from anonymization.

4. String Anonymization Providers:
Support various string anonymization strategies, including generic text, gender, addresses, and person names.
Specify preferred string anonymization providers for each string column.

5. Dataframe Anonymization:
Anonymize entire DataFrames using the `anonymize_dataframe` method.
Choose specific columns for anonymization through whitelists and blacklists.

6. Randomization and Shuffling:
Utilize randomization techniques to generate synthetic data, ensuring representative yet anonymized information.
Implement shuffling mechanisms for randomizing string values during the anonymization process.

7. Data Type Handling:
Handle different data types (float, int, object, datetime) with dedicated anonymization logic for each type.
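
The statistics-based and shuffling approaches described in the list above can be illustrated with a minimal standard-library sketch. This is not the package's implementation: the function names `anonymize_numeric` and `shuffle_strings`, the normal-distribution sampling, and the clamping to the original range are all assumptions chosen to show the general idea.

```python
import random
import statistics


def anonymize_numeric(values, rng):
    """Replace each number with a draw from a normal distribution fitted to
    the column, clamped to the column's original [min, max] range.
    Hypothetical helper, not part of qurix-dataframe-anonymizer."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    low, high = min(values), max(values)
    return [min(high, max(low, rng.gauss(mean, std))) for _ in values]


def shuffle_strings(values, rng):
    """Return the same strings in a random order, which breaks the link
    between a row and its original value. Hypothetical helper."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
ages = [22.0, 38.0, 26.0, 35.0, 54.0]
names = ["Anna", "Ben", "Cara", "Dev", "Eli"]

synthetic_ages = anonymize_numeric(ages, rng)
shuffled_names = shuffle_strings(names, rng)
print(synthetic_ages)
print(shuffled_names)
```

The synthetic column keeps the length and value range of the original, while the shuffled column keeps exactly the same set of values. Note that shuffling alone does not hide which values occur in the data.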

## Requirements

- `confluent-kafka`

You can install this dependency manually or use the `requirement.txt` file provided in the repository.

## Installation

1. Create a New Virtual Environment (named `venv` in this case):

```bash
python3 -m venv venv
```

2. Activate the Virtual Environment:

```bash
source venv/bin/activate
```

3. Install the Package:

To install the `qurix-dataframe-anonymizer` package, use `pip`:

```bash
pip install qurix-dataframe-anonymizer
```
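
To confirm from Python that the installation succeeded, one option is to look up the distribution with the standard library's `importlib.metadata`. The helper name `installed_version` is illustrative only.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string of a distribution,
    or None if it is not installed in the current environment."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version("qurix-dataframe-anonymizer")
if found is None:
    print("qurix-dataframe-anonymizer is not installed in this environment")
else:
    print(f"qurix-dataframe-anonymizer {found} is installed")
```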

## Usage

### Dataframe anonymizer

Anonymize DataFrames using the `DataframeAnonymizer` class:

```python
import pandas as pd
from qurix.dataframe.anonymizer import DataframeAnonymizer, AnonymizeStrProvider

df = pd.read_csv("<my_csv_file.csv>")

anonymizer = DataframeAnonymizer()
df_anonymized = anonymizer.anonymize_dataframe(df)
df_anonymized.head()

# Dictionary mapping a column name to the string anonymization provider to use for it, e.g. GENDER, PERSON_NAME
anonymize_str_map = {
    "Sex": AnonymizeStrProvider.GENDER,
    "Name": AnonymizeStrProvider.PERSON_NAME
}

# Anonymize
df_anonymized = anonymizer.anonymize_dataframe(df, anonymize_str_map)
df_anonymized.head()

# For more advanced usage, explore the additional parameters of anonymize_dataframe, such as white_list and black_list.

anonymized_df = anonymizer.anonymize_dataframe(df, white_list=["column1"], black_list=["column2"])

# Specify string anonymization providers
anonymized_df = anonymizer.anonymize_dataframe(df, anonymize_str_map={"column3": "gender"})
```
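
The usage example passes `white_list` and `black_list` together but does not spell out how the two interact. One plausible reading, assumed here and not confirmed by the package, is that the whitelist narrows the candidate columns and the blacklist then removes columns from that set. The helper `select_columns` below is hypothetical and only illustrates that reading.

```python
def select_columns(columns, white_list=None, black_list=None):
    """Hypothetical helper: pick the columns to anonymize.

    Assumption: with no whitelist every column is a candidate; the
    blacklist always wins over the whitelist.
    """
    if white_list is None:
        candidates = list(columns)
    else:
        candidates = [c for c in columns if c in white_list]
    blocked = set(black_list or [])
    return [c for c in candidates if c not in blocked]


columns = ["Name", "Sex", "Age", "Fare"]

print(select_columns(columns))
# ['Name', 'Sex', 'Age', 'Fare']
print(select_columns(columns, white_list=["Name", "Age"]))
# ['Name', 'Age']
print(select_columns(columns, black_list=["Fare"]))
# ['Name', 'Sex', 'Age']
print(select_columns(columns, white_list=["Name", "Age"], black_list=["Age"]))
# ['Name']
```

Check the behavior of `anonymize_dataframe` on a small DataFrame before relying on this interpretation.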

## Contact

For any inquiries or questions, feel free to [reach out](https://qurix.tech/about_us.html).

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/qurixtechnology/qurix-dataframe-anonymizer.git",
    "name": "qurix-dataframe-anonymizer",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.10, <4",
    "maintainer_email": "",
    "keywords": "python",
    "author": "qurix Technology",
    "author_email": "",
    "download_url": "https://files.pythonhosted.org/packages/10/ef/5a175fd0a8eaadacbbad07c50601f63b2593eaad5fad08d000a200ef9933/qurix-dataframe-anonymizer-0.2.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "qurix dataframe anonymizer for kafka",
    "version": "0.2.0",
    "project_urls": {
        "Homepage": "https://github.com/qurixtechnology/qurix-dataframe-anonymizer.git"
    },
    "split_keywords": [
        "python"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1779c1b0527be4b60333d3fa7cbba1d0e88af24ba54b5efc77b2557cfc2b0ae9",
                "md5": "a1bdd47269d9359ec9c33ec1ff4ccf72",
                "sha256": "ef7ef82046ee9c35920c0c1db0b275a3e7d31d80300e91222bf34a0e05d9de4a"
            },
            "downloads": -1,
            "filename": "qurix_dataframe_anonymizer-0.2.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "a1bdd47269d9359ec9c33ec1ff4ccf72",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10, <4",
            "size": 6884,
            "upload_time": "2023-11-22T11:26:34",
            "upload_time_iso_8601": "2023-11-22T11:26:34.753263Z",
            "url": "https://files.pythonhosted.org/packages/17/79/c1b0527be4b60333d3fa7cbba1d0e88af24ba54b5efc77b2557cfc2b0ae9/qurix_dataframe_anonymizer-0.2.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "10ef5a175fd0a8eaadacbbad07c50601f63b2593eaad5fad08d000a200ef9933",
                "md5": "777041dfb890dafa40c3df6c9a3720ff",
                "sha256": "e11f52246b153377d901663cb4380b59cb60539b5f1b2b66cc449b09d2fdbca6"
            },
            "downloads": -1,
            "filename": "qurix-dataframe-anonymizer-0.2.0.tar.gz",
            "has_sig": false,
            "md5_digest": "777041dfb890dafa40c3df6c9a3720ff",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10, <4",
            "size": 6123,
            "upload_time": "2023-11-22T11:26:36",
            "upload_time_iso_8601": "2023-11-22T11:26:36.277482Z",
            "url": "https://files.pythonhosted.org/packages/10/ef/5a175fd0a8eaadacbbad07c50601f63b2593eaad5fad08d000a200ef9933/qurix-dataframe-anonymizer-0.2.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-11-22 11:26:36",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "qurixtechnology",
    "github_project": "qurix-dataframe-anonymizer",
    "github_not_found": true,
    "lcname": "qurix-dataframe-anonymizer"
}
        