data-harmonization-ai-dp


Name: data-harmonization-ai-dp
Version: 2.0.0
Summary: Create data quality rules and apply them to datasets.
Upload time: 2023-06-01 12:34:41
Author: Himanshu
Requires Python: >=3.8
Requirements: No requirements were recorded.

# Data Harmonization Package

This package merges CSV files and harmonizes their data using string-similarity algorithms and OpenAI GPT models.
It offers two classes: `DataHarmonizer`, which merges two CSV files using a selectable merge option, and `DataHarmonizationWithSuggestion`, which harmonizes two data files to match a sample of already-harmonized data.

## Installation

You can install the package from PyPI using pip:

```bash
pip install data-harmonization-ai-dp
```


## Usage

### DataHarmonizer Class

The `DataHarmonizer` class merges two CSV files using one of the following merge options:

- ChatGPT
- GPT4
- Fuzzy Wuzzy
- Rapidfuzz
- Jaro Winkler
- JW Layered with ChatGPT
- JW Layered with GPT4
- FW Layered with GPT4
- Recursive Data Harmonization
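Several of these options are string-similarity measures. As an illustration only, here is a minimal, dependency-free sketch of Jaro-Winkler similarity, plus one way a "layered" option might work: accept the closest Jaro-Winkler match when its score clears a threshold, and otherwise defer to a language model. This is not the package's implementation; `layered_match`, the 0.9 threshold, and the `ask_model` callback (a stand-in for a ChatGPT or GPT-4 request) are hypothetical.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity, from 0.0 (nothing in common) to 1.0 (identical)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Two equal characters only count as a match if they are this close together.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the matched characters that appear in a different order.
    out_of_order = 0
    j = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if s1[i] != s2[j]:
                out_of_order += 1
            j += 1
    transpositions = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * prefix_scale * (1 - sim)


def layered_match(column, candidates, ask_model, threshold=0.9):
    """Return the candidate closest to `column`, deferring to `ask_model` when unsure.

    Hypothetical: `ask_model(column, candidates)` stands in for a GPT request.
    """
    def score(candidate):
        return jaro_winkler(column.lower(), candidate.lower())

    best = max(candidates, key=score)
    if score(best) >= threshold:
        return best
    return ask_model(column, candidates)


if __name__ == "__main__":
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

Layering keeps model calls for the ambiguous cases only: confident string matches are cheap and deterministic, while the model handles names that share meaning but not spelling.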

Example usage:

```python
from utility import DataHarmonizer

# Create an instance of DataHarmonizer.
# Replace "openai-key" with your OpenAI API key; the last argument is one of the merge options listed above.
key = "openai-key"
harmonizer = DataHarmonizer(key, "file1.csv", "file2.csv", "ChatGPT")

# Merge the files based on the specified option
result = harmonizer.merge_files()

print(result)
```
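The README does not say what `merge_files()` does with the rows once columns are matched. Assuming a merge means stacking both files' rows under a shared header, a sketch using only the standard `csv` module could look like this; `merge_with_mapping` and its `mapping` argument (each file-2 column paired with a file-1 column, as a matching option might produce) are hypothetical.

```python
import csv
import io
from typing import Dict


def merge_with_mapping(text1: str, text2: str, mapping: Dict[str, str]) -> str:
    """Stack the rows of two CSV texts under the header of the first.

    Hypothetical `mapping` renames file-2 columns to file-1 columns; every
    mapped value must be a file-1 column. File-2 columns missing from
    `mapping` are dropped, and file-1 columns with no file-2 counterpart
    are left empty in file-2 rows.
    """
    reader1 = csv.DictReader(io.StringIO(text1))
    header = list(reader1.fieldnames or [])
    merged = list(reader1)
    for row in csv.DictReader(io.StringIO(text2)):
        merged.append({mapping[k]: v for k, v in row.items() if k in mapping})
    out = io.StringIO()
    # DictWriter's default restval="" fills columns a row does not have.
    writer = csv.DictWriter(out, fieldnames=header, lineterminator="\n")
    writer.writeheader()
    writer.writerows(merged)
    return out.getvalue()


if __name__ == "__main__":
    file1 = "name,city\nAda,London\n"
    file2 = "full_name,town\nAlan,Wilmslow\n"
    print(merge_with_mapping(file1, file2, {"full_name": "name", "town": "city"}), end="")
```

Working on CSV text rather than file paths keeps the sketch easy to test; reading the files first with `open(..., newline="")` would adapt it to paths like `file1.csv`.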

### DataHarmonizationWithSuggestion Class

The `DataHarmonizationWithSuggestion` class harmonizes data using a sample-based approach.
It takes a sample file of already-harmonized data and two data files as input.

Example usage:

```python
from utility import DataHarmonizationWithSuggestion

# Create an instance of DataHarmonizationWithSuggestion:
# an OpenAI API key, the sample of harmonized data, then the two data files
key = "openai-key"
harmonizer = DataHarmonizationWithSuggestion(key, "sample_harmonized_data.csv", "file1.csv", "file2.csv")

# Harmonize the data based on the sample
result = harmonizer.harmonize_data()

print(result)
```
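One way to picture the sample-based approach is few-shot prompting: the harmonized sample shows a GPT model the target columns and format, and the two data files are the input to transform. The sketch below only assembles such a prompt from CSV text and makes no API request; the function names, the prompt wording, and the `max_rows` truncation are assumptions, not the package's actual behavior.

```python
def head(csv_text: str, max_rows: int) -> str:
    """Header line plus at most `max_rows` data lines (assumes no quoted newlines)."""
    lines = csv_text.strip().splitlines()
    return "\n".join(lines[: max_rows + 1])


def build_harmonization_prompt(sample: str, file1: str, file2: str, max_rows: int = 5) -> str:
    """Hypothetical prompt builder: the harmonized sample defines the target format."""
    return (
        "Here is a sample of harmonized data:\n"
        f"{head(sample, max_rows)}\n\n"
        "Combine the two datasets below into the same columns and format as the sample.\n\n"
        f"Dataset 1:\n{head(file1, max_rows)}\n\n"
        f"Dataset 2:\n{head(file2, max_rows)}\n"
    )


if __name__ == "__main__":
    sample = "name,city\nAda,London\n"
    file1 = "full_name,town\nAlan,Wilmslow\n"
    file2 = "nm,loc\nGrace,New York\nKatherine,White Sulphur Springs\n"
    print(build_harmonization_prompt(sample, file1, file2, max_rows=1))
```

Truncating each file to a few rows keeps the prompt within the model's context window; the naive line split assumes no quoted field contains a newline, which the `csv` module would otherwise be needed to handle.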

            

Raw data

{
    "_id": null,
    "home_page": "",
    "name": "data-harmonization-ai-dp",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "",
    "keywords": "",
    "author": "Himanshu",
    "author_email": "himanshu.tomar@decisionpoint.in",
    "download_url": "https://files.pythonhosted.org/packages/95/83/268df0b072c7e4bfb636ec20a0157faf94427524ea551d11b36c4aa3abe6/data-harmonization-ai-dp-2.0.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "Create data quality rules and apply them to datasets.",
    "version": "2.0.0",
    "project_urls": null,
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "482f8482b1087e50f0473a16f36db469fd37e0d43b96e1355858be964e8159fe",
                "md5": "d9fee21997a1c209e6913bedc71860ce",
                "sha256": "2036f5b2e142e96b8674f8ac493076c8a1ab945893673887b7c4508d6acc9ace"
            },
            "downloads": -1,
            "filename": "data_harmonization_ai_dp-2.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "d9fee21997a1c209e6913bedc71860ce",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 18569,
            "upload_time": "2023-06-01T12:34:40",
            "upload_time_iso_8601": "2023-06-01T12:34:40.055474Z",
            "url": "https://files.pythonhosted.org/packages/48/2f/8482b1087e50f0473a16f36db469fd37e0d43b96e1355858be964e8159fe/data_harmonization_ai_dp-2.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9583268df0b072c7e4bfb636ec20a0157faf94427524ea551d11b36c4aa3abe6",
                "md5": "fa18d63523779d87a772f3d1503001fd",
                "sha256": "dd8d1c41500c16b7f57350364e66ed3f03c678833f52150860261c3b7dd08f65"
            },
            "downloads": -1,
            "filename": "data-harmonization-ai-dp-2.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "fa18d63523779d87a772f3d1503001fd",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 10562,
            "upload_time": "2023-06-01T12:34:41",
            "upload_time_iso_8601": "2023-06-01T12:34:41.670061Z",
            "url": "https://files.pythonhosted.org/packages/95/83/268df0b072c7e4bfb636ec20a0157faf94427524ea551d11b36c4aa3abe6/data-harmonization-ai-dp-2.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-06-01 12:34:41",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "data-harmonization-ai-dp"
}