datadict-toolbox

- Name: datadict-toolbox
- Version: 1.0.1
- Home page: https://github.com/LauNop/datadict-toolbox
- Summary: A package to build a data dictionary from .xmla MS SQL Server files and SQL SELECT queries.
- Upload time: 2023-09-24 14:03:21
- Author: Laurent NOPOLY
- License: MIT
- Keywords: python, data, dictionary, governance, SQL
# Dictionary_ToolBox (v1.0.1)

Dictionary_ToolBox is a Python package containing classes for building data dictionaries from tabular and multidimensional cube definitions, and from SQL SELECT queries using OpenAI's GPT models.

This package:

- processes `.xmla` files generated by MS SQL Server;
- extracts SQL queries from `.dtsx` files;
- generates a partial data dictionary from a SQL SELECT query.

## Installation
***
You can install `datadict_toolbox` using `pip` from the Python Package Index (PyPI).

```bash
pip install datadict_toolbox
```

The package is hosted on PyPI and can be found at <https://pypi.org/project/datadict-toolbox/>. You can visit the link for more information about the package and its available versions.

## Setup
***
Before you start using DataDict Toolbox, you need to set up a suitable Python environment. We recommend using a virtual environment (venv). Here's how you can set this up:

1. Install a Python interpreter, version 3.9 or later. You can download it from the official Python website. Make sure to let the installer set the PATH variables for you.

2. Check your Python version by running `python --version` or `python3 --version` from the command line.

3. Once Python is installed, create a virtual environment (some IDEs, such as PyCharm, generate one automatically). To do this, run the following command in the root directory of the project:

   ```bash
   python -m venv venv
   ```

   This command creates a new directory named `venv` in your project. This directory contains the Python executable files and a copy of pip, which you can use to install other packages within this environment.

4. To start using this environment, you have to activate it.

   On Unix or macOS, run:

   ```bash
   source venv/bin/activate
   ```

   On Windows, run:

   ```bash
   venv\Scripts\activate
   ```

5. Once the virtual environment is activated, you can install the necessary dependencies. To do this, run the following command:

   ```bash
   pip install -r requirements.txt
   ```

6. Now you should be all set! Remember to activate the venv environment every time you work on the project.

7. Finally, ensure you have an OpenAI API key and organization ID. The API key is required for making requests to the OpenAI service. To use them safely, store them in a `.env` file and read them with the `python-dotenv` package.

## Usage
***
### Importing the package

The easiest way to use the package is to copy the following snippets into your `main.py` file to use the different classes:

#### Class: ExtractorTabularCubeCatalog
##### Create a tabular cube data dictionary from one .xmla file

```python
from datadict_toolbox import ExtractorTabularCubeCatalog
file_path = "paste the path of your .xmla file"
etcc = ExtractorTabularCubeCatalog(file_path)  # The data dictionary is generated when the class is instantiated
print(etcc.cube_struct)  # Display the Python dictionary that contains your data dictionary
etcc.save()  # By default, creates "catalog_name_tabular_nbrtable.xlsx" in an "excel_result/" folder created automatically in the root path of your running .py file
```

##### Create several tabular cube data dictionaries from a folder of .xmla files

```python
import os
from datadict_toolbox import ExtractorTabularCubeCatalog
folder_path = "paste the path of the folder containing your .xmla files"
files_path = [os.path.join(folder_path, f) for f in os.listdir(folder_path) if
                      os.path.isfile(os.path.join(folder_path, f))]
print("Number of files:", len(files_path))
for file_path in files_path:
    etcc = ExtractorTabularCubeCatalog(file_path)  # The data dictionary is generated when the class is instantiated
    print(etcc.cube_struct)  # Display the Python dictionary that contains your data dictionary
    etcc.save()  # By default, creates "catalog_name_tabular_nbrtable.xlsx" in an "excel_result/" folder created automatically in the root path of your running .py file
```
You can change the save path and the filename of the `.xlsx` file by changing the `save_path` and `filename` parameters.
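Note that the folder-scan loop above picks up every file in the directory, not only `.xmla` files. A small helper can filter by extension first (a sketch using only the standard library; the function name is my own, not part of the package):

```python
import os

def list_xmla_files(folder_path, ext=".xmla"):
    """Return sorted full paths of regular files in folder_path ending with ext."""
    return [
        os.path.join(folder_path, f)
        for f in sorted(os.listdir(folder_path))
        if os.path.isfile(os.path.join(folder_path, f)) and f.lower().endswith(ext)
    ]
```

Each returned path can then be passed to `ExtractorTabularCubeCatalog` exactly as in the loop above.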

#### Class: ExtractorMultidimCubeCatalog
##### Create a multidimensional cube data dictionary from one .xmla file

```python
from datadict_toolbox import ExtractorMultidimCubeCatalog
file_path = "paste the path of your .xmla file"
emcc = ExtractorMultidimCubeCatalog(file_path)  # The data dictionary is generated when the class is instantiated
print(emcc.cube_struct)  # Display the Python dictionary that contains your data dictionary
emcc.save()  # By default, creates "catalog_name_tabular_nbrtable.xlsx" in an "excel_result/" folder created automatically in the root path of your running .py file
```

##### Create several multidimensional cube data dictionaries from a folder of .xmla files

```python
import os
from datadict_toolbox import ExtractorMultidimCubeCatalog
folder_path = "paste the path of the folder containing your .xmla files"
files_path = [os.path.join(folder_path, f) for f in os.listdir(folder_path) if
                      os.path.isfile(os.path.join(folder_path, f))]
print("Number of files:", len(files_path))
for file_path in files_path:
    emcc = ExtractorMultidimCubeCatalog(file_path)  # The data dictionary is generated when the class is instantiated
    print(emcc.cube_struct)  # Display the Python dictionary that contains your data dictionary
    emcc.save()  # By default, creates "catalog_name_tabular_nbrtable.xlsx" in an "excel_result/" folder created automatically in the root path of your running .py file
```
You can change the save path and the filename of the `.xlsx` file by changing the `save_path` and `filename` parameters.

#### Class: SelectGPTDeduce
```python
import os
from dotenv import load_dotenv
from datadict_toolbox import SelectGPTDeduce

load_dotenv("path of your .env file relative to your root path")  # load the .env file
openai_api_key = os.getenv("OPENAI_API_KEY")  # read OPENAI_API_KEY from the .env file
openai_org_id = os.getenv("OPENAI_ORG_ID")
select_query = "paste your SQL select query here"

deduce = SelectGPTDeduce(openai_org_id, openai_api_key, select_query)
print(deduce.select_data_dict) # Display the python dictionary that contain your data dictionary
deduce.save()
```
The `save()` method is not configurable in this version.

#### File: extract_from_dtsx.py

The most useful function is `extract_erp_query`, which extracts the SQL query from a `.dtsx` file.
```python
import os
from datadict_toolbox import extract_erp_query
folder_path = "paste the path of the folder containing your .dtsx files"
files_path = [os.path.join(folder_path, f) for f in os.listdir(folder_path) if
                      os.path.isfile(os.path.join(folder_path, f))]
queries = extract_erp_query(files_path)['SQL_QUERY']
```
This function takes a list of file paths as argument and returns a dictionary with two keys, `DEST_TABLE` and `SQL_QUERY`.
You can feed the outputs of `extract_erp_query` into the `SelectGPTDeduce` class.
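One way to pair each destination table with its query before handing them to `SelectGPTDeduce` is to zip the two entries. The sketch below assumes both keys hold parallel lists, which matches the indexing shown above; the table names and queries are illustrative placeholders, not real package output:

```python
# Placeholder result mimicking extract_erp_query output; the table names
# and queries below are hypothetical examples.
result = {
    "DEST_TABLE": ["dbo.FactSales", "dbo.DimCustomer"],
    "SQL_QUERY": ["SELECT id, amount FROM sales", "SELECT id, name FROM customers"],
}

# Pair each destination table with the query that feeds it.
pairs = list(zip(result["DEST_TABLE"], result["SQL_QUERY"]))
for dest_table, query in pairs:
    # Each query could then be passed to SelectGPTDeduce(org_id, api_key, query).
    print(dest_table, "<-", query)
```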

### CLI
***
This package provides no command-line interface.

## Configuration
***
If you don't have a `.env` file, create one and set these two variables:
```txt
OPENAI_API_KEY="your api key"
OPENAI_ORG_ID="your organization id"
```
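The usage examples above read these variables with `python-dotenv`, which is the package's documented approach. Purely for illustration of the `KEY="value"` format, a minimal standard-library parser for such a file might look like this (a sketch, not part of datadict-toolbox):

```python
def load_env_file(path=".env"):
    """Parse KEY="value" lines from a .env-style file into a dict."""
    env = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blank lines, comments, and malformed lines
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip().strip('"')
    return env
```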



## Goal
***
This package was developed for an internship project. It will probably not be maintained in the future.

## License
***
This project is licensed under the MIT License. See the LICENSE file for details.



            
