NexR-qc


- Name: NexR-qc
- Version: 0.0.12
- Home page: https://github.com/mata-1223/NexR_qc
- Summary: PYPI package creation written by NexR-qc
- Upload time: 2024-04-12 05:04:40
- Maintainer: None
- Docs URL: None
- Author: mata.lee
- Requires Python: >=3.6
- License: None
- Keywords: qc, nexr, mata.lee, nexr_qc, python, python tutorial, pypi
- Requirements: et-xmlfile, numpy, openpyxl, pandas, python-dateutil, pytz, six, tzdata
- Travis-CI: none
- Coveralls test coverage: none

# NexR_qc
[![PyPI version](https://badge.fury.io/py/NexR-qc.svg)](https://badge.fury.io/py/NexR-qc)
<br><br>

## Requirements
- python >= 3.6
- numpy
- pandas
- openpyxl
<br>

## Installation

### Install with pip
```
#!/bin/bash
pip install NexR_qc
```

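### Default directory layout
- The files under documents (table definition, column definition, and code definition) are not required, but they supply accurate per-table information to the check (see the document templates in the document folder of the [GitHub repository](https://github.com/mata-1223/NexR_qc)).
- The log and output folders do not need to exist beforehand; they are created automatically when the check runs.
- config.json customizes which values in the data are treated as missing. It does not need to exist beforehand either and is created automatically when the check runs (default missing-value markers: "?", "na", "null", "Null", "NULL", " ", "[NULL]").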

```
.
├── data/ (optional)
│   ├── 데이터_001.csv  (data_001)
│   ├── 데이터_002.csv  (data_002)
│   ├── 데이터_003.xlsx  (data_003)
│   ├── ...
├── documents/
│   ├── 테이블정의서.xlsx  (table definition)
│   ├── 컬럼정의서.xlsx  (column definition)
│   └── 코드정의서.xlsx  (code definition)
├── log/
│   ├── QualityCheck_yyyymmdd_hhmmss.log
│   ├── ...
├── output/
│   └── QC결과서_yyyymmdd_hhmmss.xlsx  (QC report)
└── config.json
``` 
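
To illustrate what the default missing-value markers mean in practice, here is a minimal sketch that counts, per column, the cells matching one of those markers (or already NaN). The helper name `count_missing` and the sample DataFrame are hypothetical, and this is not necessarily how NexR_qc implements its own check.

```python
import pandas as pd

# Default missing-value markers listed in this README.
DEFAULT_NA_MARKERS = ["?", "na", "null", "Null", "NULL", " ", "[NULL]"]


def count_missing(df, markers=DEFAULT_NA_MARKERS):
    """Count cells per column that are NaN or equal to one of the markers."""
    mask = df.isin(markers) | df.isna()
    return mask.sum()


# Hypothetical sample data for illustration only.
sample = pd.DataFrame(
    {
        "age": ["31", "?", "NULL"],
        "city": ["Seoul", " ", "Busan"],
    }
)
missing = count_missing(sample)
print(missing)
```

Passing a different `markers` list mimics the effect of customizing the marker set, which in the package itself is done through config.json.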
<br>

## Running the example
```
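#!/usr/bin/env python3
import os

import pandas as pd
from NexR_qc.QualityCheck import QualityCheck

# Load data (when using data files)
PathDict = {}
PathDict["ROOT"] = os.getcwd()
PathDict["DATA"] = os.path.join(PathDict["ROOT"], "data")  # directory that holds the data files

# Load data (when using a DB)
# Read the tables stored in the DB into DataFrames and arrange them in the DataDict format below

DataDict = {}  # DataDict maps each data name (key) to its DataFrame (value)
for path in [i for i in os.listdir(PathDict["DATA"]) if not i.startswith(".")]:
    data_name, ext = os.path.splitext(os.path.basename(path))
    full_path = os.path.join(PathDict["DATA"], path)
    # The data directory may also hold .xlsx files, which pd.read_csv cannot parse
    if ext.lower() == ".xlsx":
        DataDict[data_name.upper()] = pd.read_excel(full_path)
    else:
        DataDict[data_name.upper()] = pd.read_csv(full_path)

Process = QualityCheck(DataDict)
Process.data_check()
Process.document_check()
Process.na_check()
Process.run()
Process.save()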
```
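
The example mentions loading data from a DB without showing it. The sketch below is one way to build a DataDict from database tables, assuming an SQLite database for illustration. The `load_tables` helper, the in-memory database, and the `customers` table are all hypothetical; any source that yields pandas DataFrames should work the same way.

```python
import sqlite3

import pandas as pd


def load_tables(conn, table_names):
    """Read each named table into a DataFrame, keyed by its upper-cased name."""
    data_dict = {}
    for name in table_names:
        # Table names come from a trusted list here; they are not user input.
        data_dict[name.upper()] = pd.read_sql_query(f'SELECT * FROM "{name}"', conn)
    return data_dict


# Hypothetical in-memory database standing in for a real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Kim"), (2, "Lee")])
conn.commit()

DataDict = load_tables(conn, ["customers"])
conn.close()
print(DataDict["CUSTOMERS"])
```

The resulting DataDict can then be passed to `QualityCheck(DataDict)` in the same way as the file-based example.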

<br>

## Input / Output information

### Input
* Data type: dictionary
	* Shape: {data_name1: Dataframe1, data_name2: Dataframe2,…}
		* data_name: name of the data table or data file
		* Dataframe: the data loaded as a DataFrame
* Example
![NexR_qc_Info_002](https://github.com/mata-1223/NexR_qc/assets/131343466/5e28e8bf-37f2-4cc0-acca-c288bfbd5ccb)
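
Because the input must be a dictionary of names to DataFrames, it can help to verify that shape before starting a check. The following is a minimal sketch of such a check. `validate_data_dict` is a hypothetical helper and is not part of NexR_qc.

```python
import pandas as pd


def validate_data_dict(data_dict):
    """Raise TypeError unless data_dict maps string names to pandas DataFrames."""
    if not isinstance(data_dict, dict):
        raise TypeError("input must be a dict of {data_name: DataFrame}")
    for name, frame in data_dict.items():
        if not isinstance(name, str):
            raise TypeError(f"data name {name!r} is not a string")
        if not isinstance(frame, pd.DataFrame):
            raise TypeError(f"value for {name!r} is not a pandas DataFrame")
    return data_dict


# Hypothetical table used only to show the expected shape.
DataDict = validate_data_dict(
    {"ORDERS": pd.DataFrame({"order_id": [1, 2], "amount": [10.5, 20.0]})}
)
```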

### Output
* Result file path: output/QC_결과서.xlsx
* Examples
1) Example 1: table list sheet
![NexR_qc_Info_003](https://github.com/mata-1223/NexR_qc/assets/131343466/54605ebe-d45c-4ba9-b219-dd177e08a6b7)

2) Example 2: QC result sheet for each dataset
![NexR_qc_Info_001](https://github.com/mata-1223/NexR_qc/assets/131343466/a1613944-4812-40a2-9ec3-6452c104a96b)


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/mata-1223/NexR_qc",
    "name": "NexR-qc",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": "qc, NexR, mata.lee, NexR_qc, python, python tutorial, pypi",
    "author": "mata.lee",
    "author_email": "ldh3810@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/ac/46/44b1d93a7873c3df83675a820de1936635ace7d3f20761b871bdcd8b4a55/NexR_qc-0.0.12.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "PYPI package creation written by NexR-qc",
    "version": "0.0.12",
    "project_urls": {
        "Homepage": "https://github.com/mata-1223/NexR_qc"
    },
    "split_keywords": [
        "qc",
        " nexr",
        " mata.lee",
        " nexr_qc",
        " python",
        " python tutorial",
        " pypi"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6fe460be1e6ccc4e1c88095623d8fc5b44ee1b2be6a49308bd0dbb9ec2af7fb6",
                "md5": "dcffdb9cbbb1e374f16191616e2fc8ba",
                "sha256": "c4156beb9b31a25825e1366be3ec355851eaf32060f8971bd3d94585dd32af89"
            },
            "downloads": -1,
            "filename": "NexR_qc-0.0.12-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "dcffdb9cbbb1e374f16191616e2fc8ba",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 12910,
            "upload_time": "2024-04-12T05:04:32",
            "upload_time_iso_8601": "2024-04-12T05:04:32.660827Z",
            "url": "https://files.pythonhosted.org/packages/6f/e4/60be1e6ccc4e1c88095623d8fc5b44ee1b2be6a49308bd0dbb9ec2af7fb6/NexR_qc-0.0.12-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ac4644b1d93a7873c3df83675a820de1936635ace7d3f20761b871bdcd8b4a55",
                "md5": "59e7acc8691d2000b2c0a093851989f5",
                "sha256": "a8845bbfaf8640796eff5b3f05dbd5b263fb7b17a5f5f55af533208025ec4a96"
            },
            "downloads": -1,
            "filename": "NexR_qc-0.0.12.tar.gz",
            "has_sig": false,
            "md5_digest": "59e7acc8691d2000b2c0a093851989f5",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 10738,
            "upload_time": "2024-04-12T05:04:40",
            "upload_time_iso_8601": "2024-04-12T05:04:40.034300Z",
            "url": "https://files.pythonhosted.org/packages/ac/46/44b1d93a7873c3df83675a820de1936635ace7d3f20761b871bdcd8b4a55/NexR_qc-0.0.12.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-12 05:04:40",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "mata-1223",
    "github_project": "NexR_qc",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "et-xmlfile",
            "specs": [
                [
                    "==",
                    "1.1.0"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    "==",
                    "1.26.3"
                ]
            ]
        },
        {
            "name": "openpyxl",
            "specs": [
                [
                    "==",
                    "3.1.2"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    "==",
                    "2.2.0"
                ]
            ]
        },
        {
            "name": "python-dateutil",
            "specs": [
                [
                    "==",
                    "2.8.2"
                ]
            ]
        },
        {
            "name": "pytz",
            "specs": [
                [
                    "==",
                    "2023.3.post1"
                ]
            ]
        },
        {
            "name": "six",
            "specs": [
                [
                    "==",
                    "1.16.0"
                ]
            ]
        },
        {
            "name": "tzdata",
            "specs": [
                [
                    "==",
                    "2023.4"
                ]
            ]
        }
    ],
    "lcname": "nexr-qc"
}
        