# NexR_qc
[![PyPI version](https://badge.fury.io/py/NexR-qc.svg)](https://badge.fury.io/py/NexR-qc)
<br><br>
## Requirements
- python >= 3.6
- numpy
- pandas
- openpyxl
<br>
## Installation
### Install with pip
```
#!/bin/bash
pip install NexR_qc
```
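### Default directory layout
- The files under documents (테이블정의서, 컬럼정의서, 코드정의서: the table, column, and code definition documents) are not required, but they are written so that accurate per-table information is available (see the document templates in the document folder of the [GitHub repository](https://github.com/mata-1223/NexR_qc))
- The log and output folders do not need to exist beforehand; they are created automatically when the check runs
- config.json customizes which values in the data are treated as missing. It does not need to exist beforehand; it is created automatically when the check runs (default missing-value tokens: "?", "na", "null", "Null", "NULL", " ", "[NULL]")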
```
.
├── data/ (optional)
│ ├── 데이터_001.csv
│ ├── 데이터_002.csv
│ ├── 데이터_003.xlsx
│ ├── ...
├── documents/
│ ├── 테이블정의서.xlsx
│ ├── 컬럼정의서.xlsx
│ └── 코드정의서.xlsx
├── log/
│ ├── QualityCheck_yyyymmdd_hhmmss.log
│ ├── ...
├── output/
│ └── QC결과서_yyyymmdd_hhmmss.xlsx
└── config.json
```
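The effect of the default missing-value tokens listed above can be illustrated with plain pandas. This is only a minimal sketch, not NexR_qc's own implementation: `NA_TOKENS` and `count_missing` are hypothetical names, and because the schema of config.json is not described here, the token list is hard-coded instead of being read from that file.

```python
import io

import pandas as pd

# Default missing-value tokens listed in this README (hard-coded assumption).
NA_TOKENS = ["?", "na", "null", "Null", "NULL", " ", "[NULL]"]


def count_missing(df, tokens=NA_TOKENS):
    """Count, per column, cells that are NaN or equal to one of the tokens."""
    is_missing = df.isin(tokens) | df.isna()
    return is_missing.sum()


# Tiny in-memory CSV; keep_default_na=False keeps "NULL" as a literal string
# so that only our own token list decides what counts as missing.
raw = io.StringIO("id,grade\n1,A\n2,?\n3,NULL\n")
df = pd.read_csv(raw, dtype=str, keep_default_na=False)

missing = count_missing(df)
print(missing)
```

Adding or removing entries in the token list changes which cells are counted, which is the kind of customization config.json is meant to provide.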
<br>
## Running the example
```
#!/usr/bin/env python3
from NexR_qc.QualityCheck import *

import os

import pandas as pd

# Load data (when using data files)
PathDict = {}
PathDict["ROOT"] = os.getcwd()
PathDict["DATA"] = os.path.join(PathDict["ROOT"], "data")  # directory containing the data files

# Load data (when using a DB)
# Load the tables stored in the DB as DataFrames and arrange them in the DataDict form below

DataDict = {}  # DataDict: maps each data name (key) to its DataFrame (value)
for path in [i for i in os.listdir(PathDict["DATA"]) if not i.startswith(".")]:
    data_name = os.path.splitext(os.path.basename(path))[0].upper()
    file_path = os.path.join(PathDict["DATA"], path)
    # The data directory may also hold .xlsx files, which read_csv cannot parse
    if path.lower().endswith(".xlsx"):
        DataDict[data_name] = pd.read_excel(file_path)
    else:
        DataDict[data_name] = pd.read_csv(file_path)

Process = QualityCheck(DataDict)
Process.data_check()
Process.document_check()
Process.na_check()
Process.run()
Process.save()
```
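For the DB case mentioned in the example's comments, the tables only need to end up as a dictionary of DataFrames. The sketch below uses an in-memory SQLite database purely for illustration; the `load_tables` helper and the `customers` table are hypothetical, and a real setup would substitute its own connection and table list.

```python
import sqlite3

import pandas as pd


def load_tables(conn):
    """Read every table of a SQLite connection into {TABLE_NAME: DataFrame}."""
    cursor = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    names = [row[0] for row in cursor]
    return {
        name.upper(): pd.read_sql_query(f'SELECT * FROM "{name}"', conn)
        for name in names
    }


# Hypothetical demo database with a single small table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, grade TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "A"), (2, None)])
conn.commit()

DataDict = load_tables(conn)
print({name: frame.shape for name, frame in DataDict.items()})
```

The resulting `DataDict` has the same shape as the one built from files, so it can be passed to `QualityCheck(DataDict)` in the same way.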
<br>
## Input / Output information
### Input
* Data type: dictionary
    * Structure: {data_name1: Dataframe1, data_name2: Dataframe2, …}
        * data_name: name of the data table or data file
        * Dataframe: the data loaded as a DataFrame
* Example
![NexR_qc_Info_002](https://github.com/mata-1223/NexR_qc/assets/131343466/5e28e8bf-37f2-4cc0-acca-c288bfbd5ccb)
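The input structure described above can be checked before it is handed to the library. The following is a small sketch under that description only; `validate_data_dict` is a hypothetical helper and is not part of NexR_qc.

```python
import pandas as pd


def validate_data_dict(data_dict):
    """Raise TypeError unless data_dict maps string names to DataFrames."""
    if not isinstance(data_dict, dict):
        raise TypeError("expected a dict of data name -> DataFrame")
    for name, frame in data_dict.items():
        if not isinstance(name, str):
            raise TypeError(f"key {name!r} is not a string")
        if not isinstance(frame, pd.DataFrame):
            raise TypeError(f"value for {name!r} is not a DataFrame")
    return data_dict


# Two illustrative tables keyed by upper-case data names.
DataDict = validate_data_dict(
    {
        "CUSTOMERS": pd.DataFrame({"id": [1, 2], "grade": ["A", "B"]}),
        "ORDERS": pd.DataFrame({"order_id": [10], "customer_id": [1]}),
    }
)
print(sorted(DataDict))
```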
### Output
* Result file path: output/QC_결과서.xlsx
* Examples
1) Example 1: table list sheet
![NexR_qc_Info_003](https://github.com/mata-1223/NexR_qc/assets/131343466/54605ebe-d45c-4ba9-b219-dd177e08a6b7)
2) Example 2: per-dataset QC result sheet
![NexR_qc_Info_001](https://github.com/mata-1223/NexR_qc/assets/131343466/a1613944-4812-40a2-9ec3-6452c104a96b)
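Because the directory layout shows result files carrying a `yyyymmdd_hhmmss` timestamp, the most recent report can be located by sorting file names. This is a hedged sketch: `latest_report` is a hypothetical helper, and it simply assumes that every `.xlsx` file in the output folder is a QC report.

```python
from pathlib import Path


def latest_report(output_dir="output"):
    """Return the .xlsx file whose name sorts last in output_dir, or None."""
    folder = Path(output_dir)
    if not folder.is_dir():
        return None
    # yyyymmdd_hhmmss timestamps sort chronologically as plain strings.
    reports = sorted(folder.glob("*.xlsx"), key=lambda p: p.name)
    return reports[-1] if reports else None


report = latest_report()
print(report if report is not None else "no report found yet")
```

Once a path is found, `pandas.read_excel(path, sheet_name=None)` returns a dictionary of all sheets, which is a convenient way to inspect the table list sheet and the per-dataset sheets together.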