# ekorpkit 【iːkɔːkɪt】 : **eKo**nomic **R**esearch **P**ython Tool**kit**
[![PyPI version](https://badge.fury.io/py/ekorpkit.svg)](https://badge.fury.io/py/ekorpkit) [![Jupyter Book Badge](https://jupyterbook.org/en/stable/_images/badge.svg)](https://entelecheia.github.io/ekorpkit-book/) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.6497226.svg)](https://doi.org/10.5281/zenodo.6497226) [![release](https://github.com/entelecheia/ekorpkit/actions/workflows/release.yaml/badge.svg)](https://github.com/entelecheia/ekorpkit/actions/workflows/release.yaml) [![CodeQL](https://github.com/entelecheia/ekorpkit/actions/workflows/codeql-analysis.yml/badge.svg)](https://github.com/entelecheia/ekorpkit/actions/workflows/codeql-analysis.yml) [![test](https://github.com/entelecheia/ekorpkit/actions/workflows/test.yaml/badge.svg)](https://github.com/entelecheia/ekorpkit/actions/workflows/test.yaml) [![CircleCI](https://circleci.com/gh/entelecheia/ekorpkit/tree/main.svg?style=shield)](https://circleci.com/gh/entelecheia/ekorpkit/tree/main) [![codecov](https://codecov.io/gh/entelecheia/ekorpkit/branch/main/graph/badge.svg?token=8I4ORHRREL)](https://codecov.io/gh/entelecheia/ekorpkit) [![markdown-autodocs](https://github.com/entelecheia/ekorpkit/actions/workflows/markdown-autodocs.yaml/badge.svg)](https://github.com/entelecheia/ekorpkit/actions/workflows/markdown-autodocs.yaml)
eKorpkit provides a flexible interface for NLP and ML research pipelines such as extraction, transformation, tokenization, training, and visualization. Its powerful config composition is backed by [Hydra](https://hydra.cc/).
## Key features
### Easy Configuration
- You can compose your configuration dynamically, which makes it easy to build the right configuration for each research project.
- You can override everything from the command line, which makes experimentation fast and removes the need to maintain multiple similar configuration files.
- With the help of the **eKonf** class, it is also easy to compose configurations in a Jupyter notebook environment.
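
The idea of composing a configuration and then overriding single values can be sketched in plain Python. This is only an illustration of the concept, not the eKonf or Hydra API: the `merge` and `apply_override` helpers and the `tokenizer`/`model` keys are hypothetical names made up for this example.

```python
from copy import deepcopy


def merge(base, extra):
    """Recursively merge `extra` into a copy of `base`; `extra` wins on conflicts."""
    result = deepcopy(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


def apply_override(config, override):
    """Apply one 'dotted.key=value' override to a copy of `config`."""
    dotted_key, _, raw_value = override.partition("=")
    result = deepcopy(config)
    node = result
    *parents, leaf = dotted_key.split(".")
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = raw_value  # values stay strings in this sketch
    return result


# Hypothetical config groups, named only for illustration.
defaults = {"tokenizer": {"name": "whitespace", "lowercase": True}}
experiment = {"tokenizer": {"name": "subword"}, "model": {"epochs": 3}}

config = merge(defaults, experiment)
config = apply_override(config, "model.epochs=10")
print(config)
```

Both helpers return new dictionaries instead of mutating their inputs, so the same defaults can be reused across several experiments.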
### No Boilerplate
- eKorpkit lets you focus on the problem at hand instead of spending time on boilerplate code such as command-line flags, configuration file loading, and logging.
### Workflows
- A workflow is a configurable automated process that will run one or more jobs.
- You can divide your research into several unit jobs (tasks), then combine those jobs into one workflow.
- You can have multiple workflows, each of which can perform a different set of tasks.
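
As a rough sketch of the concept described above, a workflow can be modeled as an ordered list of jobs, where each job takes a state dictionary and returns an updated one. This is plain Python and not how eKorpkit itself defines workflows; the job names (`extract`, `tokenize`, `count`) and the `run_workflow` helper are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Job = Callable[[dict], dict]


def extract(state: dict) -> dict:
    # Stand-in for a real extraction job: adds two toy documents.
    return {**state, "docs": ["Rates rose.", "Prices fell."]}


def tokenize(state: dict) -> dict:
    # Lowercase, drop the trailing period, split on whitespace.
    tokens = [doc.lower().rstrip(".").split() for doc in state["docs"]]
    return {**state, "tokens": tokens}


def count(state: dict) -> dict:
    return {**state, "n_tokens": sum(len(t) for t in state["tokens"])}


def run_workflow(jobs: List[Job], state: Optional[dict] = None) -> dict:
    """Run the jobs in order, passing each job's output to the next."""
    state = dict(state or {})
    for job in jobs:
        state = job(state)
    return state


# Several workflows can reuse the same unit jobs.
workflows: Dict[str, List[Job]] = {
    "prepare": [extract, tokenize],
    "prepare_and_count": [extract, tokenize, count],
}

result = run_workflow(workflows["prepare_and_count"])
print(result["n_tokens"])  # 4
```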
### Sharable and Reproducible
- With eKorpkit, you can easily share your datasets and models.
- Sharing configs along with datasets and models makes each research project reproducible.
- You can share individual unit jobs or an entire workflow.
### Pluggable Architecture
- eKorpkit has a pluggable architecture, so you can combine it with your own implementations.
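
One common way to get this kind of pluggability is a name-based registry, sketched below in plain Python. It is an assumption about the general pattern, not eKorpkit's actual plug-in mechanism; `TOKENIZERS`, `register`, and both tokenizers are hypothetical names.

```python
from typing import Callable, Dict, List

# Maps a plug-in name to its implementation.
TOKENIZERS: Dict[str, Callable[[str], List[str]]] = {}


def register(name: str):
    """Decorator that adds a function to the registry under `name`."""
    def decorator(func):
        TOKENIZERS[name] = func
        return func
    return decorator


@register("whitespace")
def whitespace_tokenizer(text: str) -> List[str]:
    return text.split()


@register("char")
def char_tokenizer(text: str) -> List[str]:
    # A user-supplied implementation plugs in the same way.
    return [ch for ch in text if not ch.isspace()]


def tokenize(text: str, name: str = "whitespace") -> List[str]:
    """Look up a tokenizer by name, e.g. a name read from a config file."""
    try:
        tokenizer = TOKENIZERS[name]
    except KeyError:
        raise ValueError(f"unknown tokenizer: {name!r}") from None
    return tokenizer(text)


print(tokenize("central bank"))        # ['central', 'bank']
print(tokenize("ab c", name="char"))   # ['a', 'b', 'c']
```

Because implementations are selected by name, a configuration value is enough to swap one for another without touching the calling code.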
## [Tutorials](https://entelecheia.github.io/ekorpkit-book)
Tutorials for the [ekorpkit](https://github.com/entelecheia/ekorpkit) package can be found at https://entelecheia.github.io/ekorpkit-book/
## [Installation](https://entelecheia.github.io/ekorpkit-book/docs/basics/install.html)
Install the latest version of ekorpkit:
```bash
pip install ekorpkit
```
To install all extra dependencies:
```bash
pip install "ekorpkit[all]"
```
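
To confirm that the installation worked, you can query the installed version from Python. The sketch below uses only the standard library and assumes Python 3.8 or later, where `importlib.metadata` is available; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version("ekorpkit"))
```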
## [The eKorpkit Corpus](https://github.com/entelecheia/ekorpkit/blob/main/docs/corpus/README.md)
The eKorpkit Corpus is a large, diverse, bilingual (ko/en) language modelling dataset.
![ekorpkit corpus](https://github.com/entelecheia/ekorpkit/blob/main/docs/figs/ekorpkit_corpus.png?raw=true)
## Citation
```tex
@software{lee_2022_6497226,
author = {Young Joon Lee},
title = {eKorpkit: eKonomic Research Python Toolkit},
month = apr,
year = 2022,
publisher = {Zenodo},
doi = {10.5281/zenodo.6497226},
url = {https://doi.org/10.5281/zenodo.6497226}
}
```
```tex
@software{lee_2022_ekorpkit,
author = {Young Joon Lee},
title = {eKorpkit: eKonomic Research Python Toolkit},
month = apr,
year = 2022,
publisher = {GitHub},
url = {https://github.com/entelecheia/ekorpkit}
}
```
## License
- eKorpkit is licensed under the [MIT License](https://opensource.org/licenses/MIT). This license covers the eKorpkit package and all of its components.
- Each corpus adheres to its own license policy. Please check the license of the corpus before using it!
Raw data
{
"_id": null,
"home_page": "https://github.com/entelecheia/ekorpkit",
"name": "ekorpkit",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "",
"keywords": "",
"author": "Young Joon Lee",
"author_email": "",
"download_url": "https://files.pythonhosted.org/packages/18/e6/962894cbfafa452474a9e6c545af5ed9a5681e30cce3a01ea5d3bbab9ec2/ekorpkit-0.1.40.tar.gz",
"platform": null,
"description": "# ekorpkit \u3010i\u02d0k\u0254\u02d0k\u026at\u3011 : **eKo**nomic **R**esearch **P**ython Tool**kit**\n\n[![PyPI version](https://badge.fury.io/py/ekorpkit.svg)](https://badge.fury.io/py/ekorpkit) [![Jupyter Book Badge](https://jupyterbook.org/en/stable/_images/badge.svg)](https://entelecheia.github.io/ekorpkit-book/) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.6497226.svg)](https://doi.org/10.5281/zenodo.6497226) [![release](https://github.com/entelecheia/ekorpkit/actions/workflows/release.yaml/badge.svg)](https://github.com/entelecheia/ekorpkit/actions/workflows/release.yaml) [![CodeQL](https://github.com/entelecheia/ekorpkit/actions/workflows/codeql-analysis.yml/badge.svg)](https://github.com/entelecheia/ekorpkit/actions/workflows/codeql-analysis.yml) [![test](https://github.com/entelecheia/ekorpkit/actions/workflows/test.yaml/badge.svg)](https://github.com/entelecheia/ekorpkit/actions/workflows/test.yaml) [![CircleCI](https://circleci.com/gh/entelecheia/ekorpkit/tree/main.svg?style=shield)](https://circleci.com/gh/entelecheia/ekorpkit/tree/main) [![codecov](https://codecov.io/gh/entelecheia/ekorpkit/branch/main/graph/badge.svg?token=8I4ORHRREL)](https://codecov.io/gh/entelecheia/ekorpkit) [![markdown-autodocs](https://github.com/entelecheia/ekorpkit/actions/workflows/markdown-autodocs.yaml/badge.svg)](https://github.com/entelecheia/ekorpkit/actions/workflows/markdown-autodocs.yaml)\n\neKorpkit provides a flexible interface for NLP and ML research pipelines such as extraction, transformation, tokenization, training, and visualization. Its powerful config composition is backed by [Hydra](https://hydra.cc/).\n\n## Key features\n\n### Easy Configuration\n\n- You can compose your configuration dynamically, enabling you to easily get the perfect configuration for each research. \n- You can override everything from the command line, which makes experimentation fast, and removes the need to maintain multiple similar configuration files. 
\n- With a help of the **eKonf** class, it is also easy to compose configurations in a jupyter notebook environment.\n\n### No Boilerplate\n\n- eKorpkit lets you focus on the problem at hand instead of spending time on boilerplate code like command line flags, loading configuration files, logging etc.\n\n### Workflows\n\n- A workflow is a configurable automated process that will run one or more jobs.\n- You can divide your research into several unit jobs (tasks), then combine those jobs into one workflow.\n- You can have multiple workflows, each of which can perform a different set of tasks.\n\n### Sharable and Reproducible\n\n- With eKorpkit, you can easily share your datasets and models.\n- Sharing configs along with datasets and models makes every research reproducible.\n- You can share each unit jobs or an entire workflow.\n\n### Pluggable Architecture\n\n- eKorpkit has a pluggable architecture, enabling it to combine with your own implementation.\n\n## [Tutorials](https://entelecheia.github.io/ekorpkit-book)\n\nTutorials for [ekorpkit](https://github.com/entelecheia/ekorpkit) package can be found at https://entelecheia.github.io/ekorpkit-book/\n\n## [Installation](https://entelecheia.github.io/ekorpkit-book/docs/basics/install.html)\n\nInstall the latest version of ekorpkit:\n\n```bash\npip install ekorpkit\n```\n\nTo install all extra dependencies,\n\n```bash\npip install ekorpkit[all]\n```\n\n## [The eKorpkit Corpus](https://github.com/entelecheia/ekorpkit/blob/main/docs/corpus/README.md)\n\nThe eKorpkit Corpus is a large, diverse, bilingual (ko/en) language modelling dataset.\n\n![ekorpkit corpus](https://github.com/entelecheia/ekorpkit/blob/main/docs/figs/ekorpkit_corpus.png?raw=true)\n\n## Citation\n\n```tex\n@software{lee_2022_6497226,\n author = {Young Joon Lee},\n title = {eKorpkit: eKonomic Research Python Toolkit},\n month = apr,\n year = 2022,\n publisher = {Zenodo},\n doi = {10.5281/zenodo.6497226},\n url = 
{https://doi.org/10.5281/zenodo.6497226}\n}\n```\n\n```tex\n@software{lee_2022_ekorpkit,\n author = {Young Joon Lee},\n title = {eKorpkit: eKonomic Research Python Toolkit},\n month = apr,\n year = 2022,\n publisher = {GitHub},\n url = {https://github.com/entelecheia/ekorpkit}\n}\n```\n\n## License\n\n- eKorpkit is licensed under the [MIT License](https://opensource.org/licenses/MIT). This license covers the eKorpkit package and all of its components.\n- Each corpus adheres to its own license policy. Please check the license of the corpus before using it!\n\n\n",
"bugtrack_url": null,
"license": "",
"summary": "eKorpkit provides a flexible interface for NLP and ML research pipelines such as extraction, transformation, tokenization, training, and visualization.",
"version": "0.1.40",
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "b78c96a2d8445153840fc83242cff54ce8a1d43857c1e9557808848c4aada905",
"md5": "9b3d17ee1d8d0a531080456406e46f49",
"sha256": "71d9f35e0443f1d21b4221599a93494679e0b3612b5b0a8605df4033c2e2c1d7"
},
"downloads": -1,
"filename": "ekorpkit-0.1.40-py3-none-any.whl",
"has_sig": false,
"md5_digest": "9b3d17ee1d8d0a531080456406e46f49",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 7225376,
"upload_time": "2022-10-24T04:22:04",
"upload_time_iso_8601": "2022-10-24T04:22:04.102265Z",
"url": "https://files.pythonhosted.org/packages/b7/8c/96a2d8445153840fc83242cff54ce8a1d43857c1e9557808848c4aada905/ekorpkit-0.1.40-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "18e6962894cbfafa452474a9e6c545af5ed9a5681e30cce3a01ea5d3bbab9ec2",
"md5": "b96f3c31b44f7ae65e3f41e4ed6114f0",
"sha256": "9264dbfc4c8965b1f76a92ad82137e15d44ced02ae2a046fb3168a8c2ec607bf"
},
"downloads": -1,
"filename": "ekorpkit-0.1.40.tar.gz",
"has_sig": false,
"md5_digest": "b96f3c31b44f7ae65e3f41e4ed6114f0",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 6981530,
"upload_time": "2022-10-24T04:22:06",
"upload_time_iso_8601": "2022-10-24T04:22:06.987722Z",
"url": "https://files.pythonhosted.org/packages/18/e6/962894cbfafa452474a9e6c545af5ed9a5681e30cce3a01ea5d3bbab9ec2/ekorpkit-0.1.40.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2022-10-24 04:22:06",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "entelecheia",
"github_project": "ekorpkit",
"travis_ci": false,
"coveralls": true,
"github_actions": true,
"circle": true,
"requirements": [
{
"name": "numpy",
"specs": []
},
{
"name": "tqdm",
"specs": []
},
{
"name": "pandas",
"specs": []
},
{
"name": "hydra-core",
"specs": [
[
">=",
"1.2.0"
]
]
},
{
"name": "hydra-colorlog",
"specs": []
},
{
"name": "pydantic",
"specs": []
},
{
"name": "python-dotenv",
"specs": []
},
{
"name": "gdown",
"specs": []
},
{
"name": "chardet",
"specs": []
},
{
"name": "rehash",
"specs": []
},
{
"name": "requests",
"specs": []
},
{
"name": "scipy",
"specs": []
},
{
"name": "pytablewriter",
"specs": [
[
">=",
"0.64.0"
]
]
},
{
"name": "ftfy",
"specs": []
},
{
"name": "requests",
"specs": [
[
"<",
"3.0"
],
[
">=",
"2.0"
]
]
},
{
"name": "rich",
"specs": [
[
">=",
"11.1"
]
]
},
{
"name": "filelock",
"specs": [
[
">=",
"3.4"
],
[
"<",
"3.8"
]
]
},
{
"name": "dataclasses",
"specs": []
},
{
"name": "huggingface-hub",
"specs": [
[
">=",
"0.8.1"
]
]
}
],
"lcname": "ekorpkit"
}