ekorpkit


| Field | Value |
| --- | --- |
| Name | ekorpkit |
| Version | 0.1.40 |
| home_page | https://github.com/entelecheia/ekorpkit |
| Summary | eKorpkit provides a flexible interface for NLP and ML research pipelines such as extraction, transformation, tokenization, training, and visualization. |
| upload_time | 2022-10-24 04:22:06 |
| maintainer | |
| docs_url | None |
| author | Young Joon Lee |
| requires_python | >=3.7 |
| license | |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | numpy, tqdm, pandas, hydra-core, hydra-colorlog, pydantic, python-dotenv, gdown, chardet, rehash, requests, scipy, pytablewriter, ftfy, requests, rich, filelock, dataclasses, huggingface-hub |
| Travis-CI | No Travis. |
| coveralls | test coverage |

# ekorpkit 【iːkɔːkɪt】 : **eKo**nomic **R**esearch **P**ython Tool**kit**

[![PyPI version](https://badge.fury.io/py/ekorpkit.svg)](https://badge.fury.io/py/ekorpkit) [![Jupyter Book Badge](https://jupyterbook.org/en/stable/_images/badge.svg)](https://entelecheia.github.io/ekorpkit-book/) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.6497226.svg)](https://doi.org/10.5281/zenodo.6497226) [![release](https://github.com/entelecheia/ekorpkit/actions/workflows/release.yaml/badge.svg)](https://github.com/entelecheia/ekorpkit/actions/workflows/release.yaml) [![CodeQL](https://github.com/entelecheia/ekorpkit/actions/workflows/codeql-analysis.yml/badge.svg)](https://github.com/entelecheia/ekorpkit/actions/workflows/codeql-analysis.yml) [![test](https://github.com/entelecheia/ekorpkit/actions/workflows/test.yaml/badge.svg)](https://github.com/entelecheia/ekorpkit/actions/workflows/test.yaml) [![CircleCI](https://circleci.com/gh/entelecheia/ekorpkit/tree/main.svg?style=shield)](https://circleci.com/gh/entelecheia/ekorpkit/tree/main) [![codecov](https://codecov.io/gh/entelecheia/ekorpkit/branch/main/graph/badge.svg?token=8I4ORHRREL)](https://codecov.io/gh/entelecheia/ekorpkit) [![markdown-autodocs](https://github.com/entelecheia/ekorpkit/actions/workflows/markdown-autodocs.yaml/badge.svg)](https://github.com/entelecheia/ekorpkit/actions/workflows/markdown-autodocs.yaml)

eKorpkit provides a flexible interface for NLP and ML research pipelines such as extraction, transformation, tokenization, training, and visualization. Its powerful config composition is backed by [Hydra](https://hydra.cc/).

## Key features

### Easy Configuration

- You can compose your configuration dynamically, which makes it easy to build the right configuration for each research project.
- You can override everything from the command line, which makes experimentation fast and removes the need to maintain multiple similar configuration files.
- With the help of the **eKonf** class, it is also easy to compose configurations in a Jupyter notebook environment.
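
The composition and override ideas above can be illustrated with a minimal, standard-library-only sketch. It does not use the eKorpkit or Hydra APIs; the helper names `merge_configs` and `apply_overrides`, the config keys, and the `key.path=value` override syntax handled here are assumptions chosen for illustration.

```python
import json
from copy import deepcopy


def merge_configs(base, override):
    """Recursively merge `override` into a copy of `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_configs(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def parse_value(raw):
    """Parse an override value as JSON, falling back to the raw string."""
    try:
        return json.loads(raw)
    except ValueError:
        return raw


def apply_overrides(config, overrides):
    """Apply command-line style `a.b=value` overrides to a copy of `config`."""
    result = deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw_value)
    return result


# Hypothetical config groups; the names are illustrative only.
defaults = {"tokenizer": {"name": "whitespace", "lowercase": True}, "batch_size": 32}
experiment = {"tokenizer": {"name": "subword"}}

# Compose the experiment on top of the defaults, then override two values.
config = merge_configs(defaults, experiment)
config = apply_overrides(config, ["batch_size=64", "tokenizer.lowercase=false"])
print(config)
```

Because both helpers work on copies, the `defaults` dictionary is left unchanged, so several experiment configurations can be composed from the same base without maintaining near-duplicate files.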

### No Boilerplate

- eKorpkit lets you focus on the problem at hand instead of spending time on boilerplate code such as command-line flags, configuration file loading, and logging.

### Workflows

- A workflow is a configurable automated process that will run one or more jobs.
- You can divide your research into several unit jobs (tasks), then combine those jobs into one workflow.
- You can have multiple workflows, each of which can perform a different set of tasks.
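
As a rough illustration of the job and workflow idea described above, the following sketch combines small unit jobs into two different workflows driven by plain dictionaries. It is not eKorpkit's workflow API; `run_workflow`, the job functions, and the `steps`/`job`/`params` keys are hypothetical names used only for this example.

```python
def run_workflow(workflow, jobs, data):
    """Run the named jobs in order, feeding each job's output to the next."""
    for step in workflow["steps"]:
        job = jobs[step["job"]]
        data = job(data, **step.get("params", {}))
    return data


# Hypothetical unit jobs (tasks); each takes data plus keyword parameters.
def normalize(texts, lowercase=True):
    return [t.lower().strip() if lowercase else t.strip() for t in texts]


def tokenize(texts, sep=None):
    return [t.split(sep) for t in texts]


def count_tokens(token_lists):
    return [len(tokens) for tokens in token_lists]


JOBS = {"normalize": normalize, "tokenize": tokenize, "count_tokens": count_tokens}

# Two workflows that reuse the same jobs in different combinations.
tokenize_workflow = {"steps": [{"job": "normalize"}, {"job": "tokenize"}]}
count_workflow = {
    "steps": [
        {"job": "normalize", "params": {"lowercase": False}},
        {"job": "tokenize"},
        {"job": "count_tokens"},
    ]
}

texts = ["  Rates Rose ", "Inflation cooled"]
tokens = run_workflow(tokenize_workflow, JOBS, texts)
counts = run_workflow(count_workflow, JOBS, texts)
print(tokens)
print(counts)
```

Keeping the workflow definition as data rather than code means the same definition could be stored or shared alongside its results, which is the property the feature list emphasizes.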

### Sharable and Reproducible

- With eKorpkit, you can easily share your datasets and models.
- Sharing configs along with datasets and models makes each research project reproducible.
- You can share individual unit jobs or an entire workflow.

### Pluggable Architecture

- eKorpkit has a pluggable architecture, so you can combine it with your own implementations.

## [Tutorials](https://entelecheia.github.io/ekorpkit-book)

Tutorials for the [ekorpkit](https://github.com/entelecheia/ekorpkit) package can be found at https://entelecheia.github.io/ekorpkit-book/

## [Installation](https://entelecheia.github.io/ekorpkit-book/docs/basics/install.html)

Install the latest version of ekorpkit:

```bash
pip install ekorpkit
```

To install all extra dependencies:

```bash
# Quote the requirement so shells such as zsh do not expand the brackets
pip install "ekorpkit[all]"
```

## [The eKorpkit Corpus](https://github.com/entelecheia/ekorpkit/blob/main/docs/corpus/README.md)

The eKorpkit Corpus is a large, diverse, bilingual (ko/en) language modelling dataset.

![ekorpkit corpus](https://github.com/entelecheia/ekorpkit/blob/main/docs/figs/ekorpkit_corpus.png?raw=true)

## Citation

```tex
@software{lee_2022_6497226,
  author       = {Young Joon Lee},
  title        = {eKorpkit: eKonomic Research Python Toolkit},
  month        = apr,
  year         = 2022,
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.6497226},
  url          = {https://doi.org/10.5281/zenodo.6497226}
}
```

```tex
@software{lee_2022_ekorpkit,
  author       = {Young Joon Lee},
  title        = {eKorpkit: eKonomic Research Python Toolkit},
  month        = apr,
  year         = 2022,
  publisher    = {GitHub},
  url          = {https://github.com/entelecheia/ekorpkit}
}
```

## License

- eKorpkit is licensed under the [MIT License](https://opensource.org/licenses/MIT). This license covers the eKorpkit package and all of its components.
- Each corpus adheres to its own license policy. Please check the license of the corpus before using it!



            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/entelecheia/ekorpkit",
    "name": "ekorpkit",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "",
    "keywords": "",
    "author": "Young Joon Lee",
    "author_email": "",
    "download_url": "https://files.pythonhosted.org/packages/18/e6/962894cbfafa452474a9e6c545af5ed9a5681e30cce3a01ea5d3bbab9ec2/ekorpkit-0.1.40.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "eKorpkit provides a flexible interface for NLP and ML research pipelines such as extraction, transformation, tokenization, training, and visualization.",
    "version": "0.1.40",
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b78c96a2d8445153840fc83242cff54ce8a1d43857c1e9557808848c4aada905",
                "md5": "9b3d17ee1d8d0a531080456406e46f49",
                "sha256": "71d9f35e0443f1d21b4221599a93494679e0b3612b5b0a8605df4033c2e2c1d7"
            },
            "downloads": -1,
            "filename": "ekorpkit-0.1.40-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "9b3d17ee1d8d0a531080456406e46f49",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 7225376,
            "upload_time": "2022-10-24T04:22:04",
            "upload_time_iso_8601": "2022-10-24T04:22:04.102265Z",
            "url": "https://files.pythonhosted.org/packages/b7/8c/96a2d8445153840fc83242cff54ce8a1d43857c1e9557808848c4aada905/ekorpkit-0.1.40-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "18e6962894cbfafa452474a9e6c545af5ed9a5681e30cce3a01ea5d3bbab9ec2",
                "md5": "b96f3c31b44f7ae65e3f41e4ed6114f0",
                "sha256": "9264dbfc4c8965b1f76a92ad82137e15d44ced02ae2a046fb3168a8c2ec607bf"
            },
            "downloads": -1,
            "filename": "ekorpkit-0.1.40.tar.gz",
            "has_sig": false,
            "md5_digest": "b96f3c31b44f7ae65e3f41e4ed6114f0",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 6981530,
            "upload_time": "2022-10-24T04:22:06",
            "upload_time_iso_8601": "2022-10-24T04:22:06.987722Z",
            "url": "https://files.pythonhosted.org/packages/18/e6/962894cbfafa452474a9e6c545af5ed9a5681e30cce3a01ea5d3bbab9ec2/ekorpkit-0.1.40.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2022-10-24 04:22:06",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "entelecheia",
    "github_project": "ekorpkit",
    "travis_ci": false,
    "coveralls": true,
    "github_actions": true,
    "circle": true,
    "requirements": [
        {
            "name": "numpy",
            "specs": []
        },
        {
            "name": "tqdm",
            "specs": []
        },
        {
            "name": "pandas",
            "specs": []
        },
        {
            "name": "hydra-core",
            "specs": [
                [
                    ">=",
                    "1.2.0"
                ]
            ]
        },
        {
            "name": "hydra-colorlog",
            "specs": []
        },
        {
            "name": "pydantic",
            "specs": []
        },
        {
            "name": "python-dotenv",
            "specs": []
        },
        {
            "name": "gdown",
            "specs": []
        },
        {
            "name": "chardet",
            "specs": []
        },
        {
            "name": "rehash",
            "specs": []
        },
        {
            "name": "requests",
            "specs": []
        },
        {
            "name": "scipy",
            "specs": []
        },
        {
            "name": "pytablewriter",
            "specs": [
                [
                    ">=",
                    "0.64.0"
                ]
            ]
        },
        {
            "name": "ftfy",
            "specs": []
        },
        {
            "name": "requests",
            "specs": [
                [
                    "<",
                    "3.0"
                ],
                [
                    ">=",
                    "2.0"
                ]
            ]
        },
        {
            "name": "rich",
            "specs": [
                [
                    ">=",
                    "11.1"
                ]
            ]
        },
        {
            "name": "filelock",
            "specs": [
                [
                    ">=",
                    "3.4"
                ],
                [
                    "<",
                    "3.8"
                ]
            ]
        },
        {
            "name": "dataclasses",
            "specs": []
        },
        {
            "name": "huggingface-hub",
            "specs": [
                [
                    ">=",
                    "0.8.1"
                ]
            ]
        }
    ],
    "lcname": "ekorpkit"
}
        