mecab-ko-dic


Name: mecab-ko-dic
Version: 1.0.0
Home page: https://github.com/LuminosoInsight/mecab-ko-dic
Summary: mecab-ko-dic packaged for Python
Upload time: 2021-02-19 20:41:22
Author: Robyn Speer

# mecab-ko-dic

This is a Korean dictionary for the MeCab tokenizer, packaged for use in Python
with [mecab-python3](https://github.com/SamuraiT/mecab-python3) or
[fugashi](https://github.com/polm/fugashi). 


## Should you use this?

We (at Luminoso) think this is probably a better way to find tokens in Korean
text than just splitting it on spaces. We know that MeCab wasn't designed for
Korean, but it gives us the kind of tokens we need better than the other
tools we've tried.

One reason we're packaging this is to use it in [wordfreq][], which needs to
have a reasonable set of consistent tokens so it can look up their frequencies,
and already uses MeCab for Japanese.

[wordfreq]: https://github.com/LuminosoInsight/wordfreq

If interoperability with things that already use MeCab is not your goal, you
probably have better options for Korean NLP than this.


## Credits

The dictionary data was created by Yongwoon Lee and Yungho Yu. We've included it
as part of this package under the terms of the Apache License 2.0. The original
dictionary can be found [here][original-dict].

The idea to package a MeCab dictionary as a Python package, as well as the
code structure for doing so, comes from [Paul McCann's ipadic package][ipadic-py].

[original-dict]: https://bitbucket.org/eunjeon/mecab-ko-dic/
[ipadic-py]: https://github.com/polm/ipadic-py


## Usage

To install:

    pip install mecab-ko-dic

To initialize with mecab-python3:

    import MeCab
    import mecab_ko_dic

    # MECAB_ARGS is the argument string that points MeCab at the bundled dictionary
    tagger = MeCab.Tagger(mecab_ko_dic.MECAB_ARGS)
    print(tagger.parse("안녕하세요세계."))
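
With MeCab's default output format, `tagger.parse` returns one string with a line per token (`surface<TAB>comma-separated features`) and a closing `EOS` line. The sketch below shows one way to turn that string into a list of tokens. The helper name `parse_mecab_output` is hypothetical and not part of this package, and the sample string is hand-written with placeholder feature values; it is not real output from this dictionary.

```python
def parse_mecab_output(raw):
    """Split MeCab's default text output into (surface, features) pairs."""
    tokens = []
    for line in raw.splitlines():
        # MeCab marks the end of each sentence with a literal "EOS" line.
        if not line or line == "EOS":
            continue
        # Each token line is "surface<TAB>feature1,feature2,...".
        surface, _, feature_str = line.partition("\t")
        # Naive split: assumes no quoted commas inside a feature field.
        tokens.append((surface, feature_str.split(",")))
    return tokens


# Hand-written sample in the default layout; the feature values are
# placeholders, NOT what mecab-ko-dic actually produces.
sample = "안녕\tPOS1,a\n세계\tPOS2,b\nEOS\n"
for surface, features in parse_mecab_output(sample):
    print(surface, features[0])
```

In real use you would pass `tagger.parse(text)` in place of `sample`. If you only need the surface forms, keeping just the first element of each pair is enough.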


## License

The data we use, and this code repository itself, are released under the Apache
License 2.0. See the LICENSE.txt file included in this distribution.
            
