Self-collected training data, dictionaries, and stopwords, bundled into a package so nobody has to reinvent the wheel.
## Usage
Install: `pip install NCHU_nlptoolkit`
1. Remove stopwords and segment the text (a combined usage sketch follows the demos below)
P.S. Removing stopwords also loads the lab dictionary automatically.
```
from NCHU_nlptoolkit.cut import *
# minword is the minimum token length in characters (shortest segment to keep)

# default: return segmented tokens only
cut_sentence(input_string, flag=False, minword=1)

# flag=True: return segmentation with part-of-speech tags
cut_sentence(input_string, flag=True, minword=1)
```
2. Load the law dictionary (see the combined sketch after the demos)
```
from NCHU_nlptoolkit.cut import *
load_law_dict()
```
3. demo:
* zh:
```
>>> doc = '首先,對區塊鏈需要的第一個理解是,它是一種「將資料寫錄的技術」。'
>>> cut_sentence(doc, flag=True)
[('區塊鏈', 'n'), ('需要', 'n'), ('第一個', 'm'), ('理解', 'n'), ('一種', 'm'), ('資料', 'n'), ('寫錄', 'v'), ('技術', 'n')]
```
* en:
```
>>> doc = 'The City of New York, often called New York City (NYC) or simply New York, is the most populous city in the United States.'
>>> list(cut_sentence_en(doc))
['City', 'New York', 'called', 'New York City', 'NYC', 'simply', 'New York', 'populous', 'city', 'United States']
>>> list(cut_sentence_en(doc, flag=True))
[('City', 'NNP'), ('New York', 'NNP/NNP'), ('called', 'VBN'), ('New York City', 'NNP/NNP/NNP'), ('NYC', 'NN'), ('simply', 'RB'), ('New York', 'NNP/NNP'), ('populous', 'JJ'), ('city', 'NN'), ('United States', 'NNP/NNS')]
```
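
Putting steps 1 and 2 together, the sketch below loads the law dictionary and then segments a sentence with a larger `minword` so single-character tokens are dropped. The sample sentence is illustrative only; the calls use just the `load_law_dict()` and `cut_sentence()` APIs shown above.

```
from NCHU_nlptoolkit.cut import *

# step 2: add the legal terms to the segmentation dictionary
load_law_dict()

# illustrative input only; any Chinese text works the same way
doc = '本案被告涉嫌違反著作權法,經檢察官提起公訴。'

# step 1: keep only tokens of at least two characters, without POS tags
print(cut_sentence(doc, flag=False, minword=2))

# same call, but returning (token, POS tag) pairs
print(cut_sentence(doc, flag=True, minword=2))
```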
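
Because `flag=True` yields `(token, tag)` pairs, a common follow-up is filtering by tag. The snippet below keeps only the nouns from the Chinese demo above, assuming the jieba-style tag `'n'` for nouns as shown in that output.

```
from NCHU_nlptoolkit.cut import *

doc = '首先,對區塊鏈需要的第一個理解是,它是一種「將資料寫錄的技術」。'

# keep tokens whose tag starts with 'n' (nouns), following the tags in the demo above
nouns = [token for token, tag in cut_sentence(doc, flag=True) if tag.startswith('n')]
print(nouns)  # per the demo output: ['區塊鏈', '需要', '理解', '資料', '技術']
```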