SuffixAutomaton


Name: SuffixAutomaton
Version: 0.0.9
Home page: https://github.com/laohur/SuffixAutomaton
Summary: suffix automaton by words, to get text common substrings
Author: laohur
Requires Python: >=3.0
License: [Anti-996 License](https://github.com/996icu/996.ICU/blob/master/LICENSE)
Keywords: suffix automaton, sam
Requirements: none recorded
Upload time: 2022-12-22 14:00:05
            
# SuffixAutomaton
A suffix automaton built over word tokens, for extracting the common substrings of texts.


## usage
> pip install SuffixAutomaton 

```python
from SuffixAutomaton import SuffixAutomaton, lcs1, lcs2

raw = """
    ASE : International Conference on Automated Software Engineering
    ESEC/FSE : ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
    ICSE : International Conference on Software Engineering
    ISSTA : The International Symposium on Software Testing and Analysis
    """
# tokenize in words: one token list per non-empty line
doc = [x.split() for x in raw.splitlines() if x.strip()]

# longest common substring of two lines
print(lcs1(doc[1], doc[2]))
# [(['Software', 'Engineering'], 14, 5)]

# longest common substrings of one line against several lines
print(lcs2(doc[0], doc[1:4]))
# [([':'], 1), (['on'], 4), (['Software'], 6)]

# tokenize in chars: each verse is matched character by character
poet = "江天一色无纤尘皎皎空中孤月轮 江畔何人初见月江月何年初照人 人生代代无穷已江月年年望相似 不知江月待何人但见长江送流水"
doc = poet.split()

# all common substrings of two verses
print(lcs1(doc[1], doc[3], 1))
# [('江', 0, 10), ('何人', 2, 5), ('见', 5, 8), ('江月', 7, 2)]

# all common substrings of one verse against several
print(lcs2(doc[2], doc[2:4], 1))
# [('人', 0), ('江月', 7)]
```
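Under the hood, substring matching of this kind rests on the classic suffix-automaton technique: build the automaton over one sequence, then walk the other sequence through it, tracking the longest match ending at each position. The sketch below is illustrative only (it is not this package's actual code, and the names are hypothetical); it works on both strings and word-token lists:

```python
class State:
    """One suffix-automaton state: transitions, suffix link, and max length."""
    __slots__ = ("next", "link", "len")
    def __init__(self):
        self.next = {}   # token -> state index
        self.link = -1   # suffix link
        self.len = 0     # length of the longest string in this state

def build_sam(seq):
    """Classic online suffix-automaton construction over any token sequence."""
    states = [State()]
    last = 0
    for token in seq:
        cur = len(states)
        states.append(State())
        states[cur].len = states[last].len + 1
        p = last
        while p != -1 and token not in states[p].next:
            states[p].next[token] = cur
            p = states[p].link
        if p == -1:
            states[cur].link = 0
        else:
            q = states[p].next[token]
            if states[p].len + 1 == states[q].len:
                states[cur].link = q
            else:
                # split: clone q so lengths stay consistent
                clone = len(states)
                states.append(State())
                states[clone].len = states[p].len + 1
                states[clone].next = dict(states[q].next)
                states[clone].link = states[q].link
                while p != -1 and states[p].next.get(token) == q:
                    states[p].next[token] = clone
                    p = states[p].link
                states[q].link = clone
                states[cur].link = clone
        last = cur
    return states

def longest_common_substring(a, b):
    """Walk b through the automaton of a, keeping the longest match so far."""
    states = build_sam(a)
    v, l = 0, 0
    best_len, best_end = 0, 0
    for i, token in enumerate(b):
        while v and token not in states[v].next:
            v = states[v].link      # shorten the match via suffix links
            l = states[v].len
        if token in states[v].next:
            v = states[v].next[token]
            l += 1
        else:
            v, l = 0, 0             # no suffix of the match extends by token
        if l > best_len:
            best_len, best_end = l, i
    return b[best_end - best_len + 1 : best_end + 1]

print(longest_common_substring("ababc", "abcba"))  # -> abc
```

The same function accepts token lists, e.g. `longest_common_substring(line1.split(), line2.split())` returns the longest run of shared words.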

## feature
* suffix automaton over word tokens (any token sequence works)
* [longest] common substrings of two lines
* [longest] common substrings of one line against a document
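Longest-common-substring results like the ones above can be sanity-checked against a plain O(n·m) dynamic program that needs no automaton. This helper is hypothetical (not part of the package's API) and, like the library, treats strings and word-token lists uniformly:

```python
def longest_common_substring_dp(a, b):
    """Longest common substring of two sequences via dynamic programming
    over matching end positions; O(len(a) * len(b)) time."""
    best_len, best_end = 0, 0          # length and exclusive end index in a
    prev = [0] * (len(b) + 1)          # match lengths ending at a[i-2], b[j-1]
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1   # extend the diagonal run
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len : best_end]

print(longest_common_substring_dp("江畔何人初见月", "不知江月待何人"))  # -> 何人
```

The quadratic cost makes this unsuitable for long documents, which is exactly where the suffix automaton's linear-time matching pays off.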


## inspired by
    Reference: https://www.cnblogs.com/shld/p/10444808.html
    Explanation: https://www.cnblogs.com/zjp-shadow/p/9218214.html
    Detailed walkthrough: https://www.cnblogs.com/1625--H/p/12416198.html
    Proof: https://oi-wiki.org/string/sam/
    Problem solutions: https://www.cnblogs.com/Lyush/archive/2013/08/25/3281546.html https://www.cnblogs.com/mollnn/p/13175736.html



            
