# SuffixAutomaton
A suffix automaton that can be built over word tokens as well as characters, used to find the common substrings of texts.
## usage
> pip install SuffixAutomaton
```python
raw = """
ASE : International Conference on Automated Software Engineering
ESEC/FSE : ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
ICSE : International Conference on Software Engineering
ISSTA : The International Symposium on Software Testing and Analysis
"""
doc = raw.splitlines()
doc = [x for x in doc if x]
doc = [x.split() for x in doc]
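from SuffixAutomaton import SuffixAutomaton, lcs1, lcs2
# Word-level tokens: longest common substring of two token lists.
# Each result is (tokens, start index in the first list, start index in the second).
# Expected output: [(['Software', 'Engineering'], 14, 5)]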
print(lcs1(doc[1], doc[2]))
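# Longest common substrings of doc[0] with all of doc[1:4]; positions refer to doc[0].
# Expected output: [([':'], 1), (['on'], 4), (['Software'], 6)]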
print(lcs2(doc[0], doc[1:4]))
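# Character-level tokens: plain strings are compared character by character.
# With 1 as the third argument, all common substrings are listed, not only the longest.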
poet = "江天一色无纤尘皎皎空中孤月轮 江畔何人初见月江月何年初照人 人生代代无穷已江月年年望相似 不知江月待何人但见长江送流水"
doc = poet.split()
# Expected output: [('江', 0, 10), ('何人', 2, 5), ('见', 5, 8), ('江月', 7, 2)]
print(lcs1(doc[1], doc[3], 1))
# Expected output: [('人', 0), ('江月', 7)]
print(lcs2(doc[2], doc[2:4], 1))
```
## features
* Suffix automaton, optionally built over word tokens
* [Longest] common substring of two lines
* [Longest] common substring across a document (multiple lines)
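
The idea behind the first two features can be sketched in plain Python. This is a minimal, independent illustration of the standard suffix automaton construction, not the package's own code: the names `TokenSAM` and `longest_common_substring` are hypothetical, and the return value here is `(tokens, start index in b)`, which differs from the tuples `lcs1` returns. Because states are keyed on arbitrary hashable tokens, the same code works for word lists and for character strings.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


class TokenSAM:
    """Suffix automaton over any sequence of hashable tokens (words or chars)."""

    def __init__(self, tokens: Sequence[Hashable]) -> None:
        # Parallel arrays, one entry per state; state 0 is the initial state.
        self.next: List[Dict[Hashable, int]] = [{}]  # transitions
        self.link: List[int] = [-1]                  # suffix links
        self.length: List[int] = [0]                 # longest substring per state
        self.last = 0
        for tok in tokens:
            self._extend(tok)

    def _new_state(self, length: int, link: int, trans: Dict[Hashable, int]) -> int:
        self.next.append(trans)
        self.link.append(link)
        self.length.append(length)
        return len(self.length) - 1

    def _extend(self, tok: Hashable) -> None:
        cur = self._new_state(self.length[self.last] + 1, -1, {})
        p = self.last
        # Add the new transition along the suffix-link chain.
        while p != -1 and tok not in self.next[p]:
            self.next[p][tok] = cur
            p = self.link[p]
        if p == -1:
            self.link[cur] = 0
        else:
            q = self.next[p][tok]
            if self.length[p] + 1 == self.length[q]:
                self.link[cur] = q
            else:
                # Split q: the clone keeps the shorter substrings.
                clone = self._new_state(
                    self.length[p] + 1, self.link[q], dict(self.next[q])
                )
                while p != -1 and self.next[p].get(tok) == q:
                    self.next[p][tok] = clone
                    p = self.link[p]
                self.link[q] = clone
                self.link[cur] = clone
        self.last = cur


def longest_common_substring(a: Sequence, b: Sequence) -> Tuple[list, int]:
    """Return (tokens, start index in b) of the first longest common substring."""
    sam = TokenSAM(a)
    state, matched = 0, 0
    best_len, best_end = 0, 0
    for i, tok in enumerate(b):
        # Shorten the current match until it can be extended by tok.
        while state != 0 and tok not in sam.next[state]:
            state = sam.link[state]
            matched = sam.length[state]
        if tok in sam.next[state]:
            state = sam.next[state][tok]
            matched += 1
        else:
            matched = 0  # tok does not occur in a at all
        if matched > best_len:
            best_len, best_end = matched, i + 1
    return list(b[best_end - best_len:best_end]), best_end - best_len


words_a = "ACM Joint European Software Engineering Conference".split()
words_b = "International Conference on Software Engineering".split()
print(longest_common_substring(words_a, words_b))
# (['Software', 'Engineering'], 3)
```

Building the automaton takes time roughly linear in the length of `a` (with dictionary transitions), and the scan over `b` is linear as well, which is the main reason to prefer this over a quadratic dynamic-programming table for long inputs.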
## inspired by
* Reference: https://www.cnblogs.com/shld/p/10444808.html
* Explanation: https://www.cnblogs.com/zjp-shadow/p/9218214.html
* Detailed walkthrough: https://www.cnblogs.com/1625--H/p/12416198.html
* Proofs: https://oi-wiki.org/string/sam/
* Worked problems: https://www.cnblogs.com/Lyush/archive/2013/08/25/3281546.html and https://www.cnblogs.com/mollnn/p/13175736.html
Raw data
{
"_id": null,
"home_page": "https://github.com/laohur/SuffixAutomaton",
"name": "SuffixAutomaton",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.0",
"maintainer_email": "",
"keywords": "Suffix Automaton,sam",
"author": "laohur",
"author_email": "laohur@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/e8/30/97e3750a69c4f9ca1ac47af91098b35dc61b3cff1569772dab7ec28af3e8/SuffixAutomaton-0.0.9.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "[Anti-996 License](https: // github.com/996icu/996.ICU/blob/master/LICENSE)",
"summary": "suffix automaton by words, to get text common substrings",
"version": "0.0.9",
"split_keywords": [
"suffix automaton",
"sam"
],
"urls": [
{
"comment_text": "",
"digests": {
"md5": "ad2739ac25791989b90326faa52c3afd",
"sha256": "e64b1b30d3d938c37010b0528c21af991f0a798d366b0d6583946ccdded85ab6"
},
"downloads": -1,
"filename": "SuffixAutomaton-0.0.9-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ad2739ac25791989b90326faa52c3afd",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.0",
"size": 5957,
"upload_time": "2022-12-22T14:00:03",
"upload_time_iso_8601": "2022-12-22T14:00:03.418315Z",
"url": "https://files.pythonhosted.org/packages/17/b0/be8571b74f17f277ecac2a48608521eec73d33ebe4db08434567a8d026da/SuffixAutomaton-0.0.9-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"md5": "ffed929450807d5f1e36e5be79e03368",
"sha256": "eda58a9cd9661b47e1398e5a66dd8c158ad2a3aef00ff23682e60e3c2b947150"
},
"downloads": -1,
"filename": "SuffixAutomaton-0.0.9.tar.gz",
"has_sig": false,
"md5_digest": "ffed929450807d5f1e36e5be79e03368",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.0",
"size": 5645,
"upload_time": "2022-12-22T14:00:05",
"upload_time_iso_8601": "2022-12-22T14:00:05.170750Z",
"url": "https://files.pythonhosted.org/packages/e8/30/97e3750a69c4f9ca1ac47af91098b35dc61b3cff1569772dab7ec28af3e8/SuffixAutomaton-0.0.9.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2022-12-22 14:00:05",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "laohur",
"github_project": "SuffixAutomaton",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "suffixautomaton"
}