# TrieMatch
A multiple-pattern matching library. It implements several algorithms for finding many string patterns in a text.
This project is a work in progress, and ideas for improving it are welcome.
## Trie (Prefix Tree)
This class implements a prefix tree for matching the patterns.
It uses a plain Trie data structure, and Aho-Corasick is used for pattern search (`Trie.search`). Just make sure to call `Trie.link_nodes()` before searching.
```python
import this
from triematch import Trie
zen_of_klingon = this.s
print(zen_of_klingon)
zen_of_klingon = zen_of_klingon.lower()
words = {
    "gur": "the",
    "mra": "zen",
    "gur mra": "the zen",
    "guna": "than",
    "chevgl": "purity",
    "Pbzcyrk": "complex",
}
wordset = Trie(words) ## or Trie(**words), like a dict initialization
## Behaves much like a dict object
"error" in wordset # Output: False
wordset.get('error') # Output: None
wordset.setdefault("error", "reebef") # Output: reebef
wordset
# Output: {'error': 'reebef', 'complex': 'Pbzcyrk', 'purity': 'chevgl', 'mra': 'zen', 'gur': 'the', 'gur mra': 'the zen'}
## Get all patterns that are prefixes of the text, i.e. zen_of_klingon.startswith(pattern)
list(wordset.match(zen_of_klingon))
# Output: [(3, 'the'), (7, 'the zen')]
## where wordset[zen_of_klingon[:3]] == 'the'
wordset.link_nodes() ## do this to speed up the search process
list(wordset.search(zen_of_klingon))
# Output: [(0, 3, 'the'), (0, 7, 'the zen'), (4, 7, 'zen'), (54, 58, 'than'), ...]
## where wordset[zen_of_klingon[4:7]] == 'zen'
## Compressed regex of Trie
wordset.to_regex()
# Output: 'Pbzcyrk|chevgl|gu(?:na|r)|mra'
```
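To make the role of the linking step more concrete, here is a minimal, standalone sketch of Aho-Corasick in plain Python. It does not use `triematch`, and it only assumes that `Trie.link_nodes()` prepares something similar to the failure links built below; the names `build_automaton` and `search` are hypothetical helpers for illustration. Unlike `Trie.search`, this sketch yields the matched pattern itself rather than a stored value.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie tables plus failure links for an iterable of patterns."""
    goto = [{}]  # goto[node][char] -> child node id
    fail = [0]   # fail[node] -> node for the longest proper suffix in the trie
    out = [[]]   # out[node] -> patterns that end at this node
    for pattern in patterns:
        node = 0
        for ch in pattern:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(pattern)

    # Breadth-first pass: a node's failure link is derived from its parent's.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            queue.append(child)
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit matches that end at the suffix node.
            out[child] = out[child] + out[fail[child]]
    return goto, fail, out


def search(text, patterns):
    """Yield (start, end, pattern) for every occurrence in one pass over text."""
    goto, fail, out = build_automaton(patterns)
    node = 0
    for end, ch in enumerate(text, start=1):
        while node and ch not in goto[node]:
            node = fail[node]  # fall back instead of restarting from scratch
        node = goto[node].get(ch, 0)
        for pattern in out[node]:
            yield end - len(pattern), end, pattern


text = "the zen of python, than"
hits = list(search(text, ["the", "zen", "the zen", "than"]))
print(hits)
# [(0, 3, 'the'), (0, 7, 'the zen'), (4, 7, 'zen'), (19, 23, 'than')]
```

The failure links are what let the scan report overlapping hits such as `'the zen'` and `'zen'` without re-reading any character of the text.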
## Tuples as Trie keys
`TupleTrie` treats keys as tuples (instead of strings), so you can use any tuple, such as a tuple of numbers, as a key.
```python
from triematch import TupleTrie
trie = TupleTrie()
trie[127,0,0,1] = "home"
trie[(8,8,8,8)] = "Google Public DNS"
trie["hello", "python"] = object()
list(trie.match((127,0,0,1,2,3)))
## Output: [(4, 'home')]
```
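As a rough illustration of why tuple keys work, the sketch below builds a prefix tree out of nested dicts keyed by the tuple elements. `TinyTupleTrie` is a hypothetical stand-in written for this example, not the library's `TupleTrie`, and the `(127, 0)` entry is made-up sample data.

```python
_END = object()  # sentinel key marking "a stored key ends at this node"


class TinyTupleTrie:
    """Hypothetical stand-in for illustration; not triematch's TupleTrie."""

    def __init__(self):
        self._root = {}

    def __setitem__(self, key, value):
        # trie[127, 0, 0, 1] = ... passes the tuple (127, 0, 0, 1) as key.
        node = self._root
        for part in key:
            node = node.setdefault(part, {})
        node[_END] = value

    def match(self, sequence):
        """Yield (length, value) for every stored key that is a prefix of sequence."""
        node = self._root
        for length, part in enumerate(sequence, start=1):
            node = node.get(part)
            if node is None:
                return
            if _END in node:
                yield length, node[_END]


trie = TinyTupleTrie()
trie[127, 0, 0, 1] = "home"
trie[127, 0] = "loopback prefix"  # made-up entry to show a shorter prefix
trie[8, 8, 8, 8] = "Google Public DNS"
hits = list(trie.match((127, 0, 0, 1, 2, 3)))
print(hits)  # [(2, 'loopback prefix'), (4, 'home')]
```

Any hashable element can serve as a path step here, which is why strings and numbers can be mixed within one key.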
## Radix
A compressed prefix tree. It is more memory-efficient than `Trie` and offers the same features (but the current version is slower than `Trie`).
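The memory saving comes from merging chains of single-child nodes into one edge. The sketch below shows that idea on nested dicts and compares node counts; it is an assumption-level illustration of radix compression in general, not the library's `Radix` implementation, and `build_trie`, `compress`, and `count_nodes` are hypothetical helpers.

```python
END = ""  # sentinel key: the empty string is never a character or edge label


def build_trie(words):
    """Build a one-character-per-edge trie as nested dicts."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def compress(node):
    """Return a radix-style copy where single-child chains become one edge."""
    packed = {}
    for label, child in node.items():
        if label == END:
            packed[END] = True
            continue
        # Extend the edge while the child has exactly one outgoing edge
        # and no stored key ends there.
        while len(child) == 1 and END not in child:
            ((next_label, next_child),) = child.items()
            label += next_label
            child = next_child
        packed[label] = compress(child)
    return packed


def count_nodes(node):
    """Count this node plus all descendants, ignoring end markers."""
    return 1 + sum(count_nodes(c) for k, c in node.items() if k != END)


plain = build_trie(["gur", "guna", "mra", "chevgl"])
radix = compress(plain)
print(count_nodes(plain), count_nodes(radix))  # 15 6
print(sorted(radix))  # ['chevgl', 'gu', 'mra']
```

Fewer nodes means less memory, while lookups now have to compare multi-character edge labels instead of single characters.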
Raw data
{
"_id": null,
"home_page": null,
"name": "triematch",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "triematch, trie, prefix tree, tree, radix, aho-corasick, prefix-tree",
"author": "Jafar Khakpour",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/86/fd/c630886634ab421fa0832b5c675d0d69de78904714183ae01ce706a51d7b/triematch-0.0.1rc0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT License",
"summary": "Fast lookup for string patterns using Trie, Radix & Aho-Corasick algorithms",
"version": "0.0.1rc0",
"project_urls": null,
"split_keywords": [
"triematch",
" trie",
" prefix tree",
" tree",
" radix",
" aho-corasick",
" prefix-tree"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "74e9ee8ced7e0851cb6e60b479c92a039ea6cb431abc75b50d2d088b7ab56fe4",
"md5": "1201a2ff5e2d04b6687a2a6ca838a5e3",
"sha256": "704e06e712f47e573fa2efc4a956efd2de33b860bb00113548236316b724044b"
},
"downloads": -1,
"filename": "triematch-0.0.1rc0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "1201a2ff5e2d04b6687a2a6ca838a5e3",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 11462,
"upload_time": "2025-01-08T14:34:27",
"upload_time_iso_8601": "2025-01-08T14:34:27.104611Z",
"url": "https://files.pythonhosted.org/packages/74/e9/ee8ced7e0851cb6e60b479c92a039ea6cb431abc75b50d2d088b7ab56fe4/triematch-0.0.1rc0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "86fdc630886634ab421fa0832b5c675d0d69de78904714183ae01ce706a51d7b",
"md5": "f4a2ec7bd68131b01008380f4d221e23",
"sha256": "b2ee3affafd84aa839c24620da51a850acf97a5a4f0af5a861b24975f3d3c68c"
},
"downloads": -1,
"filename": "triematch-0.0.1rc0.tar.gz",
"has_sig": false,
"md5_digest": "f4a2ec7bd68131b01008380f4d221e23",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 12137,
"upload_time": "2025-01-08T14:34:31",
"upload_time_iso_8601": "2025-01-08T14:34:31.170220Z",
"url": "https://files.pythonhosted.org/packages/86/fd/c630886634ab421fa0832b5c675d0d69de78904714183ae01ce706a51d7b/triematch-0.0.1rc0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-01-08 14:34:31",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "triematch"
}