triematch


Name: triematch
Version: 0.0.1rc0
Summary: Fast lookup for string patterns using Trie, Radix & Aho-Corasick algorithms
Upload time: 2025-01-08 14:34:31
Author: Jafar Khakpour
Requires Python: >=3.9
License: MIT License
Keywords: triematch, trie, prefix tree, tree, radix, aho-corasick, prefix-tree
Requirements: No requirements were recorded.
# TrieMatch
A multiple-pattern matching library. It aims to implement several algorithms for finding many string patterns in a text.

This project is a work in progress, and I would welcome your ideas for improving it.

## Trie (Prefix Tree)
This class implements a prefix tree for matching the patterns.

It uses a normal Trie data structure, and Aho-Corasick is used for pattern search (`Trie.search`). Call `Trie.link_nodes()` before searching to speed up the search.

```python
import this
from triematch import Trie

zen_of_klingon = this.s
print(zen_of_klingon)

zen_of_klingon = zen_of_klingon.lower()

words = {
    "gur": "the",
    "mra": "zen",
    "gur mra": "the zen",
    "guna": "than",
    "chevgl": "purity",
    "Pbzcyrk": "complex",
}


wordset = Trie(words) ## or Trie(**words), like dict initialization

## Similar behavior with dict object
"error" in wordset # Output: False
wordset.get('error') # Output: None
wordset.setdefault("error", "reebef") # Output: reebef

wordset
# Output: {'error': 'reebef', 'complex': 'Pbzcyrk', 'purity': 'chevgl', 'mra': 'zen', 'gur': 'the', 'gur mra': 'the zen'}

## Get all patterns for which zen_of_klingon.startswith(pattern) is true
list(wordset.match(zen_of_klingon))
# Output: [(3, 'the'), (7, 'the zen')]
## where wordset[zen_of_klingon[:3]] == 'the'

wordset.link_nodes() ## do this to speed up the search process
list(wordset.search(zen_of_klingon))
# Output: [(0, 3, 'the'), (0, 7, 'the zen'), (4, 7, 'zen'), (54, 58, 'than'), ...]
## where wordset[zen_of_klingon[4:7]] == 'zen'

## Compressed regex of Trie
wordset.to_regex()
# Output: 'Pbzcyrk|chevgl|gu(?:na|r)|mra'
```
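
For readers unfamiliar with Aho-Corasick, the following is a minimal, standalone sketch of the general technique: a trie plus failure links, scanned once over the text. It is not the triematch implementation. The names `build_automaton` and `search_all` are hypothetical, and the `(start, end, pattern)` tuples only mimic the shape of the `Trie.search` output shown above.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links, and output lists (illustrative only)."""
    goto = [{}]  # goto[node][char] -> child node
    fail = [0]   # fail[node] -> node for the longest proper suffix that is also a trie path
    out = [[]]   # out[node] -> patterns that end at this node
    for pattern in patterns:
        node = 0
        for ch in pattern:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(pattern)

    # Breadth-first pass: a node's failure link is derived from its parent's link.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            queue.append(child)
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure target (shorter suffix matches).
            out[child] = out[child] + out[fail[child]]
    return goto, fail, out


def search_all(text, automaton):
    """Yield (start, end, pattern) for every occurrence, in a single pass over text."""
    goto, fail, out = automaton
    node = 0
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for pattern in out[node]:
            yield i + 1 - len(pattern), i + 1, pattern


automaton = build_automaton(["the", "zen", "the zen", "than"])
matches = list(search_all("the zen of python", automaton))
print(matches)  # [(0, 3, 'the'), (0, 7, 'the zen'), (4, 7, 'zen')]
```

The failure links are what let the scan continue after a mismatch without re-reading earlier characters, which is presumably the role `link_nodes()` plays in the library.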

## Tuples as Trie keys
`TupleTrie` treats keys as tuples (instead of strings), so you can use keys such as tuples of numbers.

```python
from triematch import TupleTrie

trie = TupleTrie()
trie[127,0,0,1] = "home"
trie[(8,8,8,8)] = "Google Public DNS"
trie["hello", "python"] = object()

list(trie.match((127,0,0,1,2,3)))
## Output: [(4, 'home')]
```
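
To illustrate how prefix matching over tuple keys can work, here is a small sketch built on nested dictionaries. The class `SimpleTupleTrie` is a hypothetical stand-in written for this example and makes no claim about how `TupleTrie` is implemented. Its `match` method yields `(length, value)` pairs in the same shape as the output above.

```python
_END = object()  # sentinel key marking "a value is stored at this node"


class SimpleTupleTrie:
    """Illustrative trie keyed by tuples of hashable elements (not the triematch class)."""

    def __init__(self):
        self._root = {}

    def __setitem__(self, key, value):
        node = self._root
        for element in key:
            node = node.setdefault(element, {})
        node[_END] = value

    def match(self, sequence):
        """Yield (length, value) for every stored key that is a prefix of sequence."""
        node = self._root
        for length, element in enumerate(sequence, start=1):
            node = node.get(element)
            if node is None:
                return
            if _END in node:
                yield length, node[_END]


trie = SimpleTupleTrie()
trie[127, 0, 0, 1] = "home"
trie[127, 0] = "loopback prefix"
trie[(8, 8, 8, 8)] = "Google Public DNS"

print(list(trie.match((127, 0, 0, 1, 2, 3))))
# [(2, 'loopback prefix'), (4, 'home')]
```

Because each level is keyed by one tuple element, any hashable value can serve as a "character", which is what makes mixed keys such as `("hello", "python")` possible.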

## Radix
A compressed prefix tree. It is a more memory-efficient data structure than `Trie` and offers the same features (but the current version is slower than `Trie`).
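
The memory saving comes from merging chains of single-child nodes into one edge labelled with several characters. The sketch below shows that idea on a plain nested-dict trie. The helpers `build_trie`, `compress`, and `count_nodes` are hypothetical names for this illustration and are not part of the triematch API.

```python
_END = ""  # key marking the end of a stored pattern (never collides with a character)


def build_trie(words):
    """Build an uncompressed trie with one node per character."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[_END] = True
    return root


def compress(node):
    """Return a radix-style copy in which single-child chains become one edge."""
    compressed = {}
    for label, child in node.items():
        if label == _END:
            compressed[_END] = True
            continue
        # Extend the edge while the child has exactly one edge and ends no pattern.
        while len(child) == 1 and _END not in child:
            (next_label, next_child), = child.items()
            label += next_label
            child = next_child
        compressed[label] = compress(child)
    return compressed


def count_nodes(node):
    """Count nodes, ignoring end-of-pattern markers."""
    return 1 + sum(count_nodes(child) for label, child in node.items() if label != _END)


plain = build_trie(["gur", "guna", "mra"])
radix = compress(plain)

print(radix)
# {'gu': {'r': {'': True}, 'na': {'': True}}, 'mra': {'': True}}
print(count_nodes(plain), count_nodes(radix))  # 9 5
```

Fewer nodes means less memory, but each lookup step has to compare a whole edge label instead of a single character, which is one plausible reason a radix tree can be slower than a plain trie.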

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "triematch",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "triematch, trie, prefix tree, tree, radix, aho-corasick, prefix-tree",
    "author": "Jafar Khakpour",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/86/fd/c630886634ab421fa0832b5c675d0d69de78904714183ae01ce706a51d7b/triematch-0.0.1rc0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License",
    "summary": "Fast lookup for string patterns using Trie, Radix & Aho-Corasick algorithms",
    "version": "0.0.1rc0",
    "project_urls": null,
    "split_keywords": [
        "triematch",
        " trie",
        " prefix tree",
        " tree",
        " radix",
        " aho-corasick",
        " prefix-tree"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "74e9ee8ced7e0851cb6e60b479c92a039ea6cb431abc75b50d2d088b7ab56fe4",
                "md5": "1201a2ff5e2d04b6687a2a6ca838a5e3",
                "sha256": "704e06e712f47e573fa2efc4a956efd2de33b860bb00113548236316b724044b"
            },
            "downloads": -1,
            "filename": "triematch-0.0.1rc0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1201a2ff5e2d04b6687a2a6ca838a5e3",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 11462,
            "upload_time": "2025-01-08T14:34:27",
            "upload_time_iso_8601": "2025-01-08T14:34:27.104611Z",
            "url": "https://files.pythonhosted.org/packages/74/e9/ee8ced7e0851cb6e60b479c92a039ea6cb431abc75b50d2d088b7ab56fe4/triematch-0.0.1rc0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "86fdc630886634ab421fa0832b5c675d0d69de78904714183ae01ce706a51d7b",
                "md5": "f4a2ec7bd68131b01008380f4d221e23",
                "sha256": "b2ee3affafd84aa839c24620da51a850acf97a5a4f0af5a861b24975f3d3c68c"
            },
            "downloads": -1,
            "filename": "triematch-0.0.1rc0.tar.gz",
            "has_sig": false,
            "md5_digest": "f4a2ec7bd68131b01008380f4d221e23",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 12137,
            "upload_time": "2025-01-08T14:34:31",
            "upload_time_iso_8601": "2025-01-08T14:34:31.170220Z",
            "url": "https://files.pythonhosted.org/packages/86/fd/c630886634ab421fa0832b5c675d0d69de78904714183ae01ce706a51d7b/triematch-0.0.1rc0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-08 14:34:31",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "triematch"
}
        