# ngram-trie
`ngram-trie` is a Rust library, with Python bindings, for storing and querying n-gram models in a trie. It supports fitting, saving, loading, and querying models, and offers several smoothing techniques.
## Installation Rust
1. Add it to your `Cargo.toml`:
```toml
[dependencies]
ngram-trie = { git = "https://github.com/behappiness/ngram-trie" }
```
## Installation Python
1. Install it with pip:
```bash
pip install ngram-trie
```
## Example Usage
```python
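from ngram_trie import PySmoothedTrie

# tokenized_data and tokenized_context are token sequences you supply.
trie = PySmoothedTrie(n_gram_max_length=7, root_capacity=None)
# Fit the trie on the training data, storing n-grams of up to 7 tokens.
trie.fit(tokenized_data, n_gram_max_length=7, root_capacity=None, max_tokens=None)
# Select the rule set used when querying.
trie.set_rule_set(["++++++", "+++++", "++++", "+++", "++", "+"])
# Fit the smoothing, then query the prediction probabilities for a context.
trie.fit_smoothing()
trie.get_prediction_probabilities(tokenized_context)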
```
#### Specify the smoothing
```python
# Pass one of the supported smoothing names: "modified_kneser_ney" or "stupid_backoff".
trie.fit_smoothing("modified_kneser_ney")
```
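For readers unfamiliar with the `"stupid_backoff"` option, the general technique can be sketched in plain Python. This is an illustration of the textbook scheme only, not this library's implementation: the helper names `count_ngrams` and `stupid_backoff`, the dictionary-based counts, and the back-off factor `alpha=0.4` (a commonly cited default) are all assumptions made for the example.

```python
from collections import Counter


def count_ngrams(tokens, max_n):
    """Count every n-gram of length 1..max_n, keyed by a tuple of tokens."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def stupid_backoff(counts, context, word, total_tokens, alpha=0.4):
    """Score `word` after `context`, backing off to shorter contexts.

    The result is a relative score, not a normalized probability.
    """
    context = tuple(context)
    penalty = 1.0
    while context:
        ngram = context + (word,)
        if counts[ngram] > 0:
            # Seen n-gram: relative frequency, discounted once per back-off.
            return penalty * counts[ngram] / counts[context]
        # Unseen: drop the oldest context token and apply the penalty.
        context = context[1:]
        penalty *= alpha
    # No context left: fall back to the unigram relative frequency.
    return penalty * counts[(word,)] / total_tokens


# Hypothetical toy corpus of token IDs.
tokens = [1, 2, 3, 1, 2, 4]
counts = count_ngrams(tokens, max_n=3)
score = stupid_backoff(counts, [1, 2], 3, total_tokens=len(tokens))
```

Each back-off step multiplies the score by `alpha`, so a match found on a shorter context is always worth less than the same relative frequency on the full context.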
#### Unsmoothed
```python
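from ngram_trie import PySmoothedTrie

trie = PySmoothedTrie(n_gram_max_length=7, root_capacity=None)
trie.fit(tokenized_data, n_gram_max_length=7, root_capacity=None, max_tokens=None)
# rules is a list of rule strings, like the one passed to set_rule_set in the example above.
trie.set_rule_set(rules)
# fit_smoothing() is not called here; the query returns unsmoothed probabilities.
trie.get_unsmoothed_probabilities(tokenized_context)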
```
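To make the idea of unsmoothed probabilities concrete, here is a minimal pure-Python sketch of an n-gram count trie and the relative-frequency estimate read from it. It is not the library's code or return format: `TrieNode`, `insert_ngrams`, and `unsmoothed_probabilities` are hypothetical names, and the sketch ignores rule sets entirely.

```python
class TrieNode:
    """One token in the trie; `count` is how often the path to it was seen."""

    def __init__(self):
        self.count = 0
        self.children = {}


def insert_ngrams(root, tokens, max_n):
    """Insert every n-gram of up to max_n tokens, starting at each position."""
    for start in range(len(tokens)):
        node = root
        for token in tokens[start:start + max_n]:
            node = node.children.setdefault(token, TrieNode())
            node.count += 1


def unsmoothed_probabilities(root, context):
    """Return {next_token: relative frequency} for the given context.

    An unseen context yields an empty dict, because no mass is
    redistributed without smoothing.
    """
    node = root
    for token in context:
        node = node.children.get(token)
        if node is None:
            return {}
    total = sum(child.count for child in node.children.values())
    return {tok: child.count / total for tok, child in node.children.items()}


# Hypothetical toy corpus of token IDs.
root = TrieNode()
insert_ngrams(root, [1, 2, 3, 1, 2, 4], max_n=3)
probs = unsmoothed_probabilities(root, [1, 2])
```

Because every continuation of a context hangs off a single node, one walk down the trie is enough to collect all the counts needed for the estimate.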
## Dev
```bash
cargo add pyo3 --features extension-module
```
#### Build wheel
```bash
maturin build
```
Raw data
{
"_id": null,
"home_page": null,
"name": "ngram-trie",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": "Botond Lov\u00e1sz <botilovasz@gmail.com>",
"keywords": null,
"author": "Botond Lov\u00e1sz",
"author_email": "botilovasz@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/64/77/d87440b8aa94fe96c4fd638a12bebbd999502d797795b7bf5cb45b237cfb/ngram_trie-1.2.6.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A Rust-based n-gram trie library for Python",
"version": "1.2.6",
"project_urls": {
"Repository": "https://github.com/behappiness/ngram-trie"
},
"split_keywords": [],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "644c3638ff82357262644637892df5a6ed3beab85aa1a0ece643a17cefa4a8a7",
"md5": "ca677b37bd360db6d3464c627d431e2a",
"sha256": "530c28dd14793cd660b79c34558a80401aed0b908e0af29d123efbedbf919de4"
},
"downloads": -1,
"filename": "ngram_trie-1.2.6-cp312-cp312-manylinux_2_34_x86_64.whl",
"has_sig": false,
"md5_digest": "ca677b37bd360db6d3464c627d431e2a",
"packagetype": "bdist_wheel",
"python_version": "cp312",
"requires_python": ">=3.6",
"size": 657555,
"upload_time": "2024-12-08T04:35:54",
"upload_time_iso_8601": "2024-12-08T04:35:54.750529Z",
"url": "https://files.pythonhosted.org/packages/64/4c/3638ff82357262644637892df5a6ed3beab85aa1a0ece643a17cefa4a8a7/ngram_trie-1.2.6-cp312-cp312-manylinux_2_34_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "6477d87440b8aa94fe96c4fd638a12bebbd999502d797795b7bf5cb45b237cfb",
"md5": "5318592bdeaad4a937d9eb356f26138c",
"sha256": "8599f3c87f77748097d1eebf56f368777e423705ef8552d5526816e4587b35d1"
},
"downloads": -1,
"filename": "ngram_trie-1.2.6.tar.gz",
"has_sig": false,
"md5_digest": "5318592bdeaad4a937d9eb356f26138c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 31678,
"upload_time": "2024-12-08T04:35:56",
"upload_time_iso_8601": "2024-12-08T04:35:56.738254Z",
"url": "https://files.pythonhosted.org/packages/64/77/d87440b8aa94fe96c4fd638a12bebbd999502d797795b7bf5cb45b237cfb/ngram_trie-1.2.6.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-08 04:35:56",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "behappiness",
"github_project": "ngram-trie",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "maturin",
"specs": []
},
{
"name": "pyo3",
"specs": []
},
{
"name": "pyo3-log",
"specs": []
}
],
"lcname": "ngram-trie"
}