# Precious Package
## Overview
The Precious package provides a minimal model showcasing three tokenizer-free approaches for natural language processing tasks. It includes implementations for T-FREE, CANINE, and byte-level embeddings, along with attention mechanisms for enhanced performance.
## Installation
### From PyPI (Recommended)
```bash
pip install precious-nlp
```
### From Source (Development)
```bash
git clone https://github.com/bimri/precious.git
cd precious
pip install -e .
```
### With Optional Dependencies
```bash
# For development tools
pip install precious-nlp[dev]
# For benchmarking
pip install precious-nlp[benchmarks]
# For documentation
pip install precious-nlp[docs]
# All optional dependencies
pip install precious-nlp[all]
```
## Quick Start
### Installation and Import
```bash
# Install the package
pip install precious-nlp
```
```python
# Import the package (note: install as 'precious-nlp', import as 'precious')
import precious
from precious import PreciousModel, PreciousConfig
```
## Usage
Here is a basic example of how to use `PreciousModel`:
```python
import precious
from precious import PreciousModel, PreciousConfig
# Initialize the model with the desired configuration
config = PreciousConfig(mode="byte", d_model=256) # or "tfree", "canine"
model = PreciousModel(config)
# Prepare your input data
inputs = ["Hello, tokenizer-free world!"]
outputs = model(inputs)
# Access the logits
logits = outputs["logits"]
print(f"Output shape: {logits.shape}") # [batch_size, seq_len, vocab_size]
# Training with targets
targets = ["Hello, tokenizer-free universe!"]
outputs = model(inputs, targets=targets)
loss = outputs["loss"]
print(f"Training loss: {loss.item()}")
```
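Because the package builds on PyTorch (see Requirements), the model can presumably be trained like any `torch.nn.Module`. The following is a minimal training-step sketch under that assumption; the optimizer choice and learning rate are illustrative and not part of the package's API.

```python
import torch
from precious import PreciousModel, PreciousConfig

# Minimal training-loop sketch (assumes PreciousModel is a torch.nn.Module).
config = PreciousConfig(mode="byte", d_model=256)
model = PreciousModel(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # illustrative settings

inputs = ["Hello, tokenizer-free world!"]
targets = ["Hello, tokenizer-free universe!"]

model.train()
for step in range(10):
    optimizer.zero_grad()
    outputs = model(inputs, targets=targets)  # returns a dict with "loss", as shown above
    loss = outputs["loss"]
    loss.backward()
    optimizer.step()
    print(f"step {step}: loss = {loss.item():.4f}")
```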
## Three Tokenizer-Free Approaches
### 1. Byte-Level Processing
```python
import precious
config = precious.PreciousConfig(mode="byte", d_model=256)
model = precious.PreciousModel(config)
# Processes text at byte level - universal and memory efficient
```
### 2. CANINE Approach
```python
import precious
config = precious.PreciousConfig(mode="canine", d_model=256)
model = precious.PreciousModel(config)
# Character-level processing with Unicode support
```
### 3. T-FREE Method
```python
import precious
config = precious.PreciousConfig(mode="tfree", d_model=256, tfree_vocab_v=8192)
model = precious.PreciousModel(config)
# Vocabulary-aware with character-level fallback
```
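All three modes share the same interface, so they can be compared side by side. The sketch below runs one input through each configuration and prints the resulting logits shape, using only the API shown above; the exact sequence length depends on each mode's internal segmentation, and `tfree_vocab_v` is assumed to apply only to the `"tfree"` mode.

```python
from precious import PreciousModel, PreciousConfig

text = ["Hello, tokenizer-free world!"]

# Mode-specific extras; tfree_vocab_v is only meaningful for "tfree".
mode_kwargs = {
    "byte": {},
    "canine": {},
    "tfree": {"tfree_vocab_v": 8192},
}

for mode, extra in mode_kwargs.items():
    config = PreciousConfig(mode=mode, d_model=256, **extra)
    model = PreciousModel(config)
    logits = model(text)["logits"]
    print(f"{mode:>6}: logits shape = {tuple(logits.shape)}")
```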
## Key Features
- 🚀 **Three tokenizer-free approaches** in one unified library
- 🎯 **Production-ready** with comprehensive testing and documentation
- 🌍 **Universal text support** - handles any Unicode text
- ⚡ **Efficient processing** with configurable model architectures
- 🧪 **Research-friendly** with benchmarking and comparison tools
- 📚 **Well-documented** with extensive examples and API reference
## Quick Performance Comparison
| Mode | Memory | Speed | Best For |
|------|--------|-------|----------|
| Byte | Lowest | Fastest | General purpose, production |
| CANINE | Medium | Medium | Multilingual, character-aware |
| T-FREE | Highest | Slowest | Research, vocabulary analysis, interpretability |
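For a rough local sanity check of the relative comparison above, a simple timing loop like the sketch below can be used. This is not the package's own benchmarking tooling (available via the `benchmarks` extra); it assumes `PreciousModel` behaves as a `torch.nn.Module` and that mode-specific settings such as `tfree_vocab_v` have usable defaults.

```python
import time
import torch
from precious import PreciousModel, PreciousConfig

texts = ["Tokenizer-free models operate directly on raw text."] * 8

for mode in ("byte", "canine", "tfree"):
    config = PreciousConfig(mode=mode, d_model=256)  # assumes defaults for mode-specific options
    model = PreciousModel(config)
    model.eval()

    # Average wall-clock time over a few CPU forward passes.
    start = time.perf_counter()
    with torch.no_grad():
        for _ in range(5):
            _ = model(texts)
    elapsed = (time.perf_counter() - start) / 5
    print(f"{mode:>6}: {elapsed * 1000:.1f} ms per batch of {len(texts)}")
```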
## Documentation
For complete documentation, visit the [docs directory](docs/) or browse individual guides:
- 📖 [API Reference](docs/API_REFERENCE.md) - Complete API documentation
- 📝 [Examples](docs/EXAMPLES.md) - From basic to advanced usage
## Requirements
- Python >= 3.8
- PyTorch >= 1.9.0
- NumPy >= 1.19.0
## Contributing
Contributions are welcome! Please follow these steps to contribute:
1. Fork the repository.
2. Create a new branch for your feature or bug fix.
3. Make your changes and commit them.
4. Push your branch and create a pull request.
## License
This project is licensed under the MIT License. See the LICENSE file for more details.