precious-nlp

Name: precious-nlp
Version: 0.1.2 (PyPI)
Home page: https://github.com/bimri/precious
Summary: A tokenizer-free NLP library with T-FREE, CANINE, and byte-level approaches
Upload time: 2025-09-02 11:28:22
Author: bimri
Maintainer: none
Docs URL: none
Requires Python: >=3.8
License: MIT
Keywords: tokenization, nlp, transformers, tokenizer-free, canine, tfree, byte-level, natural-language-processing, deep-learning, pytorch
Requirements: torch, numpy, setuptools, wheel, build
Travis CI: none
Coveralls test coverage: none
# Precious Package

## Overview
The Precious package provides a minimal model showcasing three tokenizer-free approaches for natural language processing tasks. It includes implementations for T-FREE, CANINE, and byte-level embeddings, along with attention mechanisms for enhanced performance.

## Installation

### From PyPI (Recommended)
```bash
pip install precious-nlp
```

### From Source (Development)
```bash
git clone https://github.com/bimri/precious.git
cd precious
pip install -e .
```

### With Optional Dependencies
```bash
# For development tools
pip install precious-nlp[dev]

# For benchmarking
pip install precious-nlp[benchmarks]

# For documentation
pip install precious-nlp[docs]

# All optional dependencies
pip install precious-nlp[all]
```

## Quick Start

### Installation and Import
```bash
# Install the package
pip install precious-nlp
```

```python
# Import the package (note: install as 'precious-nlp', import as 'precious')
import precious
from precious import PreciousModel, PreciousConfig
```

## Usage
Here is a basic example of how to use the PreciousModel:

```python
import precious
from precious import PreciousModel, PreciousConfig

# Initialize the model with the desired configuration
config = PreciousConfig(mode="byte", d_model=256)  # or "tfree", "canine"
model = PreciousModel(config)

# Prepare your input data
inputs = ["Hello, tokenizer-free world!"]
outputs = model(inputs)

# Access the logits
logits = outputs["logits"]
print(f"Output shape: {logits.shape}")  # [batch_size, seq_len, vocab_size]

# Training with targets
targets = ["Hello, tokenizer-free universe!"]
outputs = model(inputs, targets=targets)
loss = outputs["loss"]
print(f"Training loss: {loss.item()}")
```

## Three Tokenizer-Free Approaches

### 1. Byte-Level Processing
```python
import precious
config = precious.PreciousConfig(mode="byte", d_model=256)
model = precious.PreciousModel(config)
# Processes text at byte level - universal and memory efficient
```
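For intuition, the byte-level input representation can be sketched in plain Python. The helper below is illustrative only, not part of the precious API: byte-level models consume raw UTF-8 bytes, so the input vocabulary is fixed at 256 IDs for any language or script.

```python
def to_byte_ids(text):
    """Encode text as a sequence of UTF-8 byte IDs in [0, 255]."""
    return list(text.encode("utf-8"))

# Accented characters expand to multiple bytes, so byte sequences
# are at least as long as the character sequence.
print(to_byte_ids("Héllo"))
```

This is why byte mode needs only a 256-entry embedding table, at the cost of longer sequences for non-ASCII text.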

### 2. CANINE Approach
```python
import precious
config = precious.PreciousConfig(mode="canine", d_model=256)
model = precious.PreciousModel(config)
# Character-level processing with Unicode support
```
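CANINE-style models operate on Unicode codepoints rather than learned subwords. Since a full codepoint embedding table (roughly 1.1M entries) is impractical, the CANINE approach hashes each codepoint into several smaller bucket tables. The toy helper below illustrates that idea with a made-up modular hash; it is not the precious or CANINE implementation.

```python
def codepoint_hash_ids(text, num_hashes=4, num_buckets=16384):
    """Map each character to one bucket ID per hash function.

    Combining several small hashed tables stands in for one
    embedding per Unicode codepoint.
    """
    ids = []
    for ch in text:
        cp = ord(ch)
        # Toy multiplicative hash; real systems use stronger hash families.
        ids.append([(cp * (2 * h + 1)) % num_buckets for h in range(num_hashes)])
    return ids

print(codepoint_hash_ids("héllo"))
```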

### 3. T-FREE Method
```python
import precious
config = precious.PreciousConfig(mode="tfree", d_model=256, tfree_vocab_v=8192)
model = precious.PreciousModel(config)
# Vocabulary-aware with character-level fallback
```
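T-FREE represents each word as a sparse set of hashed character trigrams over a fixed activation space of size V (the `tfree_vocab_v` parameter above). A minimal sketch of that idea, assuming boundary-padded trigrams and a generic hash, and not the library's actual implementation:

```python
import hashlib

def word_to_trigram_ids(word, vocab_v=8192):
    """Hash a word's boundary-padded character trigrams into [0, vocab_v)."""
    padded = f"<{word}>"  # '<' and '>' mark word boundaries
    trigrams = [padded[i:i + 3] for i in range(len(padded) - 2)]
    # A set of IDs, i.e. a multi-hot pattern over the vocab dimension.
    return sorted({
        int(hashlib.sha256(t.encode("utf-8")).hexdigest(), 16) % vocab_v
        for t in trigrams
    })

print(word_to_trigram_ids("free"))
```

Because similar spellings share trigrams, morphologically related words overlap in activation space, which is what makes this mode interesting for vocabulary analysis.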

## Key Features

- 🚀 **Three tokenizer-free approaches** in one unified library
- 🎯 **Production-ready** with comprehensive testing and documentation
- 🌍 **Universal text support** - handles any Unicode text
- ⚡ **Efficient processing** with configurable model architectures
- 🧪 **Research-friendly** with benchmarking and comparison tools
- 📚 **Well-documented** with extensive examples and API reference

## Quick Performance Comparison

| Mode | Memory | Speed | Best For |
|------|--------|-------|----------|
| Byte | Lowest | Fastest | General purpose, production |
| CANINE | Medium | Medium | Multilingual, character-aware |
| T-FREE | Highest | Slowest | Research, vocabulary analysis, interpretability |
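The ranking above follows largely from each mode's input granularity. A rough, illustrative sketch of how the three representations scale on the same text (actual memory and speed depend on the model configuration):

```python
text = "Tokenizer-free models differ in input granularity."

byte_len = len(text.encode("utf-8"))  # byte mode: longest sequences, 256-entry vocab
char_len = len(text)                  # CANINE: one position per codepoint
word_count = len(text.split())        # T-FREE: one sparse pattern per word

print(byte_len, char_len, word_count)
```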

## Documentation

For complete documentation, visit the [docs directory](docs/) or browse individual guides:

- 📖 [API Reference](docs/API_REFERENCE.md) - Complete API documentation
- 📝 [Examples](docs/EXAMPLES.md) - From basic to advanced usage

## Requirements

- Python >= 3.8
- PyTorch >= 1.9.0
- NumPy >= 1.19.0

## Contributing
Contributions are welcome! Please follow these steps to contribute:

1. Fork the repository.
2. Create a new branch for your feature or bug fix.
3. Make your changes and commit them.
4. Push your branch and create a pull request.

## License
This project is licensed under the MIT License. See the LICENSE file for more details.
