torchtextclassifiers

Name: torchtextclassifiers
Version: 0.0.1
Summary: An implementation of the https://github.com/facebookresearch/fastText supervised learning algorithm for text classification using PyTorch.
Upload time: 2025-07-30 05:25:38
Author: Tom Seimandi, Julien Pramil, Meilame Tayebjee, Cédric Couralet
Requires Python: >=3.11
Keywords: fastText, text classification, NLP, automatic coding, deep learning
# torchTextClassifiers

A unified, extensible framework for text classification built on [PyTorch](https://pytorch.org/) and [PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/).

## 🚀 Features

- **Unified API**: Consistent interface for different classifier wrappers
- **Extensible**: Easy to add new classifier implementations through a wrapper pattern
- **FastText Support**: Built-in FastText classifier with n-gram tokenization
- **Flexible Preprocessing**: Each classifier can implement its own text preprocessing approach
- **PyTorch Lightning**: Automated training with callbacks, early stopping, and logging


## 📦 Installation

```bash
# Clone the repository
git clone https://github.com/InseeFrLab/torchTextClassifiers.git
cd torchTextClassifiers

# Install with uv (recommended)
uv sync

# Or install with pip
pip install -e .
```
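
To verify the installation, here is a minimal smoke test; it assumes the editable install above succeeded and that `create_fasttext` is exported at the package root, as used in the Quick Start below:

```python
# Quick smoke test: confirm the package and its FastText factory import cleanly.
from torchTextClassifiers import create_fasttext

print("torchTextClassifiers import OK:", create_fasttext)
```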

## 🎯 Quick Start

### Basic FastText Classification

```python
import numpy as np
from torchTextClassifiers import create_fasttext

# Create a FastText classifier
classifier = create_fasttext(
    embedding_dim=100,
    sparse=False,
    num_tokens=10000,
    min_count=2,
    min_n=3,
    max_n=6,
    len_word_ngrams=2,
    num_classes=2
)

# Prepare your data
X_train = np.array([
    "This is a positive example",
    "This is a negative example",
    "Another positive case",
    "Another negative case"
])
y_train = np.array([1, 0, 1, 0])

X_val = np.array([
    "Validation positive",
    "Validation negative"
])
y_val = np.array([1, 0])

# Build the model
classifier.build(X_train, y_train)

# Train the model
classifier.train(
    X_train, y_train, X_val, y_val,
    num_epochs=50,
    batch_size=32,
    patience_train=5,
    verbose=True
)

# Make predictions
X_test = np.array(["This is a test sentence"])
predictions = classifier.predict(X_test)
print(f"Predictions: {predictions}")

# Validate on test set
accuracy = classifier.validate(X_test, np.array([1]))
print(f"Accuracy: {accuracy:.3f}")
```
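
The same workflow extends to multi-class problems. The sketch below is a hypothetical three-class setup that only changes `num_classes` and the label encoding; it assumes, consistent with the binary example above, that labels are integers from `0` to `num_classes - 1`:

```python
import numpy as np
from torchTextClassifiers import create_fasttext

# Hypothetical three-class sentiment task: negative=0, neutral=1, positive=2.
multi_clf = create_fasttext(
    embedding_dim=100,
    sparse=False,
    num_tokens=10000,
    min_count=1,
    min_n=3,
    max_n=6,
    len_word_ngrams=2,
    num_classes=3,  # the only change from the binary example
)

X_train = np.array(["Awful experience", "It was fine", "Absolutely loved it"])
y_train = np.array([0, 1, 2])

multi_clf.build(X_train, y_train)
# Train, predict, and validate exactly as in the binary example above.
```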

### Custom Classifier Implementation

```python
import numpy as np
from torchTextClassifiers import torchTextClassifiers
from torchTextClassifiers.classifiers.simple_text_classifier import SimpleTextWrapper, SimpleTextConfig

# Example: TF-IDF based classifier (alternative to tokenization)
config = SimpleTextConfig(
    hidden_dim=128,
    num_classes=2,
    max_features=5000,
    learning_rate=1e-3,
    dropout_rate=0.2
)

# Create classifier with TF-IDF preprocessing
wrapper = SimpleTextWrapper(config)
classifier = torchTextClassifiers(wrapper)

# Text data
X_train = np.array(["Great product!", "Terrible service", "Love it!"])
y_train = np.array([1, 0, 1])

# Build and train
classifier.build(X_train, y_train)
# ... continue with training
```
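
Assuming the custom wrapper follows the same unified training interface shown in the FastText example (which is the point of the shared `torchTextClassifiers` API), the elided steps would look roughly like this; the argument values are illustrative:

```python
# Continue with the unified train/predict API demonstrated above.
X_val = np.array(["Decent overall", "Would not recommend"])
y_val = np.array([1, 0])

classifier.train(
    X_train, y_train, X_val, y_val,
    num_epochs=20,
    batch_size=2,
    patience_train=3,
    verbose=True,
)

predictions = classifier.predict(np.array(["Fantastic!"]))
print(f"Predictions: {predictions}")
```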


### Training Customization

```python
# Custom PyTorch Lightning trainer parameters
trainer_params = {
    'accelerator': 'gpu',
    'devices': 1,
    'precision': 16,  # Mixed precision training
    'gradient_clip_val': 1.0,
}

classifier.train(
    X_train, y_train, X_val, y_val,
    num_epochs=100,
    batch_size=64,
    patience_train=10,
    trainer_params=trainer_params,
    verbose=True
)
```
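
The keys in `trainer_params` mirror standard `pytorch_lightning.Trainer` arguments, which the library presumably forwards to the trainer it builds internally. As an illustration, a CPU-only run without mixed precision might look like:

```python
# CPU-only variant; keys are standard pytorch_lightning.Trainer arguments
# (assumed to be forwarded unchanged by classifier.train).
cpu_trainer_params = {
    'accelerator': 'cpu',
    'devices': 1,
    'gradient_clip_val': 1.0,
}

classifier.train(
    X_train, y_train, X_val, y_val,
    num_epochs=100,
    batch_size=64,
    patience_train=10,
    trainer_params=cpu_trainer_params,
    verbose=True,
)
```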

## 🔬 Testing

Run the test suite:

```bash
# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=torchTextClassifiers

# Run specific test file
uv run pytest tests/test_torchTextClassifiers.py -v
```


## 📚 Examples

See the [examples/](examples/) directory for:
- Basic text classification
- Multi-class classification
- Mixed features (text + categorical)
- Custom classifier implementation
- Advanced training configurations



## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

            
