# RETVec: Resilient & Efficient Text Vectorizer
## Overview
RETVec is a next-gen text vectorizer designed to offer built-in adversarial resilience using robust word embeddings. Read the paper here: https://arxiv.org/abs/2302.09207.
RETVec is trained to be resilient against character manipulations including insertion, deletion, typos, homoglyphs, LEET substitution, and more. The RETVec model is trained on top of a novel character embedding which can encode all UTF-8 characters and words. Thus, RETVec works out-of-the-box on over 100 languages without the need for a lookup table or fixed vocabulary size. Furthermore, RETVec is a layer, which means that it can be inserted into any TensorFlow model without the need for a separate pre-processing step.
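To make the "no lookup table or fixed vocabulary" idea concrete, here is a minimal pure-Python sketch of one way to encode arbitrary Unicode text as fixed-width bit vectors. It is an illustration only and not RETVec's actual encoder: the function names, the 24-bit width, and the 16-character word limit are assumptions chosen for this example.

```python
from typing import List

# Assumption for this sketch: 24 bits per character, which is wide enough
# for any Unicode code point (the maximum is 0x10FFFF, below 2**21).
BITS_PER_CHAR = 24


def encode_char(ch: str, bits: int = BITS_PER_CHAR) -> List[int]:
    """Return the code point of `ch` as a fixed-width bit list, least significant bit first."""
    code_point = ord(ch)
    return [(code_point >> i) & 1 for i in range(bits)]


def encode_word(word: str, max_chars: int = 16, bits: int = BITS_PER_CHAR) -> List[List[int]]:
    """Encode a word as a max_chars x bits matrix, truncating or zero-padding as needed."""
    rows = [encode_char(ch, bits) for ch in word[:max_chars]]
    padding = [[0] * bits for _ in range(max_chars - len(rows))]
    return rows + padding


def hamming_distance(a: List[List[int]], b: List[List[int]]) -> int:
    """Count the bit positions at which two equally shaped encodings differ."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


if __name__ == "__main__":
    # Any script can be encoded without a vocabulary: Latin, accented, CJK.
    for word in ["hello", "héllo", "日本語"]:
        matrix = encode_word(word)
        print(word, len(matrix), "x", len(matrix[0]))

    # A LEET substitution changes only a handful of input bits.
    print(hamming_distance(encode_word("hello"), encode_word("h3llo")))
```

Because every character maps to bits through its code point, no word or character is ever out of vocabulary. In this sketch a perturbed word simply produces a slightly different bit matrix; learning embeddings that stay close under such perturbations is the job of the trained model, which this example does not attempt to reproduce.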
### Getting started
#### Installation
You can use pip to install the TensorFlow version of RETVec:
```bash
pip install retvec
```
RETVec has been tested on TensorFlow 2.6+ and Python 3.7+.
### Basic Usage
`training/train_tf_retvec_models.py` is the RETVec model training script. Example usage:
```bash
python training/train_tf_retvec_models.py --train_config <train_config_path> --model_config <model_config_path> --output_dir <output_path>
```
Configurations for our base models are under the `configs/` folder.
### Colab
Colab for training and releasing a new RETVec model: `notebooks/train_and_relase_a_rewnet.ipynb`
Hello world colab: `notebooks/hello_world.ipynb`
## Disclaimer
This is not an official Google product.
Raw data
{
"_id": null,
"home_page": "https://github.com/google-research/retvec",
"name": "retvec",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "",
"author": "Google",
"author_email": "retvec@google.com",
"download_url": "https://files.pythonhosted.org/packages/f3/e7/ab797f2b5e71f62690f1d2e3e848bacdc0d9dff48bda5ce9051253720237/retvec-1.0.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache License 2.0",
"summary": "Resilient and Efficient Text Vectorizer",
"version": "1.0.0",
"project_urls": {
"Homepage": "https://github.com/google-research/retvec"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "fe8fdb725817dddaa443721d116429bf9f024f40c061fb3ce36800ecbe3ad10d",
"md5": "d401124c5159266adc55205b6bd2b733",
"sha256": "84aeab0498a9a83b47eed40f7539e4e4900e3a08bf47f099e91116cd39539ed5"
},
"downloads": -1,
"filename": "retvec-1.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "d401124c5159266adc55205b6bd2b733",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 66408,
"upload_time": "2023-08-02T21:38:19",
"upload_time_iso_8601": "2023-08-02T21:38:19.711393Z",
"url": "https://files.pythonhosted.org/packages/fe/8f/db725817dddaa443721d116429bf9f024f40c061fb3ce36800ecbe3ad10d/retvec-1.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f3e7ab797f2b5e71f62690f1d2e3e848bacdc0d9dff48bda5ce9051253720237",
"md5": "12ecedce99a33e74af2a3e8b991e64d6",
"sha256": "572426a4b9535b2274f734d7744bac80949085373874503542237b2870380446"
},
"downloads": -1,
"filename": "retvec-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "12ecedce99a33e74af2a3e8b991e64d6",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 24689,
"upload_time": "2023-08-02T21:38:21",
"upload_time_iso_8601": "2023-08-02T21:38:21.473004Z",
"url": "https://files.pythonhosted.org/packages/f3/e7/ab797f2b5e71f62690f1d2e3e848bacdc0d9dff48bda5ce9051253720237/retvec-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-08-02 21:38:21",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "google-research",
"github_project": "retvec",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "retvec"
}