# Simple Representations
This library is based on the [Transformers](https://github.com/huggingface/transformers) library by HuggingFace. Using this library, you can quickly extract text representations from Transformer models. Only two lines of code are needed to initialize the required model and extract the text representations from it.
# Table of contents
* [Installation](#installation)
    * [With `pip`](#with-pip)
    * [From source](#from-source)
* [Usage](#usage)
    * [Minimal Start](#minimal-start)
    * [Default Settings](#default-settings)
    * [Current Pretrained Models](#current-pretrained-models)
* [Acknowledgements](#acknowledgements)
## Installation
This repository has been tested on Python 3.6.8 and PyTorch 1.2.0.
### With `pip`
First, install PyTorch. Please refer to the [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) for the install command specific to your platform.
Once PyTorch is installed, Simple Representations can be installed with pip:
```
pip install simplerepresentations
```
### From source
As with the `pip` installation, you first need to install PyTorch. Please refer to the [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) for the install command specific to your platform.
Once PyTorch is installed, clone the repository and install it from the repository root:
```
git clone https://github.com/AliOsm/simplerepresentations.git
cd simplerepresentations
pip install .
```
## Usage
### Minimal Start
The following example extracts text representations from the `bert-base-uncased` (BERT Base Uncased) model for the sentences `Hello Transformers!` and `It's very simple.`.
```python
from simplerepresentations import RepresentationModel


def load_data():
    return ['Hello Transformers!', "It's very simple."]


if __name__ == '__main__':
    model_type = 'bert'
    model_name = 'bert-base-uncased'

    representation_model = RepresentationModel(
        model_type=model_type,
        model_name=model_name,
        batch_size=32,
        max_seq_length=10,  # truncate sentences to at most 10 tokens
        combination_method='cat',  # concatenate the last `last_hidden_to_use` hidden states
        last_hidden_to_use=4  # use the last 4 hidden states to build token representations
    )

    text_a = load_data()

    all_sentences_representations, all_tokens_representations = representation_model(text_a=text_a)

    print(all_sentences_representations.shape)  # (2, 768) => (number of sentences, hidden size)
    print(all_tokens_representations.shape)  # (2, 10, 3072) => (number of sentences, number of tokens, 4 * hidden size)
```
You can change the `load_data` function to load your own data from any source (e.g., a CSV file).
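For example, a `load_data` replacement that reads sentences from a CSV file might look like the sketch below. It uses only Python's standard `csv` module; the default file name `sentences.csv` and the `text` column are hypothetical placeholders, not anything the library requires.

```python
import csv
from typing import List


def load_data(path: str = 'sentences.csv', column: str = 'text') -> List[str]:
    """Read one sentence per row from a CSV file that has a header row.

    `path` and `column` are hypothetical defaults; adjust them to your data.
    """
    sentences = []
    with open(path, newline='', encoding='utf-8') as csv_file:
        reader = csv.DictReader(csv_file)
        for row in reader:
            text = (row.get(column) or '').strip()
            # Skip empty cells so that no blank strings are passed to the model
            if text:
                sentences.append(text)
    return sentences
```

The returned list can be passed as `text_a` in the same way as the hard-coded list in the example above.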
### Default Settings
The default settings for the `RepresentationModel` class are listed below:
#### batch_size (32): integer
The batch size used while extracting representations.
#### max_seq_length (128): integer
The maximum sequence length in tokens; longer inputs are truncated to this length.
#### last_hidden_to_use (1): integer
The number of final hidden layers whose hidden states are used to build the representations.
#### combination_method ('sum'): string ('sum', 'cat')
The method used to combine the last `last_hidden_to_use` hidden states: `'sum'` adds them element-wise, while `'cat'` concatenates them.
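To make the two options concrete, the sketch below combines one token's per-layer vectors in plain Python. It illustrates how `'sum'` and `'cat'` affect the output size, assuming `'sum'` is an element-wise sum; `combine_hidden_states` is a hypothetical helper, not the library's internal implementation.

```python
from typing import List

Vector = List[float]


def combine_hidden_states(hidden_states: List[Vector], last_hidden_to_use: int = 1,
                          method: str = 'sum') -> Vector:
    """Combine one token's vectors from the last `last_hidden_to_use` layers.

    Hypothetical helper for illustration: `hidden_states` holds one vector per
    layer, ordered from the first layer to the last.
    """
    if not 1 <= last_hidden_to_use <= len(hidden_states):
        raise ValueError('last_hidden_to_use must be between 1 and the number of layers')
    selected = hidden_states[-last_hidden_to_use:]
    if method == 'sum':
        # Element-wise sum: the result keeps the original hidden size
        return [sum(values) for values in zip(*selected)]
    if method == 'cat':
        # Concatenation: the result is last_hidden_to_use times the hidden size
        return [value for vector in selected for value in vector]
    raise ValueError("method must be 'sum' or 'cat'")


if __name__ == '__main__':
    layers = [[0.0] * 768 for _ in range(12)]  # 12 layers, hidden size 768 (BERT Base)
    print(len(combine_hidden_states(layers, 4, 'sum')))  # 768
    print(len(combine_hidden_states(layers, 4, 'cat')))  # 3072
```

With a hidden size of 768 and `last_hidden_to_use=4`, `'cat'` yields 3072 values per token, which matches the token-representation shape in the Minimal Start example, while `'sum'` would keep 768.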
#### use_cuda (True): boolean
Whether to run the model on a GPU using `CUDA`.
#### process_count (cpu_count() - 2 if cpu_count() > 2 else 1): integer
The number of CPU processes to use when converting examples to features. The default is the number of CPU cores minus 2, or 1 on machines with 2 or fewer cores.
#### chunksize (500): integer
The number of examples in each chunk sent to a worker process when converting examples to features.
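A minimal sketch of how `process_count` and `chunksize` could fit together, assuming the conversion step follows the common pattern of passing `chunksize` to Python's `multiprocessing.Pool.imap` (an assumption; the library's actual code may differ). `convert_example_to_feature` and `convert_examples_to_features` are hypothetical names, and the "feature" here is just a lowercased word list.

```python
from multiprocessing import Pool, cpu_count
from typing import List, Optional


def default_process_count() -> int:
    # Documented default: number of cores - 2, or 1 when there are <= 2 cores
    cores = cpu_count()
    return cores - 2 if cores > 2 else 1


def convert_example_to_feature(example: str) -> List[str]:
    # Hypothetical stand-in for real tokenization and feature conversion
    return example.lower().split()


def convert_examples_to_features(examples: List[str],
                                 process_count: Optional[int] = None,
                                 chunksize: int = 500) -> List[List[str]]:
    if process_count is None:
        process_count = default_process_count()
    if process_count == 1:
        # A single process needs no pool; convert sequentially
        return [convert_example_to_feature(example) for example in examples]
    with Pool(process_count) as pool:
        # imap sends `chunksize` examples to a worker at a time and keeps input order
        return list(pool.imap(convert_example_to_feature, examples, chunksize=chunksize))


if __name__ == '__main__':
    sentences = ['Hello Transformers!', "It's very simple."]
    print(convert_examples_to_features(sentences, process_count=2, chunksize=1))
```

Larger `chunksize` values reduce inter-process communication overhead at the cost of coarser load balancing across workers. The `if __name__ == '__main__':` guard is required on platforms that start worker processes with `spawn`, which is also why the Minimal Start example uses one.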
### Current Pretrained Models
The complete list of currently available pretrained models is in the Transformers library [documentation](https://huggingface.co/transformers/pretrained_models.html).
## Acknowledgements
None of this would have been possible without the hard work of the HuggingFace team in developing the [Transformers](https://github.com/huggingface/transformers) library.
Also, many of the ideas used in this repository were inspired by the [Simple Transformers](https://github.com/ThilinaRajapakse/simpletransformers) library.