# ReaderBench Python
## Install
We recommend using virtual environments, as some packages require an exact version.
If you only want to use the package, do the following:
1. `sudo apt-get install python3-pip python3-venv python3-dev`
2. `python3 -m venv rbenv` (create a virtual environment named rbenv)
3. `source rbenv/bin/activate` (activate virtual env)
4. `pip3 uninstall setuptools && pip3 install setuptools && pip3 install --upgrade pip && pip3 install --no-cache-dir rbpy-rb`
5. Use it as in: https://github.com/readerbench/ReaderBench/blob/master/usage.py
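
After the steps above, a quick way to confirm that the package landed in the active virtual environment is to query the installed distribution metadata. This is a minimal sketch, not part of ReaderBench: the helper name `installed_version` is hypothetical, and it assumes Python 3.8 or newer, where `importlib.metadata` is in the standard library.

```python
from importlib import metadata  # standard library from Python 3.8 onward


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "rbpy-rb" is the distribution name used in the pip command above.
print(installed_version("rbpy-rb") or "rbpy-rb is not installed in this environment")
```

Run it with the virtual environment activated; if it reports that the package is missing, the `pip3 install` step most likely ran outside `rbenv`.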
If you want to contribute to the package's code base:
1. `sudo apt-get install python3-pip python3-venv python3-dev`
2. `git clone git@git.readerbench.com:ReaderBench/readerbenchpy.git && cd readerbenchpy/`
3. `python3 -m venv rbenv` (create a virtual environment named rbenv)
4. `source rbenv/bin/activate` (activate virtual env)
5. `pip3 uninstall setuptools && pip3 install setuptools && pip3 install --upgrade pip`
6. `pip3 install -r requirements.txt`
7. `python3 nltk_download.py`
Optional: pre-install the English model (otherwise most English processing will fail
and ask you to run this command):
8. `python3 -m spacy download en_core_web_lg`
If you also want spellchecking (hunspell), you need these non-Python libraries:
1. `sudo apt-get install libhunspell-1.6-0 libhunspell-dev hunspell-ro`
2. `pip3 install hunspell`
## Usage
For usage (parsing, lemmatization, NER, wordnet, content words, indices etc.) see file `usage.py` from
https://github.com/readerbench/ReaderBench
## Tips
You may also need some spaCy models, which are not installed automatically.
Download each model yourself, using the command:
`python3 -m spacy download name_of_the_model`
The logger will also write instructions on which models you need, and how to download them.
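
To check up front which models are still missing, one option is to test whether each model can be imported. This is a sketch under an assumption: models fetched with `python3 -m spacy download` are installed as regular Python packages named after the model. The helper `missing_spacy_models` is hypothetical and not part of ReaderBench.

```python
import importlib.util


def missing_spacy_models(model_names):
    """Return the names from model_names that are not importable as packages."""
    return [
        name
        for name in model_names
        if importlib.util.find_spec(name) is None
    ]


# en_core_web_lg is the English model mentioned in the install steps.
for name in missing_spacy_models(["en_core_web_lg"]):
    print(f"Missing model, run: python3 -m spacy download {name}")
```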
## Developer instructions
## How to use Bert
Our models are also available on the Hugging Face platform: https://huggingface.co/readerbench
You can use them directly from HuggingFace:
```
# tensorflow
from transformers import AutoTokenizer, TFAutoModel
tokenizer = AutoTokenizer.from_pretrained("readerbench/RoBERT-base")
model = TFAutoModel.from_pretrained("readerbench/RoBERT-base")
inputs = tokenizer("exemplu de propoziție", return_tensors="tf")
outputs = model(inputs)
# pytorch
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("readerbench/RoBERT-base")
model = AutoModel.from_pretrained("readerbench/RoBERT-base")
inputs = tokenizer("exemplu de propoziție", return_tensors="pt")
outputs = model(**inputs)
```
or from ReaderBench:
```
from rb.core.lang import Lang
from rb.processings.encoders.bert import BertWrapper
from tensorflow import keras
bert_wrapper = BertWrapper(Lang.RO, max_seq_len=128)
inputs, bert_layer = bert_wrapper.create_inputs_and_model()
cls_output = bert_wrapper.get_output(bert_layer, "cls") # or "pool"
# Add decision layer and compile model
# eg.
# hidden = keras.layers.Dense(..)(cls_output)
# output = keras.layers.Dense(..)(hidden)
# model = keras.Model(inputs=inputs, outputs=[output])
# model.compile(..)
bert_wrapper.load_weights()  # must be called after compile
# Process inputs for model
feed_inputs = bert_wrapper.process_input(["text1", "text2", "text3"])
# feed_output = ...
# model.fit(feed_inputs, feed_output, ...)
```
## How to use the logger
In each file you have to initialize the logger:
```python
from rb.utils.rblogger import Logger

logger = Logger.get_logger()
logger.info("info msg")
logger.warning("warning msg")
logger.error("error msg")
```
## How to push the wheel on pip
1. `rm -r dist/`
2. `pip3 install twine wheel`
3. `./upload_to_pypi.sh`
## How to run rb/core/cscl/csv_parser.py
1. Follow the installation steps for contributors (see Install)
2. run `pip3 install xmltodict`
3. run `export PYTHONPATH=/add/path/to/repo/readerbenchpy/`
4. add JSON resources in a `jsons` directory in `readerbenchpy/rb/core/cscl/`
5. run `cd rb/core/cscl/ && python3 csv_parser.py`
## Supported Date Formats
ReaderBench is able to perform conversation analysis from chats and communities. Each utterance must have the time expressed in one of the following formats:
- %Y-%m-%d %H:%M:%S.%f %Z
- %Y-%m-%d %H:%M:%S %Z
- %Y-%m-%d %H:%M %Z
- %Y-%m-%d %H:%M:%S.%f
- %Y-%m-%d %H:%M:%S
- %Y-%m-%d %H:%M
The format codes are the standard [Python date format codes](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes).
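
To check a timestamp against the formats listed above before feeding data to ReaderBench, you can try each format in turn with `datetime.strptime`. This is only an illustrative sketch: the helper `parse_utterance_time` is hypothetical, and ReaderBench's own parsing may differ. Note that `strptime` recognizes only `UTC`, `GMT`, and the local time zone names for `%Z`.

```python
from datetime import datetime

# The formats listed above, tried in the same order.
SUPPORTED_FORMATS = [
    "%Y-%m-%d %H:%M:%S.%f %Z",
    "%Y-%m-%d %H:%M:%S %Z",
    "%Y-%m-%d %H:%M %Z",
    "%Y-%m-%d %H:%M:%S.%f",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d %H:%M",
]


def parse_utterance_time(text):
    """Parse text with the first matching format, or raise ValueError."""
    for fmt in SUPPORTED_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format does not match, try the next one
    raise ValueError(f"unsupported time format: {text!r}")


print(parse_utterance_time("2023-08-09 08:27:14"))
```

A timestamp such as `09/08/2023 08:27` matches none of the formats and raises `ValueError`, so it has to be converted first.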
Package metadata: `rbpy-rb` version 0.12.4, source distribution `rbpy-rb-0.12.4.tar.gz`, uploaded 2023-08-09, requires Python >=3.6. Author: Woodcarver. Home page: https://github.com/readerbench/ReaderBench