SST Utils
---------
Utilities for downloading, importing, and visualizing the [Stanford Sentiment Treebank](http://nlp.stanford.edu/sentiment/treebank.html), a dataset capturing fine-grained sentiment over movie reviews.
See the examples below for usage. Tested on Python `3.4.3` and `2.7.12`.
![Jonathan Raiman, author](https://img.shields.io/badge/Author-Jonathan%20Raiman%20-blue.svg)
JavaScript code by Jason Chuang and Stanford NLP, adapted from the [Stanford NLP Sentiment Analysis demo](http://nlp.stanford.edu:8080/sentiment/rntnDemo.html).
[![PyPI version](https://badge.fury.io/py/pytreebank.svg)](https://badge.fury.io/py/pytreebank)
[![Build Status](https://travis-ci.org/JonathanRaiman/pytreebank.svg?branch=master)](https://travis-ci.org/JonathanRaiman/pytreebank)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE.md)
### Visualization
Trees can be visualized inside an IPython notebook using Jason Chuang's JavaScript and CSS:
```python
import pytreebank
# load the sentiment treebank corpus in the parenthesized format,
# e.g. "(4 (2 very ) (3 good))"
dataset = pytreebank.load_sst()
# add JavaScript and CSS to the IPython notebook
pytreebank.LabeledTree.inject_visualization_javascript()
# select an example to visualize
example = dataset["train"][0]
# display it in the page
example.display()
```
![Example visualization using pytreebank](visualization_example.png)
### Lines and Labels
To extract the spans of a tree from the corpus, call the `to_labeled_lines` or `to_lines` method of a `LabeledTree`. The first entry in each returned list is always the root sentence:
```python
import pytreebank
dataset = pytreebank.load_sst()
example = dataset["train"][0]
# extract spans from the tree.
for label, sentence in example.to_labeled_lines():
    print("%s has sentiment label %s" % (
        sentence,
        ["very negative", "negative", "neutral", "positive", "very positive"][label]
    ))
```
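To make the parenthesized format and the idea of labeled spans concrete, here is a minimal, standalone sketch that does not use pytreebank at all. The names `parse_tree`, `node_text`, and `labeled_spans` are hypothetical helpers written for this illustration (Python 3 only), and the span order it produces (root first, then a depth-first walk) is an assumption; it is not necessarily how `to_labeled_lines` is implemented.

```python
# Illustrative only: a tiny parser for the parenthesized SST format.
# None of these helpers are part of pytreebank.

LABEL_NAMES = ["very negative", "negative", "neutral", "positive", "very positive"]


def parse_tree(text):
    """Parse "(4 (2 very ) (3 good))" into nested (label, children, words) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse_node():
        nonlocal pos
        assert tokens[pos] == "(", "every node must start with '('"
        pos += 1
        label = int(tokens[pos])  # the sentiment label follows the '('
        pos += 1
        children, words = [], []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                words.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ')'
        return (label, children, words)

    return parse_node()


def node_text(node):
    """Rebuild the text covered by a node from its leaves."""
    _, children, words = node
    if children:
        return " ".join(node_text(child) for child in children)
    return " ".join(words)


def labeled_spans(node):
    """Return (label, text) pairs for the node and its descendants, root first."""
    spans = [(node[0], node_text(node))]
    for child in node[1]:
        spans.extend(labeled_spans(child))
    return spans


for label, text in labeled_spans(parse_tree("(4 (2 very ) (3 good))")):
    print("%s has sentiment label %s" % (text, LABEL_NAMES[label]))
```

Each node carries its own label, so a single sentence yields one labeled span per subtree, with the full sentence first.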
### Download/Loading control
Change the save/load directory by passing a path; `load_sst` will look for
`train.txt`, `dev.txt`, and `test.txt` files under that directory.
```python
dataset = pytreebank.load_sst("/path/to/sentiment/")
```
To load just a single dataset file:
```python
train_data = pytreebank.import_tree_corpus("/path/to/sentiment/train.txt")
```
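Before pointing `load_sst` at a custom directory, it can be useful to check which of the three split files are already there. The sketch below uses only the standard library; `missing_split_files` is a hypothetical helper, not part of pytreebank, and it makes no assumption about what `load_sst` does when a file is absent.

```python
import os

# File names taken from the description above.
EXPECTED_FILES = ("train.txt", "dev.txt", "test.txt")


def missing_split_files(directory):
    """Return the expected split files that are not present in `directory`."""
    return [
        name for name in EXPECTED_FILES
        if not os.path.isfile(os.path.join(directory, name))
    ]


missing = missing_split_files("/path/to/sentiment/")
if missing:
    print("Not found under the directory: %s" % ", ".join(missing))
```

An empty list means all three files are in place and the directory can be passed to `load_sst` as shown above.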