pytreebank

- Name: pytreebank
- Version: 0.2.7
- Home page: https://github.com/JonathanRaiman/pytreebank
- Summary: Python package for loading Stanford Sentiment Treebank corpus
- Upload time: 2020-02-18 06:04:51
- Author: Jonathan Raiman
- License: MIT
- Keywords: machine learning, nlp
- Requirements: no requirements were recorded
- CI: Travis-CI (no Coveralls test coverage)

SST Utils
---------

Utilities for downloading, importing, and visualizing the [Stanford Sentiment Treebank](http://nlp.stanford.edu/sentiment/treebank.html), a dataset capturing fine-grained sentiment over movie reviews.
See examples below for usage. Tested in Python `3.4.3` and `2.7.12`.

![Jonathan Raiman, author](https://img.shields.io/badge/Author-Jonathan%20Raiman%20-blue.svg)

The JavaScript code is by Jason Chuang and Stanford NLP, taken and modified from the [Stanford NLP Sentiment Analysis demo](http://nlp.stanford.edu:8080/sentiment/rntnDemo.html).

[![PyPI version](https://badge.fury.io/py/pytreebank.svg)](https://badge.fury.io/py/pytreebank)
[![Build Status](https://travis-ci.org/JonathanRaiman/pytreebank.svg?branch=master)](https://travis-ci.org/JonathanRaiman/pytreebank)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE.md)

### Visualization

Trees can be visualized within an IPython notebook using Jason Chuang's JavaScript and CSS:

```python
import pytreebank
# load the sentiment treebank corpus in the parenthesis format,
# e.g. "(4 (2 very ) (3 good))"
dataset = pytreebank.load_sst()
# add the JavaScript and CSS to the IPython notebook
pytreebank.LabeledTree.inject_visualization_javascript()
# select an example to visualize
example = dataset["train"][0]
# display it in the page
example.display()
```

![Example visualization using pytreebank](visualization_example.png)
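
The corpus stores trees in the parenthesis format shown in the comment above: each bracket opens with an integer sentiment label, followed by either words or nested subtrees. To make the format concrete, here is a minimal standalone parser sketch. `Node`, `parse_tree` and `leaves` are hypothetical names, not part of pytreebank (use `pytreebank.load_sst` or `pytreebank.import_tree_corpus` for real data), and the sketch assumes that literal brackets never occur as words and that words sit only on leaves, as in the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical minimal tree node (not pytreebank's LabeledTree)."""
    label: int                                   # sentiment label, 0..4 in SST
    text: str = ""                               # words directly inside this bracket
    children: List["Node"] = field(default_factory=list)


def parse_tree(line: str) -> Node:
    """Parse one parenthesized tree such as "(4 (2 very ) (3 good))"."""
    # Pad brackets with spaces so that split() yields clean tokens.
    # Assumes literal "(" / ")" never appear as words.
    tokens = line.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse() -> Node:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        node = Node(label=int(tokens[pos + 1]))
        pos += 2                                 # skip "(" and the label
        words = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(parse())    # nested subtree
            else:
                words.append(tokens[pos])        # leaf word
                pos += 1
        pos += 1                                 # consume the closing ")"
        node.text = " ".join(words)
        return node

    root = parse()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return root


def leaves(node: Node) -> str:
    """Rebuild the covered sentence by joining leaf words left to right.

    Assumes words appear only on leaves, as in the example tree.
    """
    if not node.children:
        return node.text
    return " ".join(leaves(child) for child in node.children)


tree = parse_tree("(4 (2 very ) (3 good))")
print(tree.label, leaves(tree))  # prints: 4 very good
```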

### Lines and Labels

To output the spans of a tree, call the `to_labeled_lines` or `to_lines` method of a `LabeledTree`: `to_labeled_lines` returns `(label, sentence)` pairs, while `to_lines` returns the sentences alone. The first sentence in each returned list is always the root sentence:

```python
import pytreebank

dataset = pytreebank.load_sst()
example = dataset["train"][0]

# SST labels are integers from 0 (very negative) to 4 (very positive)
label_names = ["very negative", "negative", "neutral", "positive", "very positive"]

# extract spans from the tree; the first entry is the root sentence
for label, sentence in example.to_labeled_lines():
    print("%s has sentiment label %s" % (sentence, label_names[label]))
```
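
The root-first ordering described above is what a preorder traversal (node first, then its children left to right) produces. The sketch below illustrates that idea on a hypothetical tuple-based tree; it is not pytreebank's `LabeledTree` implementation, and `span_text` and `labeled_spans` are made-up names that only mimic the `(label, sentence)` pairs returned by `to_labeled_lines`.

```python
from typing import List, Tuple, Union

# Hypothetical lightweight tree: a leaf is (label, "word"),
# an internal node is (label, [child, child, ...]).
Tree = Tuple[int, Union[str, list]]


def span_text(tree: Tree) -> str:
    """Words covered by this node, joined with spaces."""
    _, body = tree
    if isinstance(body, str):
        return body
    return " ".join(span_text(child) for child in body)


def labeled_spans(tree: Tree) -> List[Tuple[int, str]]:
    """Preorder traversal: this node's span first, then each child's spans."""
    label, body = tree
    spans = [(label, span_text(tree))]
    if not isinstance(body, str):
        for child in body:
            spans.extend(labeled_spans(child))
    return spans


LABEL_NAMES = ["very negative", "negative", "neutral", "positive", "very positive"]

# The tree written as "(4 (2 very ) (3 good))" in the corpus format.
example = (4, [(2, "very"), (3, "good")])
for label, sentence in labeled_spans(example):
    print("%s has sentiment label %s" % (sentence, LABEL_NAMES[label]))
```

For this tree the loop prints the root span "very good" (very positive) first, followed by "very" (neutral) and "good" (positive).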

### Download/Loading control

To change the save/load directory, pass a path to `load_sst`; it looks for the
`train.txt`, `dev.txt` and `test.txt` files under that directory.

```python
import pytreebank

dataset = pytreebank.load_sst("/path/to/sentiment/")
```
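
Before pointing `load_sst` at a custom directory, it can help to check which of the three expected split files are already there. This is a small sketch only: `missing_sst_files` is a hypothetical helper, not part of pytreebank, and it checks that the files exist, not that their contents are valid.

```python
import os

# The three split files that load_sst looks for inside a custom directory.
SST_SPLITS = ("train.txt", "dev.txt", "test.txt")


def missing_sst_files(directory):
    """Hypothetical helper: list the expected split files absent from `directory`."""
    return [
        name
        for name in SST_SPLITS
        if not os.path.isfile(os.path.join(directory, name))
    ]


missing = missing_sst_files("/path/to/sentiment/")
if missing:
    print("not found:", ", ".join(missing))
```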

To load only a single dataset file:

```python
import pytreebank

train_data = pytreebank.import_tree_corpus("/path/to/sentiment/train.txt")
```
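
`import_tree_corpus` takes a single file such as `train.txt`. Assuming the usual SST layout of one parenthesized tree per line, a rough standalone sanity check of such a file might look like the sketch below. `read_tree_lines` and `is_balanced` are hypothetical helpers, not pytreebank functions, and UTF-8 encoding is an assumption (the `encoding` argument requires Python 3).

```python
def read_tree_lines(path):
    """Hypothetical helper: return the non-empty lines of a tree file.

    Assumes one parenthesized tree per line and UTF-8 encoding.
    """
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def is_balanced(line):
    """Check that every "(" in a tree line has a matching ")"."""
    depth = 0
    for char in line:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:          # a ")" closed more brackets than were opened
                return False
    return depth == 0
```

For example, `sum(not is_balanced(t) for t in read_tree_lines("/path/to/sentiment/train.txt"))` counts malformed lines; for an intact file you would expect zero.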
            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/JonathanRaiman/pytreebank",
    "name": "pytreebank",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "Machine Learning,NLP",
    "author": "Jonathan Raiman",
    "author_email": "jonathanraiman@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/e0/12/626ead6f6c0a0a9617396796b965961e9dfa5e78b36c17a81ea4c43554b1/pytreebank-0.2.7.tar.gz",
    "platform": "any",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Python package for loading Stanford Sentiment Treebank corpus",
    "version": "0.2.7",
    "split_keywords": [
        "machine learning",
        "nlp"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "3edc17f6f2e18c775bb65cfc77dc15fe",
                "sha256": "f0c6fde639739d356d4994d432476903421d216b3e2f11a620c3118e47aa675f"
            },
            "downloads": -1,
            "filename": "pytreebank-0.2.7.tar.gz",
            "has_sig": false,
            "md5_digest": "3edc17f6f2e18c775bb65cfc77dc15fe",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 34554,
            "upload_time": "2020-02-18T06:04:51",
            "upload_time_iso_8601": "2020-02-18T06:04:51.097559Z",
            "url": "https://files.pythonhosted.org/packages/e0/12/626ead6f6c0a0a9617396796b965961e9dfa5e78b36c17a81ea4c43554b1/pytreebank-0.2.7.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2020-02-18 06:04:51",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "JonathanRaiman",
    "github_project": "pytreebank",
    "travis_ci": true,
    "coveralls": false,
    "github_actions": false,
    "lcname": "pytreebank"
}
        