slate-nlp


- Name: slate-nlp
- Version: 1.1.1
- Summary: A terminal-based text annotation tool
- Upload time: 2023-04-14 03:55:41
- Requires Python: !=3.0.*,!=3.1.*,!=3.2.*,<4,>=2.6
- License: Copyright (c) 2016-2023 Jonathan K Kummerfeld <jonathan.kummerfeld@sydney.edu.au> Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies. THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
- Keywords: nlp, annotation, labeling, natural-language-processing, text-annotation

This is a tool for labeling text documents.
Slate supports annotation at different scales (spans of characters, tokens, and lines, or a document) and of different types (free text, labels, and links).
This covers a range of tasks, such as Part-of-Speech tagging, Named Entity Recognition, Text Classification (including Sentiment Analysis), Discourse Structure, and more.

Why use this tool over the range of other text annotation tools out there?

- Fast
- Trivial installation
- Focuses all of the screen space on annotation (good for large fonts)
- Terminal based, so it works in constrained environments (e.g. only allowed ssh access to a machine)
- Not difficult to configure and modify

Note - this repository is **not** for the "Segment and Link-based Annotation Tool, Enhanced", which can be found [here](https://bitbucket.org/dainkaplan/slate/wiki/Home) and was first presented at [LREC 2010](http://www.lrec-conf.org/proceedings/lrec2010/pdf/129_Paper.pdf).
See 'Citing' below for additional notes on that work.

## Installation

Two options:

### 1. Install with pip
```bash
pip install slate-nlp
```

Then run from any directory in one of two ways:
```
slate
python -m slate
```

### 2. Or download and run without installing
Either download as a zip file:
```bash
curl https://codeload.github.com/jkkummerfeld/slate/zip/master -o "slate.zip"
unzip slate.zip
cd slate-master
```
Or clone the repository:
```bash
git clone https://github.com/jkkummerfeld/slate
cd slate
```

Then run with either of:
```
python slate.py
./slate.py
```
To run from another directory, use:
```
python PATH_TO_SLATE/slate.py
PATH_TO_SLATE/slate.py
```

### Requirements

The code requires only Python (2 or 3) and can be run out of the box.
Your terminal must be at least 80 characters wide and 20 tall to use the tool.
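
If you want to check the terminal size before launching, the following is a minimal sketch of such a check. It is not part of slate: the function name `terminal_is_large_enough` and the constants are hypothetical, and `shutil.get_terminal_size` requires Python 3.

```python
import shutil

# Minimum size stated in the requirements above.
MIN_COLUMNS, MIN_LINES = 80, 20


def terminal_is_large_enough(columns, lines):
    """Return True if a terminal of this size meets the stated minimum."""
    return columns >= MIN_COLUMNS and lines >= MIN_LINES


# get_terminal_size falls back to a default when no terminal is attached.
size = shutil.get_terminal_size()
if not terminal_is_large_enough(size.columns, size.lines):
    print("Terminal is too small: need at least 80 columns and 20 lines.")
```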

## Citing

If you use this tool in your work, please cite:

```
@InProceedings{acl19slate,
  title     = {SLATE: A Super-Lightweight Annotation Tool for Experts},
  author    = {Jonathan K. Kummerfeld},
  booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations},
  location  = {Florence, Italy},
  month     = {July},
  year      = {2019},
  pages     = {7--12},
  doi       = {10.18653/v1/P19-3002},
  url       = {https://aclweb.org/anthology/papers/P/P19/P19-3002/},
  software  = {https://jkk.name/slate},
}
```

While presenting this work at ACL I learned of another annotation tool called SLATE.
That tool was first described in "Annotation Process Management Revisited", [Kaplan et al. (LREC 2010)](http://www.lrec-conf.org/proceedings/lrec2010/pdf/129_Paper.pdf) and then in "Slate - A Tool for Creating and Maintaining Annotated Corpora", [Kaplan et al. (JLCL 2011)](https://jlcl.org/content/2-allissues/12-Heft2-2011/11.pdf).
It takes a very different approach, using a web based interface that includes a suite of project management tools as well as annotation.
The code is available at [https://bitbucket.org/dainkaplan/slate/wiki/Home](https://bitbucket.org/dainkaplan/slate/wiki/Home).

## Getting Started

Note: if you used pip to install, replace `python slate.py` with `slate` everywhere below.

Run `python slate.py <filename>` to start annotating `<filename>` with labels over spans of tokens.
The entire interface is contained in your terminal; there is no GUI.
With command line arguments you can vary properties such as the type of annotation (labels or links) and scope of annotation (characters, tokens, lines, documents).

The input file should be plain text, organised however you like.
Prepare the data with your favourite sentence splitting and/or tokenisation software (e.g., [SpaCy](https://spacy.io)).
If you use Python 3 then unicode should be supported, but the code has not been tested extensively with non-English text (please share any issues!).
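
As an illustration of the preparation step, here is a rough sketch that writes one sentence per line with tokens separated by spaces. It uses simple regular expressions as a stand-in for a real tokeniser such as SpaCy. The function names are hypothetical, and the one-sentence-per-line, space-separated layout is an assumption about a convenient input format, not a requirement stated by slate.

```python
import re


def tokenise(sentence):
    """Split a sentence into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def prepare(text):
    """Return text with one sentence per line and tokens separated by spaces."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return "\n".join(" ".join(tokenise(s)) for s in sentences if s)


print(prepare("Slate runs in a terminal. It needs no GUI!"))
```

A dedicated tokeniser will handle abbreviations, URLs, and non-English text far better than these two patterns.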

When you start the tool it displays a set of core commands by default.
These are also specified below, along with additional commands.

The tool saves annotations in a separate file (`<filename>.annotations` by default; this can be changed with a file list, as described below).
Annotation files are formatted with one line per annotated item.
The item is specified with a tuple of numbers.
For labels, the item is followed by a hyphen and the list of labels.
For links, there are two items on the line before the hyphen.
For example, these are two annotation files, one for labels of token spans and the other for links between lines:

```
==> label.annotations <==
(2, 1) - label:a
((3, 5), (3, 8)) - label:a
(7, 8) - label:s label:a

==> link.annotations <==
13 0 - 
13 7 - 
16 7 - 
```

A few notes:
- The second label annotation is on a span of tokens, going from 5 to 8 on line 3.
- The third label annotation has two labels.
- The line annotations only have one number to specify the item.
- When the same line is linked to multiple other lines, each link is a separate item.
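
To work with these files in your own scripts, a small reader along the following lines may be enough. This is a sketch based only on the example above, not slate's own loader: `parse_annotation_line` is a hypothetical helper, and it flattens each item into a plain list of integers rather than reconstructing the nested tuples.

```python
import re


def parse_annotation_line(line):
    """Split one annotation line into (numbers, labels).

    The text before the first " -" describes the item or items and the text
    after it holds the labels, which may be empty for links.
    """
    item_text, _, label_text = line.rstrip("\n").partition(" -")
    numbers = [int(n) for n in re.findall(r"\d+", item_text)]
    labels = label_text.split()
    return numbers, labels


example = [
    "(2, 1) - label:a",
    "((3, 5), (3, 8)) - label:a",
    "(7, 8) - label:s label:a",
    "13 0 - ",
]
for line in example:
    print(parse_annotation_line(line))
```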

### Tutorials

Included in this repository are a set of interactive tutorials that teach you how to use the tool from within the tool itself.

Task | Command
---- | --------
Named Entity Recognition annotation |  `python slate.py tutorial/ner.md -t categorical -s token -o -c ner-book.config -l log.tutorial.ner.txt -sl -sm`
Labelling spans of text in a document | `python slate.py tutorial/label.md -t categorical -s token -o -l log.tutorial.label.txt`
Linking lines in a document | `python slate.py tutorial/link.md -t link -s line -o -l log.tutorial.link.txt`

### Example Workflow

This tool has already been used for two annotation efforts involving multiple annotators ([Durrett et al., 2017](http://jkk.name/publication/emnlp17forums/) and [Kummerfeld et al., 2018](http://jkk.name/publication/arxiv18disentangle/)).
Our workflow was as follows:

- Create a repository containing (1) the annotation guide, (2) the data to be annotated divided into user-specific folders.
- Each annotator downloaded slate and used it to do their annotations and commit the files to the repository.
- Either the whole group or the project leader went through files that were annotated by multiple people, using the adjudication mode in the tool.

### Comparing Annotations

To use adjudication mode, create a file, `example.txt`, similar to the following (you can have as many annotators as you like):

```
raw-text0 adjudicated-anno0 ((1000,),(1000,)) anno0.1 anno0.2 anno0.3
raw-text1 adjudicated-anno1 ((1000,),(1000,)) anno1.1 anno1.2
raw-text2 adjudicated-anno2 ((1000,),(1000,)) anno2.1 anno2.2 anno2.3 anno2.4
```

To save time, it is best to initialise `adjudicated-annoN` with the lines everyone agreed on:

```
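for i in 0 1 2 ; do
  # Number of annotation files for this document
  count=$(ls anno${i}.* | wc -l)
  # Keep only the lines that appear in every annotator's file
  cat anno${i}.* | sort | uniq -c | awk -v count=$count '$1 == count' | sed 's/^ *[0-9]* *//' > adjudicated-anno${i}
done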
```
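
The same agreement filter can be sketched in Python for environments without the shell tools. The helper name `agreed_lines` and the temporary demo files are hypothetical. Unlike the `uniq -c` approach, it counts each line at most once per file, so a line duplicated within a single annotator's file is not mistaken for agreement.

```python
import os
import tempfile
from collections import Counter


def agreed_lines(paths):
    """Return the sorted lines that occur in every one of the given files."""
    counts = Counter()
    for path in paths:
        with open(path) as handle:
            # A set ensures each file contributes at most one vote per line.
            counts.update({line.rstrip("\n") for line in handle if line.strip()})
    return sorted(line for line, n in counts.items() if n == len(paths))


# Small demonstration with two made-up annotation files.
with tempfile.TemporaryDirectory() as tmp:
    contents = {"anno0.1": "13 0 - \n13 7 - \n", "anno0.2": "13 7 - \n16 7 - \n"}
    paths = []
    for name, text in contents.items():
        path = os.path.join(tmp, name)
        with open(path, "w") as handle:
            handle.write(text)
        paths.append(path)
    result = agreed_lines(paths)

print(result)
```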

Then run the tool as if you are annotating, for example for linking lines:

```
python ../learn-anno/slate/slate.py -d example.txt -pf -t link -s line -o -l log.adj.txt --do-not-show-linked
```

## Detailed Usage Instructions

### Invocation options

```
usage: slate.py [-h] [-d DATA_LIST [DATA_LIST ...]] [-t {categorical,link}]
                [-s {character,token,line,document}] [-c CONFIG_FILE] [-l LOG_PREFIX] [-ld]
                [-sh] [-sl] [-sp] [-sm] [-r] [-o] [-ps] [-pf] [--do-not-show-linked]
                [--alternate-comparisons]
                [data ...]

A tool for annotating text data.

positional arguments:
  data                  Files to be annotated

optional arguments:
  -h, --help            show this help message and exit
  -d DATA_LIST [DATA_LIST ...], --data-list DATA_LIST [DATA_LIST ...]
                        Files containing lists of files to be annotated
  -t {categorical,link}, --ann-type {categorical,link}
                        The type of annotation being done.
  -s {character,token,line,document}, --ann-scope {character,token,line,document}
                        The scope of annotation being done.
  -c CONFIG_FILE, --config-file CONFIG_FILE
                        A file containing configuration information.
  -l LOG_PREFIX, --log-prefix LOG_PREFIX
                        Prefix for logging files
  -ld, --log-debug      Provide detailed logging.
  -sh, --show-help      Show help on startup.
  -sl, --show-legend    Start with legend showing.
  -sp, --show-progress  Start with progress showing.
  -sm, --show-mark      Start with mark showing.
  -r, --readonly        Do not allow changes or save annotations.
  -o, --overwrite       If they exist already, read and overwrite output files.
  -ps, --prevent-self-links
                        Prevent an item from being linked to itself.
  -pf, --prevent-forward-links
                        Prevent a link from an item to one after it.
  --do-not-show-linked  Do not have a special color to indicate any linked token.
  --alternate-comparisons
                        Activate alternative way of showing different annotations (one colour
                        per set of markings, rather than counts).
```

You may also define arguments in a file and pass them in as follows:

```bash
python slate.py @arguments.txt
```
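
The usage text above has the shape of Python `argparse` output, and `argparse` supports `@file` arguments through its `fromfile_prefix_chars` option. Whether slate uses exactly this mechanism is an assumption. Under that assumption, the file holds one argument per line, as this sketch shows. The parser is a cut-down stand-in rather than slate's real one, and its default values are guesses.

```python
import argparse
import os
import tempfile


def build_parser():
    """A reduced stand-in parser with a few of the options listed above."""
    parser = argparse.ArgumentParser(fromfile_prefix_chars="@")
    parser.add_argument("data", nargs="*")
    parser.add_argument("-t", "--ann-type", choices=["categorical", "link"],
                        default="categorical")
    parser.add_argument("-s", "--ann-scope",
                        choices=["character", "token", "line", "document"],
                        default="token")
    parser.add_argument("-o", "--overwrite", action="store_true")
    return parser


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "arguments.txt")
    with open(path, "w") as handle:
        # One argument per line; a flag and its value go on separate lines.
        handle.write("tutorial/link.md\n-t\nlink\n-s\nline\n-o\n")
    args = build_parser().parse_args(["@" + path])

print(args.data, args.ann_type, args.ann_scope, args.overwrite)
```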

### Keybindings

The tool shows files one at a time in plain text. Default commands are:

Type                        | Key                                                       | Labelling Effect                 | Linking Effect
--------------------------- | --------------------------------------------------------- | -------------------------------- | ---------------------
Movement                    | <kbd>j</kbd> or <kbd>&larr;</kbd>                         | move to the left                 | move selected item to the left
&nbsp;                      | <kbd>i</kbd> or <kbd>&uarr;</kbd>                         | move up a line                   | move selected item up a line
&nbsp;                      | <kbd>o</kbd> or <kbd>&darr;</kbd>                         | move down a line                 | move selected item down a line
&nbsp;                      | <kbd>;</kbd> or <kbd>&rarr;</kbd>                         | move to the right                | move selected item to the right
&nbsp;                      | <kbd>J</kbd> or [<kbd>Shift</kbd> + <kbd>&larr;</kbd>]        | go to the start of the line      | move linking item to the left
&nbsp;                      | <kbd>I</kbd> or [<kbd>Shift</kbd> + <kbd>&uarr;</kbd>]        | go to first line                 | move linking item up a line
&nbsp;                      | <kbd>O</kbd> or [<kbd>Shift</kbd> + <kbd>&darr;</kbd>]        | go to last line                  | move linking item down a line
&nbsp;                      | <kbd>:</kbd> or [<kbd>Shift</kbd> + <kbd>&rarr;</kbd>]        | go to the end of the line        | move linking item to the right
Edit Span                   | <kbd>m</kbd>                                              | extend left                      | extend selected item left
&nbsp;                      | <kbd>k</kbd>                                              | contract left side               | contract selected item left
&nbsp;                      | <kbd>/</kbd>                                              | extend right                     | extend selected item right
&nbsp;                      | <kbd>l</kbd>                                              | contract right side              | contract selected item right
&nbsp;                      | <kbd>M</kbd>                                              | -                                | extend linking item left
&nbsp;                      | <kbd>K</kbd>                                              | -                                | contract linking item left
&nbsp;                      | <kbd>?</kbd>                                              | -                                | extend linking item right
&nbsp;                      | <kbd>L</kbd>                                              | -                                | contract linking item right
Label Annotation (default)  | <kbd>Space</kbd> then <kbd>a</kbd>                        | [un]mark this item as a          | -
&nbsp;                      | <kbd>Space</kbd> then <kbd>s</kbd>                        | [un]mark this item as s          | -
&nbsp;                      | <kbd>Space</kbd> then <kbd>d</kbd>                        | [un]mark this item as d          | -
&nbsp;                      | <kbd>Space</kbd> then <kbd>v</kbd>                        | [un]mark this item as v          | -
Link Annotation             | <kbd>d</kbd>                                              | -                                | create a link and move right / down
&nbsp;                      | <kbd>D</kbd>                                              | -                                | create a link
Either Annotation mode      | <kbd>u</kbd>                                              | undo annotation on this item     | undo all annotations for the current item

Shared commands:

Type                        | Mode   | Key                                             | Effect               
--------------------------- | ------ | ----------------------------------------------- | ----------------------------
Searching                   | Normal | <kbd>\\</kbd>                                    | enter query editing mode
&nbsp;                      | Query  | <kbd>?</kbd> or <kbd>Enter</kbd>                    | exit query editing mode
&nbsp;                      | Query  | <kbd>!</kbd> or <kbd>Backspace</kbd>                    | delete last character in query
&nbsp;                      | Query  | characters except <kbd>?</kbd> and <kbd>!</kbd> | add character to query
&nbsp;                      | Normal | <kbd>p</kbd>                                    | go to previous match
&nbsp;                      | Normal | <kbd>n</kbd>                                    | go to next match
&nbsp;                      | Normal | <kbd>P</kbd>                                    | go to previous match for linking line
&nbsp;                      | Normal | <kbd>N</kbd>                                    | go to next match for linking line
Assigning text labels       | Normal | <kbd>t</kbd>                                    | enter label editing mode
&nbsp;                      | Label  | <kbd>?</kbd> or <kbd>Enter</kbd>                    | exit label editing mode and assign the label
&nbsp;                      | Label  | <kbd>!</kbd> or <kbd>Backspace</kbd>                    | delete last character in label
&nbsp;                      | Label  | characters except <kbd>?</kbd> and <kbd>!</kbd> | add character to label
Saving, exiting, etc        | Normal | <kbd>]</kbd>                                    | save and go to next file         
&nbsp;                      | Normal | <kbd>[</kbd>                                    | save and go to previous file     
&nbsp;                      | Normal | <kbd>q</kbd>                                    | save and quit                    
&nbsp;                      | Normal | <kbd>s</kbd>                                    | save                             
&nbsp;                      | Normal | <kbd>Q</kbd>                                    | quit                             
Misc                        | Normal | <kbd>#</kbd>                                    | toggle line numbers
&nbsp;                      | Normal | <kbd>h</kbd>                                    | toggle help info (default on)    
&nbsp;                      | Normal | <kbd>{</kbd> or <kbd>PAGE-UP</kbd>              | shift view up 5 lines
&nbsp;                      | Normal | <kbd>}</kbd> or <kbd>PAGE-DOWN</kbd>            | shift view down 5 lines
&nbsp;                      | Normal | <kbd>></kbd> then <kbd>p</kbd>                  | toggle showing progress through files
&nbsp;                      | Normal | <kbd>></kbd> then <kbd>l</kbd>                  | toggle showing legend for labels
&nbsp;                      | Normal | <kbd>></kbd> then <kbd>m</kbd>                  | toggle showing the mark on the current item

Note: special keys such as `ENTER` and `BACKSPACE` may not work on non-OS-X operating systems. That is why in all places where they are used we have an alternative as well.

### Misc

To annotate multiple files, specify more than one as an argument.
For greater control, provide a list of files in a file specified with `--data-list` / `-d`.
The list should be formatted as follows, where [] indicate optional values:

```
raw_file [annotation_file [starting_position [additional_annotation_files]]]
```
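
If you generate or check file lists from a script, one line can be split into its parts as sketched below. This is not slate's own reader: `parse_file_list_line` is a hypothetical helper, and it assumes fields are separated by whitespace, with no spaces inside file names or the starting position (as in the `((1000,),(1000,))` example under Comparing Annotations).

```python
def parse_file_list_line(line):
    """Split one file-list line into its named parts; None for a blank line."""
    fields = line.split()
    if not fields:
        return None
    return {
        "raw_file": fields[0],
        "annotation_file": fields[1] if len(fields) > 1 else None,
        "starting_position": fields[2] if len(fields) > 2 else None,
        # Any remaining fields are the additional annotation files.
        "additional_annotation_files": fields[3:],
    }


print(parse_file_list_line(
    "raw-text1 adjudicated-anno1 ((1000,),(1000,)) anno1.1 anno1.2"))
```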

For example, these commands will create a file list, use it, then return to it later:

```bash
find . -name '*txt' > filenames_todo
./slate.py -d filenames_todo -l do_later
# ... do some work, then quit, go away, come back...
./slate.py -d do_later.todo -l do_even_later -o
```

Note, the `-o` flag is added so it will allow you to edit the annotations you have already created.
Otherwise the system will complain that you are overwriting existing annotation files.

When the `additional_annotation_files` are included it activates an adjudication mode.
By default, all annotations that appear in all additional files are added to the current annotations.
Disagreements are coloured in the text, but will disappear once a decision is made (using the normal annotation commands).

## Customisation

Colours and keys are customisable. For labelling, the default is:

 - Underlined, current selected item
 - Green on black, 'a' items
 - Blue on black, 's' items
 - Magenta on black, 'd' items
 - Red on black, 'v' items
 - Cyan on black, multiple types for a single token

For linking, the default is:

 - Underlined, current selected item
 - Green on black, current linking item
 - Blue on black, item is linked to the current linking item
 - Yellow on black, item is in some link, though not with the current linking item

### Modifying the Code

Slate has a relatively small codebase (~2,200 lines) and is designed to make adding new functionality not too hard.
The code is divided up as follows:

 - `annotate.py`, the main program, this has the core loop that gets user input.
 - `config.py`, contains the default configuration, including colours and keyboard bindings.
 - `data.py`, classes to read, store and write data.
 - `view.py`, rendering the screen.

Logic for determining what colour goes where is split across two parts of the code.
In `data.py`, the set of labels for an item is determined.
In `view.py`, that set of labels is used to choose a suitable colour.

Adding a new command involves:

 - Adding the name and key to `input_action_list` in `config.py`
 - Adding a mapping from the name to a function in `action_to_function` in `annotate.py`
 - Adding or modifying a function in `annotate.py`
 - Modifying `data.py` or `view.py` to apply the action
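
The steps above follow a common key-to-name-to-function dispatch pattern, which can be illustrated in isolation. Everything in this sketch is hypothetical: the names `input_action_list` and `action_to_function` are borrowed from the list above, but the plain dictionaries, the `shout` action, and the `state` argument are stand-ins, and slate's real structures may be shaped differently.

```python
# Hypothetical stand-in for the key bindings held in config.py: key -> action name.
input_action_list = {"x": "shout"}


def shout(state):
    """Example action: upper-case the message held in the state."""
    state["message"] = state["message"].upper()


# Hypothetical stand-in for the mapping in annotate.py: action name -> function.
action_to_function = {"shout": shout}


def handle_key(key, state):
    """Look up the action bound to a key and run it; report whether one ran."""
    name = input_action_list.get(key)
    if name is None:
        return False
    action_to_function[name](state)
    return True


state = {"message": "hello"}
handle_key("x", state)
print(state["message"])
```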

#### Changing the label set / Adding labels

The label set is defined in your config file (see an example config [here](https://github.com/jkkummerfeld/slate/blob/master/tutorial/config-example.txt)).

See lines like this for label definitions:

```
Label:          a                         SPACE_a green
```

The format is:
```
Label:        <label>                    <command> <colour>
```

You can add / edit / remove these lines to define your own label scheme. For example, for NER you may want to do:

```
Label:          O                         SPACE_a green
Label:          LOC                       SPACE_s blue
Label:          PER                       SPACE_d red
Label:          ORG                       SPACE_f yellow
Label:          MISC                      SPACE_v magenta
```

The current set of available colours is: [green, blue, white, cyan, magenta, red, yellow].
Note that by default white is used for regular text and cyan is used for cases where multiple labels apply to the same content.
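
Before starting a long annotation session it can help to sanity-check the label lines. The sketch below is not slate's config reader: `parse_label_line` and `check_labels` are hypothetical helpers, and it assumes the three values after `Label:` are separated by whitespace, as in the examples above.

```python
AVAILABLE_COLOURS = {"green", "blue", "white", "cyan", "magenta", "red", "yellow"}


def parse_label_line(line):
    """Return (label, command, colour) for a 'Label:' line, else None."""
    fields = line.split()
    if len(fields) != 4 or fields[0] != "Label:":
        return None
    return tuple(fields[1:])


def check_labels(lines):
    """Return a list of problems: unknown colours or repeated commands."""
    problems, seen_commands = [], set()
    for line in lines:
        parsed = parse_label_line(line)
        if parsed is None:
            continue  # Not a label definition; ignore it here.
        label, command, colour = parsed
        if colour not in AVAILABLE_COLOURS:
            problems.append("unknown colour for " + label + ": " + colour)
        if command in seen_commands:
            problems.append("command used twice: " + command)
        seen_commands.add(command)
    return problems


config = [
    "Label:          O                         SPACE_a green",
    "Label:          LOC                       SPACE_s blue",
    "Label:          PER                       SPACE_s orange",
]
print(check_labels(config))
```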

To define more colours, edit the top of `slate/config.py`.
By varying both the text colour (foreground) and background colour you can achieve quite a range of variations.
You can also define any RGB colour you want using the curses [init_color](https://docs.python.org/3/library/curses.html#curses.init_color) function and the [init_pair](https://docs.python.org/3/library/curses.html#curses.init_pair) function.
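
As a rough sketch of those two curses calls: `init_color` takes red, green, and blue components on a 0 to 1000 scale, and `init_pair` ties a foreground and background colour to a pair number. The function names, the colour number 16, and the pair number 10 below are arbitrary choices for illustration. How such a pair would be wired into `slate/config.py` is not shown here.

```python
import curses


def rgb_to_curses(red, green, blue):
    """Convert 0-255 RGB components to the 0-1000 scale curses expects."""
    return tuple(int(round(v * 1000 / 255)) for v in (red, green, blue))


def define_orange_pair(colour_number=16, pair_number=10):
    """Define an orange-on-black pair; call only after curses.start_color()."""
    # Not every terminal can redefine colours or has enough colour slots.
    if not curses.can_change_color() or colour_number >= curses.COLORS:
        return None
    curses.init_color(colour_number, *rgb_to_curses(255, 165, 0))
    curses.init_pair(pair_number, colour_number, curses.COLOR_BLACK)
    return curses.color_pair(pair_number)


def demo(stdscr):
    """Show one line in the custom colour, falling back to normal text."""
    attr = define_orange_pair()
    stdscr.addstr(0, 0, "custom colour", attr or curses.A_NORMAL)
    stdscr.getch()


# To try this in a real terminal, run: curses.wrapper(demo)
```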

# Questions

If you have a question please either:

- Open an issue on [github](https://github.com/jkkummerfeld/slate/issues).
- Mail me at [jonathan.kummerfeld@sydney.edu.au](mailto:jonathan.kummerfeld@sydney.edu.au).

# Contributions

If you find a bug in the code, please submit an issue, or even better, a pull request with a fix.

# Acknowledgments

This tool is based in part upon work supported by IBM under contract 4915012629, and by ONR under MURI grant N000140911081.
Any opinions, findings, conclusions or recommendations expressed are those of the authors and do not necessarily reflect the views of IBM.


            

                             | -                                | create a link and move right / down\n&nbsp;                      | <kbd>D</kbd>                                              | -                                | create a link\nEither Annotation mode      | <kbd>u</kbd>                                              | undo annotation on this item     | undo all annotations for the current item\n\nShared commands:\n\nType                        | Mode   | Key                                             | Effect               \n--------------------------- | ------ | ----------------------------------------------- | ----------------------------\nSearching                   | Normal | <kbd>\\\\</kbd>                                    | enter query editing mode\n&nbsp;                      | Query  | <kbd>?</kbd> or <kbd>Enter</kbd>                    | exit query editing mode\n&nbsp;                      | Query  | <kbd>!</kbd> or <kbd>Backspace</kbd>                    | delete last character in query\n&nbsp;                      | Query  | characters except <kbd>?</kbd> and <kbd>!</kbd> | add character to query\n&nbsp;                      | Normal | <kbd>p</kbd>                                    | go to previous match\n&nbsp;                      | Normal | <kbd>n</kbd>                                    | go to next match\n&nbsp;                      | Normal | <kbd>P</kbd>                                    | go to previous match for linking line\n&nbsp;                      | Normal | <kbd>N</kbd>                                    | go to next match for linking line\nAssigning text labels       | Normal | <kbd>t</kbd>                                    | enter label editing mode\n&nbsp;                      | Label  | <kbd>?</kbd> or <kbd>Enter</kbd>                    | exit label editing mode and assign the label\n&nbsp;                      | Label  | <kbd>!</kbd> or <kbd>Backspace</kbd>                    | delete last character in 
label\n&nbsp;                      | Label  | characters except <kbd>?</kbd> and <kbd>!</kbd> | add character to label\nSaving, exiting, etc        | Normal | <kbd>]</kbd>                                    | save and go to next file         \n&nbsp;                      | Normal | <kbd>[</kbd>                                    | save and go to previous file     \n&nbsp;                      | Normal | <kbd>q</kbd>                                    | save and quit                    \n&nbsp;                      | Normal | <kbd>s</kbd>                                    | save                             \n&nbsp;                      | Normal | <kbd>Q</kbd>                                    | quit                             \nMisc                        | Normal | <kbd>#</kbd>                                    | toggle line numbers\n&nbsp;                      | Normal | <kbd>h</kbd>                                    | toggle help info (default on)    \n&nbsp;                      | Normal | <kbd>{</kbd> or <kbd>PAGE-UP</kbd>              | shift view up 5 lines\n&nbsp;                      | Normal | <kbd>}</kbd> or <kbd>PAGE-DOWN</kbd>            | shift view down 5 lines\n&nbsp;                      | Normal | <kbd>></kbd> then <kbd>p</kbd>                  | toggle showing progress through files\n&nbsp;                      | Normal | <kbd>></kbd> then <kbd>l</kbd>                  | toggle showing legend for labels\n&nbsp;                      | Normal | <kbd>></kbd> then <kbd>m</kbd>                  | toggle showing the mark on the current item\n\nNote: special keys such as `ENTER` and `BACKSPACE` may not work on non-OS-X operating systems. 
That is why every binding that uses them also has an alternative key.\n\n### Misc\n\nTo annotate multiple files, specify more than one as an argument.\nFor greater control, provide a list of files in a file specified with `--data-list` / `-d`.\nThe list should be formatted as follows, where [] indicate optional values:\n\n```\nraw_file [annotation_file [starting_position [additional_annotation_files]]]\n```\n\nFor example, these commands will create a file list, use it, then return to it later:\n\n```bash\n# Quote the pattern so the shell does not expand it before find sees it\nfind . -name '*txt' > filenames_todo\n./slate.py -d filenames_todo -l do_later\n# ... do some work, then quit, go away, come back...\n./slate.py -d do_later.todo -l do_even_later -o\n```\n\nNote that the `-o` flag is added so that you can edit the annotations you have already created.\nWithout it, the system will complain that you are overwriting existing annotation files.\n\nIncluding `additional_annotation_files` activates an adjudication mode.\nBy default, all annotations that appear in all additional files are added to the current annotations.\nDisagreements are coloured in the text, but will disappear once a decision is made (using the normal annotation commands).\n\n## Customisation\n\nColours and keys are customisable. 
For labelling, the default is:\n\n - Underlined, current selected item\n - Green on black, 'a' items\n - Blue on black, 's' items\n - Magenta on black, 'd' items\n - Red on black, 'v' items\n - Cyan on black, multiple types for a single token\n\nFor linking, the default is:\n\n - Underlined, current selected item\n - Green on black, current linking item\n - Blue on black, item is linked to the current linking item\n - Yellow on black, item is in some link, though not with the current linking item\n\n### Modifying the Code\n\nSlate has a relatively small codebase (~2,200 lines) and is designed so that adding new functionality is straightforward.\nThe code is divided up as follows:\n\n - `annotate.py`, the main program, which has the core loop that gets user input.\n - `config.py`, contains the default configuration, including colours and keyboard bindings.\n - `data.py`, classes to read, store and write data.\n - `view.py`, rendering the screen.\n\nLogic for determining what colour goes where is split across two parts of the code.\nIn `data.py`, the set of labels for an item is determined.\nIn `view.py`, that set of labels is used to choose a suitable colour.\n\nAdding a new command involves:\n\n - Adding the name and key to `input_action_list` in `config.py`\n - Adding a mapping from the name to a function in `action_to_function` in `annotate.py`\n - Adding or modifying a function in `annotate.py`\n - Modifying `data.py` or `view.py` to apply the action\n\n#### Changing the label set / Adding labels\n\nThe label set is defined in your config file (see an example config [here](https://github.com/jkkummerfeld/slate/blob/master/tutorial/config-example.txt)).\n\nSee lines like this for label definitions:\n\n```\nLabel:          a                         SPACE_a green\n```\n\nThe format is:\n\n```\nLabel:        <label>                    <command> <colour>\n```\n\nYou can add / edit / remove these lines to define your own label scheme. 
For example, for NER you might use:\n\n```\nLabel:          O                         SPACE_a green\nLabel:          LOC                       SPACE_s blue\nLabel:          PER                       SPACE_d red\nLabel:          ORG                       SPACE_f yellow\nLabel:          MISC                      SPACE_v magenta\n```\n\nThe current set of available colours is: [green, blue, white, cyan, magenta, red, yellow].\nNote that by default white is used for regular text and cyan is used for cases where multiple labels apply to the same content.\n\nTo define more colours, edit the top of `slate/config.py`.\nBy varying both the text colour (foreground) and background colour you can achieve quite a range of variations.\nYou can also define any RGB colour you want using the curses [init_color](https://docs.python.org/3/library/curses.html#curses.init_color) function and the [init_pair](https://docs.python.org/3/library/curses.html#curses.init_pair) function.\n\n# Questions\n\nIf you have a question please either:\n\n- Open an issue on [GitHub](https://github.com/jkkummerfeld/slate/issues).\n- Email me at [jonathan.kummerfeld@sydney.edu.au](mailto:jonathan.kummerfeld@sydney.edu.au).\n\n# Contributions\n\nIf you find a bug in the code, please submit an issue, or even better, a pull request with a fix.\n\n# Acknowledgments\n\nThis tool is based in part upon work supported by IBM under contract 4915012629, and by ONR under MURI grant N000140911081.\nAny opinions, findings, conclusions or recommendations expressed are those of the authors and do not necessarily reflect the views of IBM.\n\n",
    "bugtrack_url": null,
    "license": "Copyright (c) 2016-2023 Jonathan K Kummerfeld <jonathan.kummerfeld@sydney.edu.au>  Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies.  THE SOFTWARE IS PROVIDED \"AS IS\" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. ",
    "summary": "A terminal-based text annotation tool",
    "version": "1.1.1",
    "split_keywords": [
        "nlp",
        "annotation",
        "labeling",
        "natural-language-processing",
        "text-annotation"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5227b26542bdb37fc2d3a72fd095a8c780643fd0d83132b9c9020e2006970562",
                "md5": "66ba26d5d8ce2dc9f7f025edb3d13242",
                "sha256": "fdc0e5b7ee89ef980a77faa795aa572c7f7f6d66505a211eee79eea25ae6333b"
            },
            "downloads": -1,
            "filename": "slate_nlp-1.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "66ba26d5d8ce2dc9f7f025edb3d13242",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "!=3.0.*,!=3.1.*,!=3.2.*,<4,>=2.6",
            "size": 29477,
            "upload_time": "2023-04-14T03:55:37",
            "upload_time_iso_8601": "2023-04-14T03:55:37.461052Z",
            "url": "https://files.pythonhosted.org/packages/52/27/b26542bdb37fc2d3a72fd095a8c780643fd0d83132b9c9020e2006970562/slate_nlp-1.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d1b879e27b4ca00c8d332803c49752b294196fb3075833b2e3aceb2fc5c5dcd5",
                "md5": "fe5721c7b55de2d1cef4645b2d39a2d3",
                "sha256": "8f282ee1ab51a895bb77269cd060af436af006bc0bcb591ac0d6274c59101c8f"
            },
            "downloads": -1,
            "filename": "slate-nlp-1.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "fe5721c7b55de2d1cef4645b2d39a2d3",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "!=3.0.*,!=3.1.*,!=3.2.*,<4,>=2.6",
            "size": 983457,
            "upload_time": "2023-04-14T03:55:41",
            "upload_time_iso_8601": "2023-04-14T03:55:41.147137Z",
            "url": "https://files.pythonhosted.org/packages/d1/b8/79e27b4ca00c8d332803c49752b294196fb3075833b2e3aceb2fc5c5dcd5/slate-nlp-1.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-04-14 03:55:41",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "lcname": "slate-nlp"
}
        