emoji-annotations


* Name: emoji-annotations
* Version: 0.0.0
* Summary: Easy to read and edit text annotations for NLP tasks.
* Upload time: 2025-10-29 14:40:55
* Author: Jan Göpfert
* Requires Python: >=3.9
* Keywords: annotation tool, annotations, NLP, natural language processing, NER, named entity recognition, sequence labeling, information extraction, emoji
* Homepage: https://github.com/FZJ-IEK3-VSA/emoji-annotation-tool
* Requirements: none recorded

<a href="https://www.fz-juelich.de/en/ice/ice-2"><img src="https://github.com/FZJ-IEK3-VSA/README_assets/blob/main/JSA-Header.svg?raw=True" alt="Forschungszentrum Juelich Logo" width="175px"></a>

# emoji-annotations

Easy to read and edit text annotations for NLP tasks.

Using colorful emojis is an easy and effective way to annotate text. Emoji annotations are easy to read and edit without requiring specialized software. Simply use your preferred text editor to curate your data.


## Why emojis?

They are easy to spot, distinguish, and edit — and are fun to use! Data formats used to store annotations for sequence annotation tasks are often difficult for humans to read and are usually viewed and edited with specialized software. Using emojis makes annotations easily recognizable and editable in any text editor. Ideally, use emojis that resemble the entity type (e.g., 📆,⏰️,📍,🏛️,🎨, etc.) or that are of different colors (e.g., 🍎,🥝,🍊,🍌,🍉,🍇, etc.). 

Emoji annotations are:
* **Easy to set up**: No special software is needed; just use your favorite text editor.
* **Easy to read**: Colors pop out and are easy to distinguish.
* **Easy to edit**: Edits are quick because each annotation boundary is a single emoji to move. You can also use search and replace and other features of your favorite text editor to edit many annotations efficiently.


## Limitations

* Works only for text genres in which the emojis selected as annotation boundaries are unlikely to appear in the text.
* Because the same emoji is used as start and end markers, nested or overlapping annotations of the same entity type are not supported.
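
The first limitation can be checked before annotation starts. The following is a minimal sketch in plain Python and is not part of the package; `find_marker_collisions` is a hypothetical helper that reports which raw texts already contain one of the chosen marker emojis.

```python
def find_marker_collisions(texts, mapping):
    """Return {label: [indices of texts that already contain that label's emoji]}."""
    collisions = {}
    for index, text in enumerate(texts):
        for label, emoji in mapping.items():
            # A plain substring test is deliberately conservative: an emoji that
            # occurs as part of a longer emoji sequence is reported as well.
            if emoji in text:
                collisions.setdefault(label, []).append(index)
    return collisions


raw_texts = [
    "Meet me in Paris in 2024.",
    "Pin dropped 📍 near the Louvre.",
]
collisions = find_marker_collisions(raw_texts, {"location": "📍", "year": "📆"})
print(collisions)
```

If the result is not empty, it is probably safer to pick a different emoji for the affected entity type.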


## Installation

Create and activate a virtual environment. Then, install the package via pip:
```bash
pip install emoji-annotations
```


## Supported tasks

Emoji annotations are best suited for tasks that are typically approached with sequence labeling, such as named entity recognition (NER). You can also use them for relation extraction, template filling, or event extraction, provided that you use relation-specific tagging and only annotate one n-ary relation, template, or event per record.
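
For sequence labeling, character offsets are often converted into token-level tags. The sketch below shows one possible conversion, assuming whitespace tokenization and flat (non-nested) annotations. `to_bio_tags` is a hypothetical helper, not a function of this package; its `annotations` argument is assumed to be a dictionary that maps each entity type to a list of `(start, end)` character offsets.

```python
import re


def to_bio_tags(plain_text, annotations):
    """Return [(token, tag), ...] with B-/I-/O tags derived from character offsets."""
    # Whitespace tokenization is an assumption; punctuation stays attached to words.
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", plain_text)]
    tags = ["O"] * len(tokens)
    for label, spans in annotations.items():
        for start, end in spans:
            inside = False
            for i, (_token, tok_start, tok_end) in enumerate(tokens):
                # A token belongs to the span if the two character ranges overlap.
                if tok_start < end and tok_end > start:
                    tags[i] = ("I-" if inside else "B-") + label
                    inside = True
    return [(token, tag) for (token, _s, _e), tag in zip(tokens, tags)]


tagged = to_bio_tags(
    "The Mona Lisa hangs in Paris.",
    {"artwork": [(4, 13)], "location": [(23, 28)]},
)
for token, tag in tagged:
    print(token, tag)
```

Nested annotations would overwrite each other in this sketch, so it only fits tasks with one label per token.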


## Usage

Create a new emoji annotation object using whichever emoji mapping you prefer. A mapping is a dictionary that associates entity types with emojis.

```python
from emoji_annotations import EmojiAnnotator

# Map each entity type to the emoji that marks its start and end.
emoji_mapping = {
    "artwork": "🎨",
    "painter": "👨‍🎨",
    "museum": "🏛️",
    "location": "📍",
    "year": "📆",
}
emoji_nlp = EmojiAnnotator(emoji_mapping)
```
**Convert annotated text to plain text and annotations as character offsets.**
```python
text = "The 🎨Mona Lisa🎨 is believed to have been painted by 👨‍🎨Leonardo da Vinci👨‍🎨 between 📆1503📆 and 📆1506📆 and is now displayed in the 📍🏛️Louvre🏛️, Paris📍."
plain_text, annotations = emoji_nlp.from_inline_annotations(text)
print(plain_text)
print(annotations)
```
```txt
The Mona Lisa is believed to have been painted by Leonardo da Vinci between 1503 and 1506 and is now displayed in the Louvre, Paris.
{'artwork': [(4, 13)], 'painter': [(50, 67)], 'year': [(76, 80), (85, 89)], 'location': [(118, 131)], 'museum': [(118, 124)]}
```
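
To illustrate the idea behind this conversion, here is a simplified stand-alone sketch. It is not the package's implementation: `strip_inline_markers` is a hypothetical function that assumes each emoji alternately opens and closes its annotation. It tries longer emojis first because some emojis contain others as a component (for example, the painter emoji above includes the palette emoji).

```python
import re


def strip_inline_markers(text, mapping):
    """Split annotated text into plain text and {label: [(start, end), ...]}."""
    emoji_to_label = {emoji: label for label, emoji in mapping.items()}
    # Try longer emojis first so that an emoji sequence wins over its parts.
    alternatives = sorted(emoji_to_label, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(e) for e in alternatives))

    plain_parts, spans, open_starts = [], {}, {}
    plain_length, cursor = 0, 0
    for match in pattern.finditer(text):
        chunk = text[cursor:match.start()]
        plain_parts.append(chunk)
        plain_length += len(chunk)
        cursor = match.end()
        label = emoji_to_label[match.group()]
        if label in open_starts:
            # The second occurrence closes the span opened earlier.
            spans.setdefault(label, []).append((open_starts.pop(label), plain_length))
        else:
            # The first occurrence opens a span at the current plain-text offset.
            open_starts[label] = plain_length
    plain_parts.append(text[cursor:])
    if open_starts:
        raise ValueError(f"Unclosed annotation(s): {sorted(open_starts)}")
    return "".join(plain_parts), spans


example = "The 🎨Mona Lisa🎨 was painted between 📆1503📆 and 📆1506📆."
plain, spans = strip_inline_markers(example, {"artwork": "🎨", "year": "📆"})
print(plain)
print(spans)
```

Offsets are counted in Python string indices (Unicode code points) of the plain text, with the end offset exclusive, so `plain[start:end]` recovers the annotated phrase.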

**Convert plain text and annotations back to annotated text.**
```python
annotated_text = emoji_nlp.to_inline_annotations(plain_text, annotations)
print(annotated_text)
```
```txt
The 🎨Mona Lisa🎨 is believed to have been painted by 👨‍🎨Leonardo da Vinci👨‍🎨 between 📆1503📆 and 📆1506📆 and is now displayed in the 📍🏛️Louvre🏛️, Paris📍.
```

If two emojis share the same character offset, the emoji that closes the active annotation is placed first. If the original text used a different order, that order is not preserved (e.g., "🏛️📍Louvre🏛️, Paris📍" becomes "📍🏛️Louvre🏛️, Paris📍").
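
One way to obtain such a deterministic order is to sort the marker positions before inserting them. The sketch below is an assumption about how this could be done, not the package's code. `insert_inline_markers` is a hypothetical function that places closing markers before opening markers at the same offset and, as a further assumption, opens longer spans first so that shorter spans nest inside them.

```python
def insert_inline_markers(plain_text, annotations, mapping):
    """Re-insert emoji markers into plain text from {label: [(start, end), ...]}."""
    events = []
    for label, spans in annotations.items():
        for start, end in spans:
            length = end - start
            # Closing markers sort first (kind 0); shorter spans close first.
            events.append((end, 0, length, mapping[label]))
            # Opening markers sort second (kind 1); longer spans open first.
            events.append((start, 1, -length, mapping[label]))
    events.sort()

    parts, cursor = [], 0
    for offset, _kind, _tiebreak, emoji in events:
        parts.append(plain_text[cursor:offset])
        parts.append(emoji)
        cursor = offset
    parts.append(plain_text[cursor:])
    return "".join(parts)


restored = insert_inline_markers(
    "in the Louvre, Paris.",
    {"museum": [(7, 13)], "location": [(7, 20)]},
    {"museum": "🍎", "location": "📍"},
)
print(restored)
```

Because the order depends only on the offsets, the same annotations always produce the same inline text, regardless of the order in which the markers were originally typed.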

**Use the command line to curate annotations** by integrating `emoji_nlp.get_user_feedback()` in a Python script. This function will prompt the user to confirm or edit the annotations in the text.
```txt
Computer says 🗯️

🌶️Andalusia🌶️ has a 🍊surface area🍊 of 🍏87,597🍏 🍓square kilometres🍓.

Correct? y/n
(To edit the n-th annotation, enter its number n, e.g. '3', press enter, use the arrow keys to move it, press enter to see the changes, and press enter again to confirm the changes. To delete all annotations press 'd'.)
```
```
User input: 3 
```
```
🌶️Andalusia🌶️ has a 🔻surface area🍊 of 🍏87,597🍏 🍓square kilometres🍓.
```
```
User input: → →
```
```
🌶️Andalusia🌶️ has a su🔻rface area🍊 of 🍏87,597🍏 🍓square kilometres🍓.

Correct? y/n
```
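
The effect of such an edit can be mimicked directly on the inline text. The following sketch is illustrative only and does not reflect how `get_user_feedback()` is implemented; `move_marker` is a hypothetical helper that moves the n-th marker by a given number of characters.

```python
import re


def move_marker(text, emojis, n, steps):
    """Move the n-th marker (1-based) by `steps` characters; negative moves left."""
    # Longer emojis first, so an emoji sequence is not split into its parts.
    ordered = sorted(emojis, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(e) for e in ordered))
    marker = list(pattern.finditer(text))[n - 1]
    # Remove the marker, then re-insert it at the shifted position.
    remainder = text[:marker.start()] + text[marker.end():]
    position = min(max(marker.start() + steps, 0), len(remainder))
    # Note: this sketch does not check whether the new position falls
    # inside another marker.
    return remainder[:position] + marker.group() + remainder[position:]


before = "a 🍊surface area🍊 of 🍏87,597🍏"
after = move_marker(before, ["🍊", "🍏"], n=1, steps=2)
print(after)
```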


## Comparison of NER annotation formats

The same sentence is shown below in several NER annotation formats. The colorful emoji annotations are much easier to read and edit than the other formats. This small example already makes the difference obvious, and it becomes even more pronounced with larger datasets.

### _Colorful emoji annotations_
```txt
🏢U.N.🏢 official 🙋Ekeus🙋 heads for 📍Baghdad📍.
```

### _CoNLL 2003 NER format_
(https://www.cnts.ua.ac.be/conll2003/ner/)
```conll
U.N.         NNP  I-NP  I-ORG 
official     NN   I-NP  O 
Ekeus        NNP  I-NP  I-PER 
heads        VBZ  I-VP  O 
for          IN   I-PP  O 
Baghdad      NNP  I-NP  I-LOC 
.            .    O     O 
```

### _brat standoff format_
(https://brat.nlplab.org/standoff.html)

```brat
T1  Organization 0 4 U.N.
T2  Person 14 19 Ekeus
T3  Location 30 37 Baghdad
```
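
Character offsets of the kind used in this README map naturally onto brat text-bound annotations, which consist of an ID, a tab, the type with start and end offsets (end exclusive), another tab, and the covered text. The sketch below shows such a conversion; `to_brat_standoff` and `span_of` are hypothetical helpers, not part of this package.

```python
def to_brat_standoff(plain_text, annotations):
    """Render {label: [(start, end), ...]} as brat text-bound annotation lines."""
    spans = sorted(
        (start, end, label)
        for label, label_spans in annotations.items()
        for start, end in label_spans
    )
    lines = []
    for number, (start, end, label) in enumerate(spans, start=1):
        # Format: ID <TAB> type start end <TAB> covered text
        lines.append(f"T{number}\t{label} {start} {end}\t{plain_text[start:end]}")
    return "\n".join(lines)


sentence = "U.N. official Ekeus heads for Baghdad."


def span_of(phrase):
    """Return the (start, end) offsets of the first occurrence of `phrase`."""
    start = sentence.index(phrase)
    return (start, start + len(phrase))


standoff = to_brat_standoff(
    sentence,
    {
        "Organization": [span_of("U.N.")],
        "Person": [span_of("Ekeus")],
        "Location": [span_of("Baghdad")],
    },
)
print(standoff)
```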

### _XML-based formats_
```xml
<p><EM ID="1" CATEG="ORGANIZATION">U.N.</EM> official <EM ID="2" CATEG="PERSON">Ekeus</EM> heads for <EM ID="3" CATEG="LOCATION">Baghdad</EM>.</p>
```


## Development
To update the list of supported emojis, run:
```bash
python src/emoji_annotations/scripts/update_emoji_list.py
```


## About Us 

<a href="https://www.fz-juelich.de/en/ice/ice-2"><img src="https://github.com/FZJ-IEK3-VSA/README_assets/blob/main/iek3-square.png?raw=True" alt="Institute image ICE-2" width="280" align="right" style="margin:0px 10px"/></a>

We are the <a href="https://www.fz-juelich.de/en/ice/ice-2">Institute of Climate and Energy Systems (ICE) - Jülich Systems Analysis</a> at the <a href="https://www.fz-juelich.de/en">Forschungszentrum Jülich</a>. Our interdisciplinary department's research focuses on energy-related process and systems analyses. Data searches and system simulations are used to determine energy and mass balances, as well as to evaluate the performance, emissions, and costs of energy systems. The results are used to perform comparative assessment studies of the various systems. Our current priorities include the development of energy strategies, in accordance with the German Federal Government’s greenhouse gas reduction targets, by designing new infrastructures for sustainable and secure energy supply chains and by conducting cost analysis studies for integrating new technologies into future energy market frameworks.

## Acknowledgements

The authors would like to thank the German Federal Government, the German state governments, and the Joint Science Conference (GWK) for their funding and support as part of the NFDI4Ing consortium. Funded by the German Research Foundation (DFG) – project number: 442146713. Furthermore, this work was supported by the Helmholtz Association under the program "Energy System Design".

<p float="left">
    <a href="https://nfdi4ing.de/"><img src="https://nfdi4ing.de/wp-content/uploads/2018/09/logo.svg" alt="NFDI4Ing Logo" width="130px"></a>&emsp;<a href="https://www.helmholtz.de/en/"><img src="https://www.helmholtz.de/fileadmin/user_upload/05_aktuelles/Marke_Design/logos/HG_LOGO_S_ENG_RGB.jpg" alt="Helmholtz Logo" width="200px"></a>
</p>

            
