unimoji


- Name: unimoji
- Version: 2.0.0
- Home page: https://github.com/pxawtyy/unimoji
- Summary: Accurately remove and replace emojis in text strings
- Upload time: 2025-07-17 01:16:27
- Author: Michael Kuwahara
- Requires Python: >=3.8
- License: Apache-2.0
- Keywords: emoji, emojis, nlp, natural langauge processing, unicode, unicode-emoji
- Requirements: none recorded
# unimoji

Accurately find or remove [emojis](https://en.wikipedia.org/wiki/Emoji) from a blob of text using
the latest data from the Unicode Consortium's [emoji code repository](https://unicode.org/Public/emoji/).

This library is based on the excellent work of [Brad Solomon](https://github.com/bsolomon1124/) and has been updated to support the latest Unicode emoji specifications with performance optimizations.

[![License](https://img.shields.io/github/license/pxawtyy/unimoji.svg)](https://github.com/pxawtyy/unimoji/blob/master/LICENSE)
[![PyPI](https://img.shields.io/pypi/v/unimoji.svg)](https://pypi.org/project/unimoji/)
[![Status](https://img.shields.io/pypi/status/unimoji.svg)](https://pypi.org/project/unimoji/)
[![Python](https://img.shields.io/pypi/pyversions/unimoji.svg)](https://pypi.org/project/unimoji)

-------

## About

`unimoji` is an updated and optimized version of the original [demoji](https://github.com/bsolomon1124/demoji) library by Brad Solomon. This library provides enhanced Unicode emoji support with the latest emoji specifications and improved performance.

**Key improvements over demoji:**
- **Unicode 17.0 support** - Latest emoji specifications (5200+ emojis)
- **Performance optimizations** - Faster `replace_with_desc()` with O(n) complexity
- **Modern Python support** - Python 3.8+ with type hints
- **Reduced dependencies** - No external runtime dependencies
- **Maintained API compatibility** - Easy drop-in replacement

**Credits:** Special thanks to Brad Solomon for the original `demoji` library which serves as the foundation for this project.

## Installation

```bash
pip install unimoji
```

### Migrating from demoji

`unimoji` is a drop-in replacement for `demoji`. Simply replace your imports:

```python
# Old
import demoji

# New
import unimoji as demoji  # or
import unimoji
```

## Basic Usage

`unimoji` exports several text-related functions for find-and-replace functionality with emojis:

```python
>>> import unimoji
>>> tweet = """\
... #startspreadingthenews yankees win great start by 🎅🏾 going 5strong innings with 5k’s🔥 🐂
... solo homerun 🌋🌋 with 2 solo homeruns and👹 3run homerun… 🤡 🚣🏼 👨🏽‍⚖️ with rbi’s … 🔥🔥
... 🇲🇽 and 🇳🇮 to close the game🔥🔥!!!….
... WHAT A GAME!!..
... """
>>> unimoji.findall(tweet)
{
    "🔥": "fire",
    "🌋": "volcano",
    "👨🏽\u200d⚖️": "man judge: medium skin tone",
    "🎅🏾": "Santa Claus: medium-dark skin tone",
    "🇲🇽": "flag: Mexico",
    "👹": "ogre",
    "🤡": "clown face",
    "🇳🇮": "flag: Nicaragua",
    "🚣🏼": "person rowing boat: medium-light skin tone",
    "🐂": "ox",
}
```

See [below](#reference) for function API.

## Command-line Use

You can use `unimoji` or `python -m unimoji` to replace emojis
in file(s) or stdin with their `:code:` equivalents:

```bash
$ cat out.txt
All done! ✨ 🍰 ✨
$ unimoji out.txt
All done! :sparkles: :shortcake: :sparkles:

$ echo 'All done! ✨ 🍰 ✨' | unimoji
All done! :sparkles: :shortcake: :sparkles:

$ unimoji -
we didnt start the 🔥
we didnt start the :fire:
```

## Reference

```python
findall(string: str) -> Dict[str, str]
```

Find emojis within `string`.  Return a mapping of `{emoji: description}`.

```python
findall_list(string: str, desc: bool = True) -> List[str]
```

Find emojis within `string`.  Return a list (with possible duplicates).

If `desc` is True, the list contains description codes.  If `desc` is False, the list contains emojis.

```python
replace(string: str, repl: str = "") -> str
```

Replace emojis in `string` with `repl`.

```python
replace_with_desc(string: str, sep: str = ":") -> str
```

Replace emojis in `string` with their description codes.  The codes are surrounded by `sep`.
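To make the `repl` and `sep` parameters concrete, here is a minimal standard-library sketch of the same idea. It is not the library's implementation: `DESCRIPTIONS`, `replace_sketch`, and `replace_with_desc_sketch` are hypothetical names, and the two-entry table stands in for the full emoji data set.

```python
import re

# Hypothetical two-entry table; the real data set is far larger.
DESCRIPTIONS = {
    "\U0001f525": "fire",     # 🔥
    "\U0001f30b": "volcano",  # 🌋
}

# One alternation over every known emoji, longest entries first.
PATTERN = re.compile(
    "|".join(re.escape(e) for e in sorted(DESCRIPTIONS, key=len, reverse=True))
)


def replace_sketch(string, repl=""):
    """Replace every known emoji with the literal string `repl`."""
    # A callable avoids re.sub interpreting backslashes in `repl`.
    return PATTERN.sub(lambda m: repl, string)


def replace_with_desc_sketch(string, sep=":"):
    """Replace every known emoji with its description wrapped in `sep`."""
    return PATTERN.sub(lambda m: f"{sep}{DESCRIPTIONS[m.group(0)]}{sep}", string)


line = "we didnt start the \U0001f525"
print(replace_with_desc_sketch(line))           # we didnt start the :fire:
print(replace_with_desc_sketch(line, sep="|"))  # we didnt start the |fire|
print(replace_sketch(line, repl="<emoji>"))     # we didnt start the <emoji>
```

Each call makes a single `re.sub` pass over the input and looks the matched emoji up in a dictionary.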

```python
last_downloaded_timestamp() -> datetime.datetime
```

Show the timestamp of last download for the emoji data bundled with the package.

## Footnote: Emoji Sequences

Numerous emojis that look like single Unicode characters are actually multi-character sequences.  Examples:

- The keycap 2️⃣ is actually 3 characters: U+0032 (the ASCII digit 2), U+FE0F (variation selector), and U+20E3 (combining enclosing keycap).
- The flag of Scotland has 7 component characters, `b'\\U0001f3f4\\U000e0067\\U000e0062\\U000e0073\\U000e0063\\U000e0074\\U000e007f'` in fully escaped notation.

(You can see any of these through `s.encode("unicode-escape")`.)
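The two examples above can be checked with nothing but the standard library. This short sketch builds both sequences from their escapes and prints their lengths, their `unicode-escape` form, and the name of each code point; the variable names are illustrative only.

```python
import unicodedata

# Keycap "2": digit + variation selector + combining enclosing keycap.
keycap_two = "2\ufe0f\u20e3"
# Flag of Scotland: a black flag followed by six tag characters.
scotland = "\U0001f3f4\U000e0067\U000e0062\U000e0073\U000e0063\U000e0074\U000e007f"

for label, seq in [("keycap 2", keycap_two), ("Scotland flag", scotland)]:
    # len() counts code points, not the single glyph a reader sees.
    print(label, len(seq), seq.encode("unicode-escape"))
    for ch in seq:
        print(f"  U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}")
```

`len()` reports 3 for the keycap and 7 for the flag, which is why a matcher that works one code point at a time would split these emojis apart.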

`unimoji` is careful to handle this and should find the full sequences rather than their incomplete subcomponents.

The way it does this is to sort emoji codes by their length, and then compile a concatenated regular expression that will greedily search for longer emojis first, falling back to shorter ones if not found.  This is not by any means a super-optimized way of searching as it has O(N<sup>2</sup>) properties, but the focus is on accuracy and completeness.

```python
>>> from pprint import pprint
>>> seq = """\
... I bet you didn't know that 🙋, 🙋‍♂️, and 🙋‍♀️ are three different emojis.
... """
>>> pprint(seq.encode('unicode-escape'))  # Python 3
(b"I bet you didn't know that \\U0001f64b, \\U0001f64b\\u200d\\u2642\\ufe0f,"
 b' and \\U0001f64b\\u200d\\u2640\\ufe0f are three different emojis.\\n')
```
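The longest-first matching described in this footnote can be sketched with the standard `re` module. This is an illustration under stated assumptions, not the package's actual code: `EMOJI_TABLE`, `build_pattern`, and `find_descriptions` are hypothetical names, and the table holds only the three sequences from the example above.

```python
import re

# Hypothetical three-entry table built from the sequences shown above.
EMOJI_TABLE = {
    "\U0001f64b": "person raising hand",
    "\U0001f64b\u200d\u2642\ufe0f": "man raising hand",
    "\U0001f64b\u200d\u2640\ufe0f": "woman raising hand",
}


def build_pattern(table):
    """Compile one alternation with the longest sequences listed first."""
    codes = sorted(table, key=len, reverse=True)
    return re.compile("|".join(re.escape(code) for code in codes))


PATTERN = build_pattern(EMOJI_TABLE)


def find_descriptions(text):
    """Return the description of every match, in order of appearance."""
    return [EMOJI_TABLE[match] for match in PATTERN.findall(text)]


text = "\U0001f64b, \U0001f64b\u200d\u2642\ufe0f, and \U0001f64b\u200d\u2640\ufe0f"
print(find_descriptions(text))
# ['person raising hand', 'man raising hand', 'woman raising hand']

# For contrast: with the shortest code first, the bare base emoji always wins.
naive = re.compile("|".join(re.escape(c) for c in sorted(EMOJI_TABLE, key=len)))
print(len(set(naive.findall(text))))  # 1
```

Python's regex alternation tries branches left to right, so putting longer sequences first is what keeps a ZWJ sequence from being reported as its base emoji.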

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/pxawtyy/unimoji",
    "name": "unimoji",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "emoji, emojis, nlp, natural langauge processing, unicode, unicode-emoji",
    "author": "Michael Kuwahara",
    "author_email": "prod.samiichy@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/7f/df/07e139cd4ada8038fce4165fb5817fb267da5d0b27922c834f238474608f/unimoji-2.0.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "Accurately remove and replace emojis in text strings",
    "version": "2.0.0",
    "project_urls": {
        "Bug Reports": "https://github.com/pxawtyy/unimoji/issues",
        "Documentation": "https://github.com/pxawtyy/unimoji/blob/master/README.md",
        "Homepage": "https://github.com/pxawtyy/unimoji",
        "Source": "https://github.com/pxawtyy/unimoji"
    },
    "split_keywords": [
        "emoji",
        " emojis",
        " nlp",
        " natural langauge processing",
        " unicode",
        " unicode-emoji"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "a2e393d744b181c9a9b246774da90236c70f9f9ff45bf69d46ae1194f386cd3a",
                "md5": "323008b6ece2a8dbcac3c415ce4ee51d",
                "sha256": "12afcadfd5f0b7475b1d97a688889efc9e1253a3930a53abc9722be240369493"
            },
            "downloads": -1,
            "filename": "unimoji-2.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "323008b6ece2a8dbcac3c415ce4ee51d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 45844,
            "upload_time": "2025-07-17T01:16:26",
            "upload_time_iso_8601": "2025-07-17T01:16:26.447758Z",
            "url": "https://files.pythonhosted.org/packages/a2/e3/93d744b181c9a9b246774da90236c70f9f9ff45bf69d46ae1194f386cd3a/unimoji-2.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "7fdf07e139cd4ada8038fce4165fb5817fb267da5d0b27922c834f238474608f",
                "md5": "d1fa1d7bf1e1c8d56f9451ba275927f8",
                "sha256": "a259663a998ca53612ffdbe62f7a2b525a7f00f7306d2c49357ac4bf9c3ecd23"
            },
            "downloads": -1,
            "filename": "unimoji-2.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "d1fa1d7bf1e1c8d56f9451ba275927f8",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 49586,
            "upload_time": "2025-07-17T01:16:27",
            "upload_time_iso_8601": "2025-07-17T01:16:27.923395Z",
            "url": "https://files.pythonhosted.org/packages/7f/df/07e139cd4ada8038fce4165fb5817fb267da5d0b27922c834f238474608f/unimoji-2.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-17 01:16:27",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "pxawtyy",
    "github_project": "unimoji",
    "travis_ci": false,
    "coveralls": true,
    "github_actions": true,
    "requirements": [],
    "tox": true,
    "lcname": "unimoji"
}
        