# linkify-it-py

- __Name__ - linkify-it-py
- __Version__ - 2.0.3
- __Summary__ - Links recognition library with FULL unicode support.
- __Upload time__ - 2024-02-04 14:48:04
- __Author__ - tsutsu3
- __Requires Python__ - >=3.7
- __License__ - MIT
- __Keywords__ - linkify, linkifier, autolink, autolinker
- __Requirements__ - No requirements were recorded.

[![CI](https://github.com/tsutsu3/linkify-it-py/workflows/CI/badge.svg?branch=main)](https://github.com/tsutsu3/linkify-it-py/actions)
[![pypi](https://img.shields.io/pypi/v/linkify-it-py)](https://pypi.org/project/linkify-it-py/)
[![Anaconda-Server Badge](https://anaconda.org/conda-forge/linkify-it-py/badges/version.svg)](https://anaconda.org/conda-forge/linkify-it-py)
[![Documentation Status](https://readthedocs.org/projects/linkify-it-py/badge/?version=latest)](https://linkify-it-py.readthedocs.io/en/latest/?badge=latest)
[![codecov](https://codecov.io/gh/tsutsu3/linkify-it-py/branch/main/graph/badge.svg)](https://codecov.io/gh/tsutsu3/linkify-it-py)
[![Maintainability](https://api.codeclimate.com/v1/badges/6341fd3ec5f05fde392f/maintainability)](https://codeclimate.com/github/tsutsu3/linkify-it-py/maintainability)

This is a Python port of [linkify-it](https://github.com/markdown-it/linkify-it).

> Links recognition library with FULL unicode support.
> Focused on high quality link patterns detection in plain text.

__[Demo](https://linkify-it-py-demo.vercel.app/)__

__[Javascript Demo](http://markdown-it.github.io/linkify-it/)__

Why it's awesome:

- Full unicode support, _with astral characters_!
- International domains support.
- Allows rules extension & custom normalizers.


## Install

```bash
pip install linkify-it-py
```

or

```bash
conda install -c conda-forge linkify-it-py
```

## Usage examples

### Example 1. Simple use

```python
from linkify_it import LinkifyIt


linkify = LinkifyIt()

print(linkify.test("Site github.com!"))
# => True

print(linkify.match("Site github.com!"))
# => [linkify_it.main.Match({
#         'schema': '',
#         'index': 5,
#         'last_index': 15,
#         'raw': 'github.com',
#         'text': 'github.com',
#         'url': 'http://github.com'
#     })]
```

### Example 2. With options

```python
from linkify_it import LinkifyIt
from linkify_it.tlds import TLDS


# Reload full tlds list & add unofficial `.onion` domain.
linkify = (
    LinkifyIt()
    .tlds(TLDS)               # Reload with full tlds list
    .tlds("onion", True)      # Add unofficial `.onion` domain
    .add("git:", "http:")     # Add `git:` protocol as "alias"
    .add("ftp:", None)        # Disable `ftp:` protocol
    .set({"fuzzy_ip": True})  # Enable IPs in fuzzy links (without schema)
)
print(linkify.test("Site tamanegi.onion!"))
# => True

print(linkify.match("Site tamanegi.onion!"))
# => [linkify_it.main.Match({
#         'schema': '',
#         'index': 5,
#         'last_index': 19,
#         'raw': 'tamanegi.onion',
#         'text': 'tamanegi.onion',
#         'url': 'http://tamanegi.onion'
#     })]
```

### Example 3. Add twitter mentions handler

```python
import re

from linkify_it import LinkifyIt


linkify = LinkifyIt()

def validate(obj, text, pos):
    # `pos` is the index right after the "@" prefix.
    tail = text[pos:]

    # Compile the pattern once and cache it on the linkify object.
    if not obj.re.get("twitter"):
        obj.re["twitter"] = re.compile(
            "^([a-zA-Z0-9_]){1,15}(?!_)(?=$|" + obj.re["src_ZPCc"] + ")"
        )
    found = obj.re["twitter"].search(tail)
    if found:
        # Reject a doubled prefix such as "@@mention".
        if pos >= 2 and text[pos - 2] == "@":
            return 0
        return len(found.group())
    return 0

def normalize(obj, match):
    # Strip the leading "@" and build the profile URL.
    match.url = "https://twitter.com/" + re.sub(r"^@", "", match.url)

linkify.add("@", {"validate": validate, "normalize": normalize})
```


## API

[API documentation](https://linkify-it-py.readthedocs.io/en/latest/)

### LinkifyIt(schemas, options)

Creates a new linkifier instance with optional additional schemas.

By default, it understands:

- `http(s)://...` , `ftp://...`, `mailto:...` & `//...` links
- "fuzzy" links and emails (google.com, foo@bar.com).

`schemas` is a dict, where each key/value pair describes a protocol/rule:

- __key__ - link prefix (usually a protocol name with `:` at the end, `skype:`
  for example). `linkify-it-py` makes sure that the prefix is not preceded by
  an alphanumeric character.
- __value__ - rule to check tail after link prefix
  - _str_
    - just alias to existing rule
  - _dict_
    - _validate_ - either a `re.Pattern` (start with `^`, and don't include the
      link prefix itself), or a validator `function` which, given arguments
      _self_, _text_ and _pos_, returns the length of a match in _text_
      starting at index _pos_.  _pos_ is the index right after the link prefix.
      _self_ can be used to access the linkify object to cache data.
    - _normalize_ - optional function to normalize text & url of matched result
      (for example, for twitter mentions).
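The validator contract described above can be sketched without the library at all. The snippet below is a minimal illustration, assuming a hypothetical `ticket:` prefix followed by up to six digits; the names `TICKET_TAIL` and `validate_ticket` are invented for this example. The function is called directly here, with `None` standing in for the linkify object.

```python
import re

# Hypothetical rule: the tail after `ticket:` is 1 to 6 digits.
# The pattern starts with `^` and does not include the prefix itself.
TICKET_TAIL = re.compile(r"^\d{1,6}(?!\d)")


def validate_ticket(obj, text, pos):
    """Return the length of the match in `text` starting at `pos`, or 0."""
    found = TICKET_TAIL.search(text[pos:])
    return len(found.group()) if found else 0


text = "see ticket:4821 for details"
pos = text.index("ticket:") + len("ticket:")  # index right after the prefix
length = validate_ticket(None, text, pos)
print(length)                   # => 4
print(text[pos:pos + length])   # => 4821
```

A function like this would then be registered with something like `LinkifyIt().add("ticket:", {"validate": validate_ticket})`.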

`options`:

- __fuzzy_link__ - recognize URLs without the `http(s)://` head. Default `True`.
- __fuzzy_ip__ - allow IPs in fuzzy links above. Can conflict with some texts
  like version numbers. Default `False`.
- __fuzzy_email__ - recognize emails without the `mailto:` prefix. Default `True`.
- __---__ - set `True` to terminate a link at `---` (if it is treated as a long dash).


### .test(text)

Searches for a linkifiable pattern and returns `True` on success or `False` on failure.


### .pretest(text)

Quick check of whether a link may exist. Can be used to skip more expensive
`.test()` calls. Returns `False` if no link can be found, or `True` if a
`.test()` call is needed to know for sure.


### .test_schema_at(text, name, position)

Similar to `.test()`, but checks only the tail of a specific protocol at exactly
the given position. Returns the length of the found pattern (0 on failure).


### .match(text)

Returns a `list` of found link matches, or `None` if nothing is found.

Each match has:

- __schema__ - link schema, can be empty for fuzzy links, or `//` for
  protocol-neutral links.
- __index__ - offset of matched text
- __last_index__ - index of the next char after the match end
- __raw__ - matched text
- __text__ - normalized text
- __url__ - link, generated from matched text
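The `index` and `last_index` fields are enough to rebuild the text with links wrapped in anchors. The following is a minimal sketch, not part of the library: `render_links` is a hypothetical helper, and the demo uses a `SimpleNamespace` stand-in carrying the same values as the match shown in Example 1, so it runs without the package installed.

```python
from html import escape
from types import SimpleNamespace


def render_links(text, matches):
    """Wrap each matched span of `text` in an <a> tag and escape the rest."""
    out = []
    cursor = 0
    for m in matches or []:
        out.append(escape(text[cursor:m.index]))   # plain text before the link
        out.append('<a href="{}">{}</a>'.format(escape(m.url), escape(m.text)))
        cursor = m.last_index                       # continue after the link
    out.append(escape(text[cursor:]))               # trailing plain text
    return "".join(out)


# Stand-in with the same field values as the match in Example 1.
demo = SimpleNamespace(
    index=5, last_index=15, text="github.com", url="http://github.com"
)
html = render_links("Site github.com!", [demo])
print(html)
# => Site <a href="http://github.com">github.com</a>!
```

With the library installed, the second argument would be the result of `linkify.match(text)`; the `matches or []` guard covers the case where nothing was found.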

### .match_at_start(text)

Checks whether a match exists at the start of the string. Returns a `Match`
(see docs for `.match(text)`) or `None` if no URL is at the start.
Doesn't work with fuzzy links.

### .tlds(list_tlds, keep_old=False)

Loads (or merges) a new TLD list. TLDs are needed for fuzzy links (without schema)
to avoid false positives. By default:

- 2-letter root zones are ok.
- biz|com|edu|gov|net|org|pro|web|xxx|aero|asia|coop|info|museum|name|shop|рф are ok.
- encoded (`xn--...`) root zones are ok.

If that's not enough, you can reload the defaults with a more detailed zone list.

### .add(key, value)

Adds a new schema to the schemas object. As described in the constructor
definition, `key` is a link prefix (`skype:`, for example), and `value`
is a `str` that aliases another schema, or a `dict` with `validate` and
optionally `normalize` definitions. To disable an existing rule, use
`.add(key, None)`.


### .set(options)

Overrides the default options. Options that are not specified are left unchanged.


## License

[MIT](https://github.com/tsutsu3/linkify-it-py/blob/master/LICENSE)

            
