rfc3987-syntax


- Name: rfc3987-syntax
- Version: 1.1.0
- Summary: Helper functions to syntactically validate strings according to RFC 3987.
- Upload time: 2025-07-18 01:05:05
- Author: Jan Kowalleck
- Requires Python: >=3.9
- Keywords: rfc 3987, rfc3987, parser, syntax, validator
# rfc3987-syntax

Helper functions to parse and validate the **syntax** of terms defined in **[RFC 3987](https://www.rfc-editor.org/info/rfc3987)**, the IETF standard for Internationalized Resource Identifiers (IRIs).


## 🎯 Purpose

The goal of `rfc3987-syntax` is to provide a **lightweight, permissively licensed Python module** for validating that strings conform to the **ABNF grammar defined in RFC 3987**. These helpers are:

- ✅ Strictly aligned with the **syntax rules of RFC 3987**
- ✅ Built using a **permissive MIT license**
- ✅ Designed for both **open source and proprietary use**
- ✅ Powered by [Lark](https://github.com/lark-parser/lark), a fast, EBNF-based parser

> 🧠 **Note:** This project focuses on **syntax validation only**. RFC 3987 specifies **additional semantic rules** (e.g., Unicode normalization, BiDi constraints, percent-encoding requirements) that must be enforced separately.


## 📄 License, Attribution, and Citation

**`rfc3987-syntax`** is licensed under the [MIT License](LICENSE), which allows reuse in both open source and commercial software.

This project:

- ❌ Does **not** depend on the `rfc3987` Python package (GPL-licensed)
- ✅ Uses [`lark`](https://github.com/lark-parser/lark), licensed under MIT
- ✅ Implements grammar from **[RFC 3987](https://datatracker.ietf.org/doc/html/rfc3987)**, using **[RFC 3986](https://datatracker.ietf.org/doc/html/rfc3986)** where RFC 3987 delegates syntax

> ⚠️ This project is **not affiliated with or endorsed by** the authors of RFC 3987 or the `rfc3987` Python package.

Please cite this software in accordance with the enclosed CITATION.cff file.


## ⚠️ Limitations

The grammar and parser enforce **only the ABNF syntax** defined in RFC 3987. The following are **not validated** and must be handled separately for full compliance:

- ✅ Unicode **Normalization Form C (NFC)**
- ✅ Bidirectional text (**BiDi**) constraints (RFC 3987 §4.1)
- ✅ **Port number ranges** (must be 0–65535)
- ✅ Valid **IPv6 compression** (only one `::`, max segments)
- ✅ Context-aware **percent-encoding** requirements
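
Some of these checks (NFC, port range, IPv6 compression) can be layered on top of syntax validation with the standard library alone. The following is a minimal sketch and not part of `rfc3987-syntax`: the helper names `is_nfc`, `has_valid_port`, and `has_valid_ipv6_host` are hypothetical, and it assumes `urllib.parse.urlsplit` locates the authority correctly, which is designed for URIs and may treat some IRIs differently.

```python
import ipaddress
import unicodedata
from urllib.parse import urlsplit


def is_nfc(value: str) -> bool:
    """Return True if the string is already in Unicode Normalization Form C."""
    return unicodedata.is_normalized("NFC", value)


def has_valid_port(value: str) -> bool:
    """Return True if the port is absent or an integer within 0-65535."""
    try:
        urlsplit(value).port  # raises ValueError for a bad or out-of-range port
    except ValueError:
        return False
    return True


def has_valid_ipv6_host(value: str) -> bool:
    """Return True if the host is not an IPv6 literal, or is a well-formed one."""
    try:
        host = urlsplit(value).hostname  # brackets are stripped from IP literals
        if host is None or ":" not in host:
            return True  # no host, or not an IPv6 literal: nothing to check here
        ipaddress.IPv6Address(host)  # rejects a second '::' and extra segments
    except ValueError:
        return False
    return True
```

These helpers are meant to run after a string has already passed syntax validation; BiDi constraints and context-aware percent-encoding are not covered by this sketch.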

ChatGPT 4o was used during the original development process. Errors may exist due to this assistance. Additional review, testing, and bug fixes by human experts are welcome.


## 📦 Installation

```bash
pip install rfc3987-syntax
```

## 🛠 Usage

### List all supported "terms" (i.e., non-terminals and terminals within ABNF production rules) used to validate the syntax of an IRI according to RFC 3987

```python
from rfc3987_syntax import RFC3987_SYNTAX_TERMS

print("Supported terms:")
for term in RFC3987_SYNTAX_TERMS:
    print(term)
```

### Syntactically validate a string using the general-purpose validator

```python
from rfc3987_syntax import is_valid_syntax

if is_valid_syntax(term='iri', value='http://github.com'):
    print("✓ Valid IRI syntax")

if not is_valid_syntax(term='iri', value='bob'):
    print("✗ Invalid IRI syntax")

# 'bob' has no scheme, so it is not an IRI, but it is a valid relative IRI reference
if is_valid_syntax(term='iri_reference', value='bob'):
    print("✓ Valid IRI-reference syntax")
```

### Alternatively, use term-specific helpers to validate RFC 3987 syntax.

```python
from rfc3987_syntax import is_valid_syntax_iri
from rfc3987_syntax import is_valid_syntax_iri_reference

if is_valid_syntax_iri('http://github.com'):
    print("✓ Valid IRI syntax")

if not is_valid_syntax_iri('bob'):
    print("✗ Invalid IRI syntax")

if is_valid_syntax_iri_reference('bob'):
    print("✓ Valid IRI-reference syntax")
```
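
When many strings need checking, a validator can be passed around as an ordinary callable. The sketch below is an illustration rather than part of the package: `partition_by_validity` is a hypothetical helper that accepts any function mapping a string to a boolean, such as `is_valid_syntax_iri`.

```python
from typing import Callable, Iterable


def partition_by_validity(
    values: Iterable[str],
    validator: Callable[[str], bool],
) -> tuple[list[str], list[str]]:
    """Split values into (valid, invalid) lists, preserving input order."""
    valid: list[str] = []
    invalid: list[str] = []
    for value in values:
        # Route each string according to the validator's verdict.
        (valid if validator(value) else invalid).append(value)
    return valid, invalid


# With the package installed, pass one of its helpers as the validator:
# from rfc3987_syntax import is_valid_syntax_iri
# valid, invalid = partition_by_validity(["http://github.com", "bob"], is_valid_syntax_iri)
```

Taking the validator as a parameter keeps the helper independent of which term is being checked, so the same function works for `iri`, `iri_reference`, or any other supported term.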

### Get the Lark parse tree for a syntax validation (useful for additional semantic validation)

```python
from lark import ParseTree  # type alias exported by Lark, needed for the annotation
from rfc3987_syntax import parse

ptree: ParseTree = parse(term="iri", value="http://github.com")

print(ptree)
```
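
A parse tree can feed the semantic checks that the grammar does not enforce, for example by pulling out the text matched by one rule. The sketch below relies only on the `data` and `children` attributes that Lark trees expose, and treats anything without `children` as a leaf token. The helper names are hypothetical, and it is an assumption that a rule such as `port` appears as its own node in the tree returned by `parse`; the example therefore runs on a hand-built stand-in tree.

```python
from types import SimpleNamespace as Node
from typing import Any, Iterator


def iter_nodes(node: Any) -> Iterator[Any]:
    """Yield every subtree depth-first, skipping leaf tokens."""
    if hasattr(node, "children"):
        yield node
        for child in node.children:
            yield from iter_nodes(child)


def leaf_text(node: Any) -> str:
    """Concatenate the text of all leaf tokens under a node."""
    if hasattr(node, "children"):
        return "".join(leaf_text(child) for child in node.children)
    return str(node)


def find_rule_text(tree: Any, rule: str) -> list[str]:
    """Return the matched text of every subtree whose rule name equals `rule`."""
    return [leaf_text(n) for n in iter_nodes(tree) if n.data == rule]


# Stand-in tree (an assumption about shape); a real tree would come from parse().
demo = Node(data="iri", children=[
    Node(data="scheme", children=["http"]),
    "://",
    Node(data="port", children=["8", "0"]),
])

ports = find_rule_text(demo, "port")
# Semantic check the grammar does not perform: an empty port is allowed, else 0-65535.
ports_in_range = all(p == "" or int(p) <= 65535 for p in ports)
```

Lark's own `Tree.find_data` and `Tree.iter_subtrees` methods cover the same traversal when working directly with a real tree.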

## 📚 Sources

This grammar was derived from:

- **[RFC 3987 – Internationalized Resource Identifiers (IRIs)]**  
  → Defines IRI syntax and extensions to URI (e.g. Unicode characters, `ucschar`)  
  → https://datatracker.ietf.org/doc/html/rfc3987

- **[RFC 3986 – Uniform Resource Identifier (URI): Generic Syntax]**  
  → Provides reusable components like `scheme`, `authority`, `ipv4address`, etc.  
  → https://datatracker.ietf.org/doc/html/rfc3986

> 📝 When `RFC 3986` is listed as the source, it is **used in accordance with RFC 3987**, which explicitly references it for foundational elements.

### Rule-to-Source Mapping

| Rule/Component       | Source     | Notes |
|----------------------|------------|-------|
| `iri`                | RFC 3987   | Top-level IRI rule |
| `iri_reference`      | RFC 3987   | Top-level IRI Reference rule |
| `absolute_iri`       | RFC 3987   | Top-level Absolute IRI rule |
| `scheme`             | RFC 3986   | Referenced by RFC 3987 §2.2 |
| `ihier_part`         | RFC 3987   | IRI-specific hierarchy |
| `irelative_ref`      | RFC 3987   | IRI-specific relative ref |
| `irelative_part`     | RFC 3987   | IRI-specific relative part |
| `iauthority`         | RFC 3986   | Standard URI authority |
| `ipath_abempty`      | RFC 3986   | Path format variant |
| `ipath_absolute`     | RFC 3986   | Absolute path |
| `ipath_noscheme`     | RFC 3986   | Path disallowing scheme prefix |
| `ipath_rootless`     | RFC 3986   | Used in non-scheme contexts |
| `iquery`             | RFC 3987   | Query extension to URI |
| `ifragment`          | RFC 3987   | Fragment extension to URI |
| `ipchar`, `isegment` | RFC 3986   | Path characters and segments |
| `isegment_nz_nc`     | RFC 3987   | IRI-specific path constraint |
| `iunreserved`        | RFC 3987   | Includes `ucschar` |
| `ucschar`, `iprivate`| RFC 3987   | Unicode support |
| `sub_delims`         | RFC 3986   | Reserved characters |
| `ip_literal`         | RFC 3986   | IPv6 or IPvFuture in `[]` |
| `ipv6address`        | RFC 3986   | Expanded forms only |
| `ipvfuture`          | RFC 3986   | Forward-compatible |
| `ipv4address`        | RFC 3986   | Dotted-decimal IPv4 |
| `ls32`               | RFC 3986   | Final 32 bits of IPv6 |
| `h16`, `dec_octet`   | RFC 3986   | Hex and decimal chunks |
| `port`               | RFC 3986   | Optional numeric |
| `pct_encoded`        | RFC 3986   | Percent encoding (e.g. `%20`) |
| `alpha`, `digit`, `hexdig` | RFC 3986 | Character classes |

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "rfc3987-syntax",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "RFC 3987, RFC3987, parser, syntax, validator",
    "author": "Jan Kowalleck",
    "author_email": "Will Riley <wanderingwill@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/2c/06/37c1a5557acf449e8e406a830a05bf885ac47d33270aec454ef78675008d/rfc3987_syntax-1.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Helper functions to syntactically validate strings according to RFC 3987.",
    "version": "1.1.0",
    "project_urls": {
        "Documentation": "https://github.com/willynilly/rfc3987-syntax#readme",
        "Homepage": "https://github.com/willynilly/rfc3987-syntax",
        "Issues": "https://github.com/willynilly/rfc3987-syntax/issues",
        "Source": "https://github.com/willynilly/rfc3987-syntax"
    },
    "split_keywords": [
        "rfc 3987",
        " rfc3987",
        " parser",
        " syntax",
        " validator"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "7e7144ce230e1b7fadd372515a97e32a83011f906ddded8d03e3c6aafbdedbb7",
                "md5": "7fddc63551c99ee6e9f096903fcbb79d",
                "sha256": "6c3d97604e4c5ce9f714898e05401a0445a641cfa276432b0a648c80856f6a3f"
            },
            "downloads": -1,
            "filename": "rfc3987_syntax-1.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "7fddc63551c99ee6e9f096903fcbb79d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 8046,
            "upload_time": "2025-07-18T01:05:03",
            "upload_time_iso_8601": "2025-07-18T01:05:03.843605Z",
            "url": "https://files.pythonhosted.org/packages/7e/71/44ce230e1b7fadd372515a97e32a83011f906ddded8d03e3c6aafbdedbb7/rfc3987_syntax-1.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "2c0637c1a5557acf449e8e406a830a05bf885ac47d33270aec454ef78675008d",
                "md5": "b12f9966a7f15414812eb7c55ac13201",
                "sha256": "717a62cbf33cffdd16dfa3a497d81ce48a660ea691b1ddd7be710c22f00b4a0d"
            },
            "downloads": -1,
            "filename": "rfc3987_syntax-1.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "b12f9966a7f15414812eb7c55ac13201",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 14239,
            "upload_time": "2025-07-18T01:05:05",
            "upload_time_iso_8601": "2025-07-18T01:05:05.015375Z",
            "url": "https://files.pythonhosted.org/packages/2c/06/37c1a5557acf449e8e406a830a05bf885ac47d33270aec454ef78675008d/rfc3987_syntax-1.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-18 01:05:05",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "willynilly",
    "github_project": "rfc3987-syntax#readme",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "rfc3987-syntax"
}
        