# rfc3987-syntax
Helper functions to parse and validate the **syntax** of terms defined in **[RFC 3987](https://www.rfc-editor.org/info/rfc3987)**, the IETF standard for Internationalized Resource Identifiers (IRIs).
## 🎯 Purpose
The goal of `rfc3987-syntax` is to provide a **lightweight, permissively licensed Python module** for validating that strings conform to the **ABNF grammar defined in RFC 3987**. These helpers are:
- ✅ Strictly aligned with the **syntax rules of RFC 3987**
- ✅ Released under a **permissive MIT license**
- ✅ Designed for both **open source and proprietary use**
- ✅ Powered by [Lark](https://github.com/lark-parser/lark), a fast, EBNF-based parser
> 🧠 **Note:** This project focuses on **syntax validation only**. RFC 3987 specifies **additional semantic rules** (e.g., Unicode normalization, BiDi constraints, percent-encoding requirements) that must be enforced separately.
## 📄 License, Attribution, and Citation
## ๐ License, Attribution, and Citation
**`rfc3987-syntax`** is licensed under the [MIT License](LICENSE), which allows reuse in both open source and commercial software.
This project:
- ❌ Does **not** depend on the `rfc3987` Python package (GPL-licensed)
- ✅ Uses [`lark`](https://github.com/lark-parser/lark), licensed under MIT
- ✅ Implements grammar from **[RFC 3987](https://datatracker.ietf.org/doc/html/rfc3987)**, using **[RFC 3986](https://datatracker.ietf.org/doc/html/rfc3986)** where RFC 3987 delegates syntax
> ⚠️ This project is **not affiliated with or endorsed by** the authors of RFC 3987 or the `rfc3987` Python package.
Please cite this software in accordance with the enclosed CITATION.cff file.
## ⚠️ Limitations
The grammar and parser enforce **only the ABNF syntax** defined in RFC 3987. The following are **not validated** and must be handled separately for full compliance:
- ❌ Unicode **Normalization Form C (NFC)**
- ❌ Bidirectional text (**BiDi**) constraints (RFC 3987 §4.1)
- ❌ **Port number ranges** (must be 0–65535)
- ❌ Valid **IPv6 compression** (at most one `::`, at most eight 16-bit groups)
- ❌ Context-aware **percent-encoding** requirements
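Three of these checks can be layered on top of syntax validation with the Python standard library alone. The sketch below is illustrative: `is_nfc`, `is_valid_port`, and `is_valid_ipv6` are hypothetical helpers, not part of `rfc3987-syntax`, and how you extract the port or IPv6 text from an IRI (for example, from the parse tree returned by `parse`) is left open.

```python
import ipaddress
import unicodedata


def is_nfc(value: str) -> bool:
    """Return True if the string is already in Unicode Normalization Form C."""
    return unicodedata.is_normalized("NFC", value)


def is_valid_port(port: str) -> bool:
    """Range-check a port string that already matched the `port` rule (*DIGIT).

    An empty port ("http://host:/") is syntactically allowed, so it passes here.
    """
    if port == "":
        return True
    # isascii() rules out non-ASCII digits (e.g. full-width digits) that isdigit() accepts.
    return port.isascii() and port.isdigit() and int(port) <= 65535


def is_valid_ipv6(literal: str) -> bool:
    """Check IPv6 text (without the surrounding brackets) with the stdlib parser.

    The parser rejects more than one "::" and too many 16-bit groups.
    """
    if "%" in literal:
        # Python 3.9+ accepts zone IDs like "fe80::1%eth0"; RFC 3986 IPv6address does not.
        return False
    try:
        ipaddress.IPv6Address(literal)
    except ValueError:
        return False
    return True


print(is_nfc("cafe\u0301"))      # decomposed "é", so False
print(is_valid_port("70000"))    # out of range, so False
print(is_valid_ipv6("1::2::3"))  # two "::", so False
```

BiDi constraints and context-aware percent-encoding are not covered by this sketch.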
ChatGPT 4o was used during the original development process, so errors may exist as a result. Additional review, testing, and bug fixes by human experts are welcome.
## 📦 Installation
```bash
pip install rfc3987-syntax
```
## 🛠 Usage
### List all supported "terms" (i.e., non-terminals and terminals within ABNF production rules) used to validate the syntax of an IRI according to RFC 3987
```python
from rfc3987_syntax import RFC3987_SYNTAX_TERMS

print("Supported terms:")
for term in RFC3987_SYNTAX_TERMS:
    print(term)
```
### Syntactically validate a string using the general-purpose validator
```python
from rfc3987_syntax import is_valid_syntax

if is_valid_syntax(term='iri', value='http://github.com'):
    print("✓ Valid IRI syntax")

if not is_valid_syntax(term='iri', value='bob'):
    print("✗ Invalid IRI syntax")

# 'bob' has no scheme, so it is not an IRI, but it is a valid relative IRI reference
if is_valid_syntax(term='iri_reference', value='bob'):
    print("✓ Valid IRI-reference syntax")
```
### Alternatively, use term-specific helpers to validate RFC 3987 syntax.
```python
from rfc3987_syntax import is_valid_syntax_iri
from rfc3987_syntax import is_valid_syntax_iri_reference

if is_valid_syntax_iri('http://github.com'):
    print("✓ Valid IRI syntax")

if not is_valid_syntax_iri('bob'):
    print("✗ Invalid IRI syntax")

if is_valid_syntax_iri_reference('bob'):
    print("✓ Valid IRI-reference syntax")
```
### Get the Lark parse tree for a syntax validation (useful for additional semantic validation)
```python
from lark import Tree

from rfc3987_syntax import parse

ptree: Tree = parse(term="iri", value="http://github.com")

print(ptree)           # compact representation
print(ptree.pretty())  # indented view, one node per line
```
## 📚 Sources
This grammar was derived from:
- **RFC 3987 – Internationalized Resource Identifiers (IRIs)**  
  → Defines IRI syntax and its extensions to URI syntax (e.g., Unicode characters via `ucschar`)  
  → https://datatracker.ietf.org/doc/html/rfc3987
- **RFC 3986 – Uniform Resource Identifier (URI): Generic Syntax**  
  → Provides reusable components such as `scheme`, `authority`, and `ipv4address`  
  → https://datatracker.ietf.org/doc/html/rfc3986
> 📝 Where RFC 3986 is listed as the source, it is **used in accordance with RFC 3987**, which explicitly references it for foundational elements.
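RFC 3987 §3.1 links the two grammars in the other direction as well: an IRI is mapped to a URI by converting each `ucschar` or `iprivate` character to UTF-8 and percent-encoding every resulting octet. The sketch below is a simplified version of that mapping. `iri_to_uri` is a hypothetical helper, not part of this package, and it assumes the input is a Python `str` that is already a syntactically valid IRI.

```python
def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI by percent-encoding non-ASCII characters as UTF-8.

    Simplified version of RFC 3987 section 3.1, step 2: ASCII characters
    (including any existing %HH sequences) are copied unchanged.
    """
    parts = []
    for ch in iri:
        if ord(ch) < 0x80:
            parts.append(ch)
        else:
            # Each UTF-8 octet becomes an uppercase %HH triplet.
            parts.append("".join(f"%{octet:02X}" for octet in ch.encode("utf-8")))
    return "".join(parts)


print(iri_to_uri("http://example.org/r\u00e9sum\u00e9"))
# prints http://example.org/r%C3%A9sum%C3%A9
```

RFC 3987 also allows host names to be converted with IDNA (punycode) instead of percent-encoding; this sketch does not attempt that.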
### Rule-to-Source Mapping
| Rule/Component | Source | Notes |
|----------------------|------------|-------|
| `iri` | RFC 3987 | Top-level IRI rule |
| `iri_reference` | RFC 3987 | Top-level IRI Reference rule |
| `absolute_iri` | RFC 3987 | Top-level Absolute IRI rule |
| `scheme` | RFC 3986 | Referenced by RFC 3987 §2.2 |
| `ihier_part` | RFC 3987 | IRI-specific hierarchy |
| `irelative_ref` | RFC 3987 | IRI-specific relative ref |
| `irelative_part` | RFC 3987 | IRI-specific relative part |
| `iauthority` | RFC 3986 | Standard URI authority |
| `ipath_abempty` | RFC 3986 | Path format variant |
| `ipath_absolute` | RFC 3986 | Absolute path |
| `ipath_noscheme` | RFC 3986 | Path disallowing scheme prefix |
| `ipath_rootless` | RFC 3986 | Used in non-scheme contexts |
| `iquery` | RFC 3987 | Query extension to URI |
| `ifragment` | RFC 3987 | Fragment extension to URI |
| `ipchar`, `isegment` | RFC 3986 | Path characters and segments |
| `isegment_nz_nc` | RFC 3987 | IRI-specific path constraint |
| `iunreserved` | RFC 3987 | Includes `ucschar` |
| `ucschar`, `iprivate`| RFC 3987 | Unicode support |
| `sub_delims` | RFC 3986 | Reserved characters |
| `ip_literal` | RFC 3986 | IPv6 or IPvFuture in `[]` |
| `ipv6address` | RFC 3986 | Expanded forms only |
| `ipvfuture` | RFC 3986 | Forward-compatible |
| `ipv4address` | RFC 3986 | Dotted-decimal IPv4 |
| `ls32` | RFC 3986 | Final 32 bits of IPv6 |
| `h16`, `dec_octet` | RFC 3986 | Hex and decimal chunks |
| `port` | RFC 3986 | Optional numeric |
| `pct_encoded` | RFC 3986 | Percent encoding (e.g. `%20`) |
| `alpha`, `digit`, `hexdig` | RFC 3986 | Character classes |
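The `ucschar`, `iprivate`, and `iunreserved` rows are where IRI syntax departs most from URI syntax. As an illustration only (a hand-written sketch, not how this package implements those rules), the same character classes can be checked directly against the code point ranges in RFC 3987 §2.2. The function names below are hypothetical.

```python
import string

# Code point ranges copied from the `ucschar` and `iprivate` rules of RFC 3987 §2.2.
UCSCHAR_RANGES = [
    (0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF),
    (0x10000, 0x1FFFD), (0x20000, 0x2FFFD), (0x30000, 0x3FFFD),
    (0x40000, 0x4FFFD), (0x50000, 0x5FFFD), (0x60000, 0x6FFFD),
    (0x70000, 0x7FFFD), (0x80000, 0x8FFFD), (0x90000, 0x9FFFD),
    (0xA0000, 0xAFFFD), (0xB0000, 0xBFFFD), (0xC0000, 0xCFFFD),
    (0xD0000, 0xDFFFD), (0xE1000, 0xEFFFD),
]
IPRIVATE_RANGES = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]


def _in_ranges(code_point: int, ranges: list[tuple[int, int]]) -> bool:
    """Return True if the code point falls inside any inclusive (lo, hi) range."""
    return any(lo <= code_point <= hi for lo, hi in ranges)


def is_ucschar(ch: str) -> bool:
    return len(ch) == 1 and _in_ranges(ord(ch), UCSCHAR_RANGES)


def is_iprivate(ch: str) -> bool:
    return len(ch) == 1 and _in_ranges(ord(ch), IPRIVATE_RANGES)


def is_iunreserved(ch: str) -> bool:
    """iunreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" / ucschar"""
    if len(ch) != 1:
        return False
    return (
        ch in string.ascii_letters
        or ch in string.digits
        or ch in "-._~"
        or is_ucschar(ch)
    )
```

RFC 3987 permits `iprivate` characters only in the query component (`iquery`), which is why `is_iunreserved` does not accept them.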
Raw data
{
"_id": null,
"home_page": null,
"name": "rfc3987-syntax",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "RFC 3987, RFC3987, parser, syntax, validator",
"author": "Jan Kowalleck",
"author_email": "Will Riley <wanderingwill@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/2c/06/37c1a5557acf449e8e406a830a05bf885ac47d33270aec454ef78675008d/rfc3987_syntax-1.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Helper functions to syntactically validate strings according to RFC 3987.",
"version": "1.1.0",
"project_urls": {
"Documentation": "https://github.com/willynilly/rfc3987-syntax#readme",
"Homepage": "https://github.com/willynilly/rfc3987-syntax",
"Issues": "https://github.com/willynilly/rfc3987-syntax/issues",
"Source": "https://github.com/willynilly/rfc3987-syntax"
},
"split_keywords": [
"rfc 3987",
" rfc3987",
" parser",
" syntax",
" validator"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "7e7144ce230e1b7fadd372515a97e32a83011f906ddded8d03e3c6aafbdedbb7",
"md5": "7fddc63551c99ee6e9f096903fcbb79d",
"sha256": "6c3d97604e4c5ce9f714898e05401a0445a641cfa276432b0a648c80856f6a3f"
},
"downloads": -1,
"filename": "rfc3987_syntax-1.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "7fddc63551c99ee6e9f096903fcbb79d",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 8046,
"upload_time": "2025-07-18T01:05:03",
"upload_time_iso_8601": "2025-07-18T01:05:03.843605Z",
"url": "https://files.pythonhosted.org/packages/7e/71/44ce230e1b7fadd372515a97e32a83011f906ddded8d03e3c6aafbdedbb7/rfc3987_syntax-1.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "2c0637c1a5557acf449e8e406a830a05bf885ac47d33270aec454ef78675008d",
"md5": "b12f9966a7f15414812eb7c55ac13201",
"sha256": "717a62cbf33cffdd16dfa3a497d81ce48a660ea691b1ddd7be710c22f00b4a0d"
},
"downloads": -1,
"filename": "rfc3987_syntax-1.1.0.tar.gz",
"has_sig": false,
"md5_digest": "b12f9966a7f15414812eb7c55ac13201",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 14239,
"upload_time": "2025-07-18T01:05:05",
"upload_time_iso_8601": "2025-07-18T01:05:05.015375Z",
"url": "https://files.pythonhosted.org/packages/2c/06/37c1a5557acf449e8e406a830a05bf885ac47d33270aec454ef78675008d/rfc3987_syntax-1.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-18 01:05:05",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "willynilly",
"github_project": "rfc3987-syntax#readme",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "rfc3987-syntax"
}