| Name | xri |
|---|---|
| Version | 0.7.4 |
| home_page | None |
| Summary | Simple and efficient Python data types for URIs and IRIs |
| upload_time | 2024-10-08 10:08:18 |
| maintainer | None |
| docs_url | None |
| author | Nigel Small |
| requires_python | >=3.7 |
| license | Apache 2.0 |
| keywords | uri, iri, rfc3986, rfc3987, rfc6570 |
| requirements | No requirements were recorded. |
# XRI
XRI is a small Python library for efficient and RFC-correct representation of URIs and IRIs.
It is currently a work in progress and, as such, is not recommended for production environments.
The generic syntax for URIs is defined in [RFC 3986](https://datatracker.ietf.org/doc/html/rfc3986/).
This is extended in the IRI specification, [RFC 3987](https://datatracker.ietf.org/doc/html/rfc3987/), to support characters outside the ASCII range.
The `URI` and `IRI` types defined in this library implement those definitions and store their constituent parts as `bytes` or `str` values respectively.
## Creating a URI or IRI
To get started, pass a string value into the `URI` or `IRI` constructor.
Both constructors accept either `bytes` or `str` values, and will encode or decode UTF-8 as required.
```python-repl
>>> from xri import URI
>>> uri = URI("http://alice@example.com/a/b/c?q=x#z")
>>> uri
<URI scheme=b'http' authority=URI.Authority(b'example.com', userinfo=b'alice') \
path=URI.Path(b'/a/b/c') query=b'q=x' fragment=b'z'>
>>> uri.scheme = "https"
>>> print(uri)
https://alice@example.com/a/b/c?q=x#z
```
## Component parts
Each `URI` or `IRI` object is fully mutable, allowing any component part to be read, set, or deleted.
The following component parts are available:
- `URI`/`IRI` object
  - `.scheme` (None or string)
  - `.authority` (None or `Authority` object)
    - `.userinfo` (None or string)
    - `.host` (string)
    - `.port` (None, string or int)
  - `.path` (`Path` object - can be used as an iterable of segment strings)
  - `.query` (None or `Query` object)
  - `.fragment` (None or string)
(The type "string" here refers to `bytes` or `bytearray` for `URI` objects, and `str` for `IRI` objects.)
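The `.userinfo`, `.host` and `.port` parts belong to the authority, following the RFC 3986 rule `authority = [ userinfo "@" ] host [ ":" port ]`. As a rough illustration of that rule only (this is not the `xri` implementation, and `split_authority` is a hypothetical helper written for this example), an authority string could be taken apart like this:

```python
def split_authority(authority):
    """Split 'userinfo@host:port' into (userinfo, host, port).

    Missing parts are returned as None. This sketch performs no validation.
    """
    # userinfo is everything before the last "@", if one is present
    userinfo, at, hostport = authority.rpartition("@")
    if not at:
        userinfo = None
    # The port follows the last ":", unless that colon sits inside an
    # IPv6 literal such as "[::1]"
    host, colon, port = hostport.rpartition(":")
    if not colon or "]" in port:
        host, port = hostport, None
    return userinfo, host, port


print(split_authority("alice@example.com:8080"))
print(split_authority("example.com"))
print(split_authority("[::1]"))
```

Returning `None` for an absent part, rather than an empty string, keeps "missing" and "empty" distinguishable, in the same spirit as the `None or string` types listed above.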
## Percent encoding and decoding
Each of the `URI` and `IRI` classes has class methods called `pct_encode` and `pct_decode`.
These operate slightly differently depending on the class, because each class keeps a slightly different set of characters "safe" during encoding.
```python
>>> from xri import URI, IRI
>>> URI.pct_encode("abc/def")
'abc%2Fdef'
>>> URI.pct_encode("abc/def", safe="/")
'abc/def'
>>> URI.pct_encode("20% of $125 is $25")
'20%25%20of%20%24125%20is%20%2425'
>>> URI.pct_encode("20% of £125 is £25") # '£' is encoded with UTF-8
'20%25%20of%20%C2%A3125%20is%20%C2%A325'
>>> IRI.pct_encode("20% of £125 is £25") # '£' is safe within an IRI
'20%25%20of%20£125%20is%20£25'
>>> URI.pct_decode('20%25%20of%20%C2%A3125%20is%20%C2%A325') # str in, str out (using UTF-8)
'20% of £125 is £25'
>>> URI.pct_decode(b'20%25%20of%20%C2%A3125%20is%20%C2%A325') # bytes in, bytes out (no UTF-8)
b'20% of \xc2\xa3125 is \xc2\xa325'
```
Safe characters (passed in via the `safe` argument) can only be drawn from the set below, which corresponds to the reserved characters of RFC 3986.
Any other character passed to this argument will raise a `ValueError`.
```
! # $ & ' ( ) * + , / : ; = ? @ [ ]
```
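To make the rules above concrete, here is a minimal standard-library sketch of URI-style percent encoding. It is an approximation for illustration, not the code used by `xri`: the name `pct_encode_sketch` and the constants `UNRESERVED` and `RESERVED` are assumptions introduced for this example.

```python
# Characters that never need encoding in a URI (RFC 3986 "unreserved")
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)
# Characters that may be requested as "safe" (RFC 3986 "reserved")
RESERVED = frozenset("!#$&'()*+,/:;=?@[]")


def pct_encode_sketch(text, safe=""):
    """Percent-encode a str, keeping unreserved and requested safe characters."""
    invalid = set(safe) - RESERVED
    if invalid:
        raise ValueError("Unsupported safe characters: %r" % sorted(invalid))
    keep = UNRESERVED | set(safe)
    out = []
    # Encode to UTF-8 first, then percent-encode every byte that is not kept
    for byte in text.encode("utf-8"):
        char = chr(byte)
        if byte < 128 and char in keep:
            out.append(char)
        else:
            out.append("%%%02X" % byte)
    return "".join(out)


print(pct_encode_sketch("abc/def"))
print(pct_encode_sketch("abc/def", safe="/"))
print(pct_encode_sketch("20% of £125 is £25"))
```

An IRI-flavoured variant would additionally leave non-ASCII characters untouched instead of encoding their UTF-8 bytes.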
## Advantages over built-in `urllib.parse` module
### Correct handling of character encodings
RFC 3986 does not allow extended characters (beyond the ASCII range) to appear directly within URIs.
When needed, they should be encoded with UTF-8 and then percent-encoded.
IRIs (defined in RFC 3987), however, do allow such characters.
`urllib.parse` does not enforce these rules, and does not accept UTF-8 encoded bytes containing non-ASCII characters as input values.
```python
>>> from urllib.parse import urlparse
>>> urlparse("https://example.com/ä").path
'/ä'
>>> urlparse("https://example.com/ä".encode("utf-8")).path
Traceback (most recent call last):
  ...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 20: ordinal not in range(128)
```
Conversely, `xri` handles these scenarios correctly according to the RFCs.
```python
>>> URI("https://example.com/ä").path
URI.Path(b'/%C3%A4')
>>> URI("https://example.com/ä".encode("utf-8")).path
URI.Path(b'/%C3%A4')
>>> IRI("https://example.com/ä").path
IRI.Path('/ä')
>>> IRI("https://example.com/ä".encode("utf-8")).path
IRI.Path('/ä')
```
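For comparison, the same URI-style result can be approximated with the standard library alone by decoding bytes input as UTF-8 and percent-encoding the path explicitly. The helper name `path_as_uri_bytes` is hypothetical, and this sketch is a workaround under stated assumptions rather than a description of how `xri` works internally.

```python
from urllib.parse import quote, urlsplit


def path_as_uri_bytes(value):
    """Return the path of a URI or IRI as percent-encoded ASCII bytes."""
    # Accept bytes by decoding them as UTF-8 before parsing
    if isinstance(value, (bytes, bytearray)):
        value = bytes(value).decode("utf-8")
    path = urlsplit(value).path
    # quote() encodes non-ASCII characters as UTF-8, then percent-encodes them.
    # Limitation of this sketch: an existing "%" would be encoded again.
    return quote(path, safe="/").encode("ascii")


print(path_as_uri_bytes("https://example.com/ä"))
print(path_as_uri_bytes("https://example.com/ä".encode("utf-8")))
```

Both calls give the same percent-encoded path, but the caller has to remember each step, which is the bookkeeping a dedicated URI type takes care of.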
### Optional components may be empty
Optional URI components, such as _query_ and _fragment_, are allowed to be present but empty, [according to RFC 3986](https://datatracker.ietf.org/doc/html/rfc3986/#section-3.4).
As such, there is a semantic difference between an empty component and a missing component.
When a URI is composed, the difference is denoted by the presence or absence of a marker character (`'?'` in the case of the query component).
The `urlparse` function does not distinguish between empty and missing components;
both are treated as "missing".
```python
>>> urlparse("https://example.com/a").geturl()
'https://example.com/a'
>>> urlparse("https://example.com/a?").geturl()
'https://example.com/a'
```
`xri`, on the other hand, correctly distinguishes between these cases:
```python
>>> str(URI("https://example.com/a"))
'https://example.com/a'
>>> str(URI("https://example.com/a?"))
'https://example.com/a?'
```
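The distinction can be reproduced without any library by using the regular expression from Appendix B of RFC 3986, where an unmatched optional group yields `None` (missing) and a matched but empty group yields `''` (empty). The sketch below is illustrative only: `split_uri` and `compose_uri` are hypothetical names, and this is not claimed to be how `xri` parses its input.

```python
import re

# Regular expression from RFC 3986, Appendix B
URI_PATTERN = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(text):
    """Return (scheme, authority, path, query, fragment).

    Missing components are None; empty-but-present components are ''.
    """
    match = URI_PATTERN.match(text)
    return (match.group(2), match.group(4), match.group(5),
            match.group(7), match.group(9))


def compose_uri(scheme, authority, path, query, fragment):
    """Recompose components, emitting a marker only for present components."""
    out = ""
    if scheme is not None:
        out += scheme + ":"
    if authority is not None:
        out += "//" + authority
    out += path
    if query is not None:
        out += "?" + query
    if fragment is not None:
        out += "#" + fragment
    return out


print(split_uri("https://example.com/a"))
print(split_uri("https://example.com/a?"))
print(compose_uri(*split_uri("https://example.com/a?")))
```

Because `None` and `''` are kept apart, the trailing `?` survives a split-and-compose round trip.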
Raw data
{
"_id": null,
"home_page": null,
"name": "xri",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": "URI, IRI, RFC3986, RFC3987, RFC6570",
"author": "Nigel Small",
"author_email": "technige@nige.tech",
"download_url": "https://files.pythonhosted.org/packages/53/d7/95685365f048ac7a1c358fc732368995f7b50037ac5c51e87b15f90c169a/xri-0.7.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache 2.0",
"summary": "Simple and efficient Python data types for URIs and IRIs",
"version": "0.7.4",
"project_urls": null,
"split_keywords": [
"uri",
" iri",
" rfc3986",
" rfc3987",
" rfc6570"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "53d795685365f048ac7a1c358fc732368995f7b50037ac5c51e87b15f90c169a",
"md5": "e5172666708e502d7419ce07139188ee",
"sha256": "7d009075c8e0672a9655cdfd7cdb94fbbfb76d54be266cf90c5d767eafc0bbe1"
},
"downloads": -1,
"filename": "xri-0.7.4.tar.gz",
"has_sig": false,
"md5_digest": "e5172666708e502d7419ce07139188ee",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 18155,
"upload_time": "2024-10-08T10:08:18",
"upload_time_iso_8601": "2024-10-08T10:08:18.033941Z",
"url": "https://files.pythonhosted.org/packages/53/d7/95685365f048ac7a1c358fc732368995f7b50037ac5c51e87b15f90c169a/xri-0.7.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-08 10:08:18",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "xri"
}