# urlstd
[![PyPI](https://img.shields.io/pypi/v/urlstd)](https://pypi.org/project/urlstd/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/urlstd)](https://pypi.org/project/urlstd/)
[![PyPI - License](https://img.shields.io/pypi/l/urlstd)](https://pypi.org/project/urlstd/)
[![CI](https://github.com/miute/urlstd/actions/workflows/main.yml/badge.svg)](https://github.com/miute/urlstd/actions/workflows/main.yml)
[![codecov](https://codecov.io/gh/miute/urlstd/branch/main/graph/badge.svg?token=XJGM09H5TS)](https://codecov.io/gh/miute/urlstd)
`urlstd` is a Python implementation of the WHATWG [URL Living Standard](https://url.spec.whatwg.org/).
This library provides the `URL` class, the `URLSearchParams` class, and low-level APIs that comply with the URL specification.
## Supported APIs
- [URL class](https://url.spec.whatwg.org/#url-class)
  - class urlstd.parse.`URL(url: str, base: Optional[str | URL] = None)`
    - [canParse](https://url.spec.whatwg.org/#dom-url-canparse): classmethod `can_parse(url: str, base: Optional[str | URL] = None) -> bool`
    - stringifier: `__str__() -> str`
    - [href](https://url.spec.whatwg.org/#dom-url-href): `readonly property href: str`
    - [origin](https://url.spec.whatwg.org/#dom-url-origin): `readonly property origin: str`
    - [protocol](https://url.spec.whatwg.org/#dom-url-protocol): `property protocol: str`
    - [username](https://url.spec.whatwg.org/#dom-url-username): `property username: str`
    - [password](https://url.spec.whatwg.org/#dom-url-password): `property password: str`
    - [host](https://url.spec.whatwg.org/#dom-url-host): `property host: str`
    - [hostname](https://url.spec.whatwg.org/#dom-url-hostname): `property hostname: str`
    - [port](https://url.spec.whatwg.org/#dom-url-port): `property port: str`
    - [pathname](https://url.spec.whatwg.org/#dom-url-pathname): `property pathname: str`
    - [search](https://url.spec.whatwg.org/#dom-url-search): `property search: str`
    - [searchParams](https://url.spec.whatwg.org/#dom-url-searchparams): `readonly property search_params: URLSearchParams`
    - [hash](https://url.spec.whatwg.org/#dom-url-hash): `property hash: str`
    - [URL equivalence](https://url.spec.whatwg.org/#url-equivalence): `__eq__(other: Any) -> bool` and `equals(other: URL, exclude_fragments: bool = False) -> bool`
- [URLSearchParams class](https://url.spec.whatwg.org/#interface-urlsearchparams)
  - class urlstd.parse.`URLSearchParams(init: Optional[str | Sequence[Sequence[str | int | float]] | dict[str, str | int | float] | URLRecord | URLSearchParams] = None)`
    - [size](https://url.spec.whatwg.org/#dom-urlsearchparams-size): `__len__() -> int`
    - [append](https://url.spec.whatwg.org/#dom-urlsearchparams-append): `append(name: str, value: str | int | float) -> None`
    - [delete](https://url.spec.whatwg.org/#dom-urlsearchparams-delete): `delete(name: str, value: Optional[str | int | float] = None) -> None`
    - [get](https://url.spec.whatwg.org/#dom-urlsearchparams-get): `get(name: str) -> str | None`
    - [getAll](https://url.spec.whatwg.org/#dom-urlsearchparams-getall): `get_all(name: str) -> tuple[str, ...]`
    - [has](https://url.spec.whatwg.org/#dom-urlsearchparams-has): `has(name: str, value: Optional[str | int | float] = None) -> bool`
    - [set](https://url.spec.whatwg.org/#dom-urlsearchparams-set): `set(name: str, value: str | int | float) -> None`
    - [sort](https://url.spec.whatwg.org/#dom-urlsearchparams-sort): `sort() -> None`
    - iterable<USVString, USVString>: `__iter__() -> Iterator[tuple[str, str]]`
    - [stringifier](https://url.spec.whatwg.org/#urlsearchparams-stringification-behavior): `__str__() -> str`
- Low-level APIs
  - [URL parser](https://url.spec.whatwg.org/#concept-url-parser)
    - urlstd.parse.`parse_url(urlstring: str, base: Optional[str | URLRecord] = None, encoding: str = "utf-8") -> URLRecord`
  - [basic URL parser](https://url.spec.whatwg.org/#concept-basic-url-parser)
    - class urlstd.parse.`BasicURLParser`
      - classmethod `parse(urlstring: str, base: Optional[URLRecord] = None, encoding: str = "utf-8", url: Optional[URLRecord] = None, state_override: Optional[URLParserState] = None) -> URLRecord`
  - [URL record](https://url.spec.whatwg.org/#concept-url)
    - class urlstd.parse.`URLRecord`
      - [scheme](https://url.spec.whatwg.org/#concept-url-scheme): `property scheme: str = ""`
      - [username](https://url.spec.whatwg.org/#concept-url-username): `property username: str = ""`
      - [password](https://url.spec.whatwg.org/#concept-url-password): `property password: str = ""`
      - [host](https://url.spec.whatwg.org/#concept-url-host): `property host: Optional[str | int | tuple[int, ...]] = None`
      - [port](https://url.spec.whatwg.org/#concept-url-port): `property port: Optional[int] = None`
      - [path](https://url.spec.whatwg.org/#concept-url-path): `property path: list[str] | str = []`
      - [query](https://url.spec.whatwg.org/#concept-url-query): `property query: Optional[str] = None`
      - [fragment](https://url.spec.whatwg.org/#concept-url-fragment): `property fragment: Optional[str] = None`
      - [origin](https://url.spec.whatwg.org/#concept-url-origin): `readonly property origin: Origin | None`
      - [is special](https://url.spec.whatwg.org/#is-special): `is_special() -> bool`
      - [is not special](https://url.spec.whatwg.org/#is-not-special): `is_not_special() -> bool`
      - [includes credentials](https://url.spec.whatwg.org/#include-credentials): `includes_credentials() -> bool`
      - [has an opaque path](https://url.spec.whatwg.org/#url-opaque-path): `has_opaque_path() -> bool`
      - [cannot have a username/password/port](https://url.spec.whatwg.org/#cannot-have-a-username-password-port): `cannot_have_username_password_port() -> bool`
      - [URL serializer](https://url.spec.whatwg.org/#concept-url-serializer): `serialize_url(exclude_fragment: bool = False) -> str`
      - [host serializer](https://url.spec.whatwg.org/#concept-host-serializer): `serialize_host() -> str`
      - [URL path serializer](https://url.spec.whatwg.org/#url-path-serializer): `serialize_path() -> str`
      - [URL equivalence](https://url.spec.whatwg.org/#url-equivalence): `__eq__(other: Any) -> bool` and `equals(other: URLRecord, exclude_fragments: bool = False) -> bool`
  - [Hosts (domains and IP addresses)](https://url.spec.whatwg.org/#hosts-(domains-and-ip-addresses))
    - class urlstd.parse.`IDNA`
      - [domain to ASCII](https://url.spec.whatwg.org/#concept-domain-to-ascii): classmethod `domain_to_ascii(domain: str, be_strict: bool = False) -> str`
      - [domain to Unicode](https://url.spec.whatwg.org/#concept-domain-to-unicode): classmethod `domain_to_unicode(domain: str, be_strict: bool = False) -> str`
    - class urlstd.parse.`Host`
      - [host parser](https://url.spec.whatwg.org/#concept-host-parser): classmethod `parse(host: str, is_not_special: bool = False) -> str | int | tuple[int, ...]`
      - [host serializer](https://url.spec.whatwg.org/#concept-host-serializer): classmethod `serialize(host: str | int | Sequence[int]) -> str`
  - [percent-decode a string](https://url.spec.whatwg.org/#string-percent-decode)
    - urlstd.parse.`string_percent_decode(s: str) -> bytes`
  - [percent-encode after encoding](https://url.spec.whatwg.org/#string-percent-encode-after-encoding)
    - urlstd.parse.`string_percent_encode(s: str, safe: str, encoding: str = "utf-8", space_as_plus: bool = False) -> str`
  - [application/x-www-form-urlencoded parser](https://url.spec.whatwg.org/#concept-urlencoded-parser)
    - urlstd.parse.`parse_qsl(query: bytes) -> list[tuple[str, str]]`
  - [application/x-www-form-urlencoded serializer](https://url.spec.whatwg.org/#concept-urlencoded-serializer)
    - urlstd.parse.`urlencode(query: Sequence[tuple[str, str]], encoding: str = "utf-8") -> str`
  - Validation
    - class urlstd.parse.`HostValidator`
      - [valid host string](https://url.spec.whatwg.org/#valid-host-string): classmethod `is_valid(host: str) -> bool`
      - [valid domain string](https://url.spec.whatwg.org/#valid-domain-string): classmethod `is_valid_domain(domain: str) -> bool`
      - [valid IPv4-address string](https://url.spec.whatwg.org/#valid-ipv4-address-string): classmethod `is_valid_ipv4_address(address: str) -> bool`
      - [valid IPv6-address string](https://url.spec.whatwg.org/#valid-ipv6-address-string): classmethod `is_valid_ipv6_address(address: str) -> bool`
    - class urlstd.parse.`URLValidator`
      - [valid URL string](https://url.spec.whatwg.org/#valid-url-string): classmethod `is_valid(urlstring: str, base: Optional[str | URLRecord] = None, encoding: str = "utf-8") -> bool`
      - valid [URL-scheme string](https://url.spec.whatwg.org/#url-scheme-string): classmethod `is_valid_url_scheme(value: str) -> bool`
- Compatibility with standard library `urllib`
  - urlstd.parse.`urlparse(urlstring: str, base: str = None, encoding: str = "utf-8", allow_fragments: bool = True) -> urllib.parse.ParseResult`

    `urlstd.parse.urlparse()` is an alternative to `urllib.parse.urlparse()`.
    It parses a string representation of a URL using the basic URL parser and returns a `urllib.parse.ParseResult`.
## Basic Usage
To parse a string into a `URL`:
```python
from urlstd.parse import URL
URL('http://user:pass@foo:21/bar;par?b#c')
# → <URL(href='http://user:pass@foo:21/bar;par?b#c', origin='http://foo:21', protocol='http:', username='user', password='pass', host='foo:21', hostname='foo', port='21', pathname='/bar;par', search='?b', hash='#c')>
```
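For readers coming from `urllib`, the attribute names in the output above can be related to the standard library's fields. The following is only a rough, stdlib-only sketch: `whatwg_style_fields` is a hypothetical helper, not part of `urlstd`, and it performs none of the normalization the URL Standard requires (for example, it does not drop default ports or percent-encode anything).

```python
from urllib.parse import urlsplit


def whatwg_style_fields(urlstring: str) -> dict:
    """Approximate the URL-class attribute names from a stdlib SplitResult.

    Hypothetical helper for illustration only; it does not follow the
    WHATWG parsing algorithm.
    """
    parts = urlsplit(urlstring)
    hostname = parts.hostname or ""
    port = "" if parts.port is None else str(parts.port)
    return {
        "protocol": parts.scheme + ":",          # scheme plus trailing colon
        "username": parts.username or "",
        "password": parts.password or "",
        "hostname": hostname,
        "port": port,
        "host": f"{hostname}:{port}" if port else hostname,
        "pathname": parts.path,
        "search": "?" + parts.query if parts.query else "",
        "hash": "#" + parts.fragment if parts.fragment else "",
    }


fields = whatwg_style_fields("http://user:pass@foo:21/bar;par?b#c")
print(fields["host"], fields["search"], fields["hash"])
```

For this particular input the fields line up with the `URL` representation shown above; for inputs that need normalization the two will differ, which is the reason to use `urlstd` in the first place.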
To parse a string into a `URL` using a base URL:
```python
url = URL('?ﬃ&🌈', base='http://example.org')
url # → <URL(href='http://example.org/?%EF%AC%83&%F0%9F%8C%88', origin='http://example.org', protocol='http:', username='', password='', host='example.org', hostname='example.org', port='', pathname='/', search='?%EF%AC%83&%F0%9F%8C%88', hash='')>
url.search # → '?%EF%AC%83&%F0%9F%8C%88'
params = url.search_params
params # → URLSearchParams([('ﬃ', ''), ('🌈', '')])
params.sort()
params # → URLSearchParams([('🌈', ''), ('ﬃ', '')])
url.search # → '?%F0%9F%8C%88=&%EF%AC%83='
str(url) # → 'http://example.org/?%F0%9F%8C%88=&%EF%AC%83='
```
```
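The order after `sort()` may look surprising, since U+FB03 (ﬃ) has a lower code point than U+1F308 (🌈). The URL Standard sorts names by UTF-16 code units, and the rainbow's leading surrogate (0xD83C) is smaller than 0xFB03. A stdlib-only sketch of that comparison follows; `sort_by_utf16_code_units` is a hypothetical helper, not how `urlstd` is necessarily implemented.

```python
def sort_by_utf16_code_units(pairs):
    """Sort (name, value) pairs by the UTF-16 code units of the name.

    Big-endian UTF-16 bytes compare in the same order as the code units
    themselves, and sorted() is stable, so pairs with equal names keep
    their relative order.
    """
    return sorted(pairs, key=lambda pair: pair[0].encode("utf-16-be"))


pairs = [("\ufb03", ""), ("\U0001F308", "")]  # ﬃ, then 🌈

by_code_units = sort_by_utf16_code_units(pairs)
by_code_points = sorted(pairs, key=lambda pair: pair[0])

print([name for name, _ in by_code_units])   # rainbow first
print([name for name, _ in by_code_points])  # ligature first
```

Plain `sorted()` on Python strings compares code points, so it would leave the pairs in their original order here.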
To validate a URL string:
```python
from urlstd.parse import URL, URLValidator, ValidityState
URL.can_parse('https://user:password@example.org/') # → True
URLValidator.is_valid('https://user:password@example.org/') # → False
validity = ValidityState()
URLValidator.is_valid('https://user:password@example.org/', validity=validity)
validity.valid # → False
validity.validation_errors # → 1
validity.descriptions[0] # → "invalid-credentials: input includes credentials: 'https://user:password@example.org/' at position 21"
```
```python
URL.can_parse('file:///C|/demo') # → True
URLValidator.is_valid('file:///C|/demo') # → False
validity = ValidityState()
URLValidator.is_valid('file:///C|/demo', validity=validity) # → False
validity.valid # → False
validity.validation_errors # → 1
validity.descriptions[0] # → "invalid-URL-unit: code point is found that is not a URL unit: U+007C (|) in 'file:///C|/demo' at position 9"
```
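Both descriptions above start with a validation error code followed by `": "`. Assuming that format holds in general (it is inferred only from these two examples), the codes can be tallied with a small helper. `count_error_codes` is hypothetical and works on plain strings, so it can be fed `validity.descriptions` directly.

```python
from collections import Counter


def count_error_codes(descriptions):
    """Count validation error codes in 'code: message' description strings.

    Assumes each description begins with the error code followed by ': ',
    as in the examples above.
    """
    return Counter(d.partition(": ")[0] for d in descriptions)


# The two description strings shown in the examples above.
descriptions = [
    "invalid-credentials: input includes credentials: "
    "'https://user:password@example.org/' at position 21",
    "invalid-URL-unit: code point is found that is not a URL unit: "
    "U+007C (|) in 'file:///C|/demo' at position 9",
]

codes = count_error_codes(descriptions)
print(codes)
```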
To parse a string into a `urllib.parse.ParseResult` using a base URL:
```python
import html
from urllib.parse import unquote
from urlstd.parse import urlparse
pr = urlparse('?aÿb', base='http://example.org/foo/', encoding='utf-8')
pr # → ParseResult(scheme='http', netloc='example.org', path='/foo/', params='', query='a%C3%BFb', fragment='')
unquote(pr.query) # → 'aÿb'
pr = urlparse('?aÿb', base='http://example.org/foo/', encoding='windows-1251')
pr # → ParseResult(scheme='http', netloc='example.org', path='/foo/', params='', query='a%26%23255%3Bb', fragment='')
unquote(pr.query, encoding='windows-1251') # → 'a&#255;b'
html.unescape('a&#255;b') # → 'aÿb'
pr = urlparse('?aÿb', base='http://example.org/foo/', encoding='windows-1252')
pr # → ParseResult(scheme='http', netloc='example.org', path='/foo/', params='', query='a%FFb', fragment='')
unquote(pr.query, encoding='windows-1252') # → 'aÿb'
```
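The three query strings differ because the query is encoded to bytes in the requested encoding before it is percent-encoded. `ÿ` (U+00FF) is two bytes in UTF-8 and one byte in windows-1252, but it has no mapping in windows-1251, so it falls back to the numeric character reference `&#255;`, which is then percent-encoded. The stdlib-only sketch below reproduces the outputs for this input; `encode_query` is a hypothetical helper, and `safe=""` is a simplification of the Standard's query percent-encode set.

```python
from urllib.parse import quote


def encode_query(query: str, encoding: str) -> str:
    """Encode first, then percent-encode the resulting bytes.

    Characters the encoding cannot represent become '&#NNN;' references
    (errors='xmlcharrefreplace') before percent-encoding.
    """
    raw = query.encode(encoding, errors="xmlcharrefreplace")
    return quote(raw, safe="")


for enc in ("utf-8", "windows-1251", "windows-1252"):
    print(enc, encode_query("a\u00ffb", enc))
```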
## Logging
`urlstd` uses the standard library [logging](https://docs.python.org/3/library/logging.html) module to report [validation errors](https://url.spec.whatwg.org/#validation-error).
Change the logger log level of `urlstd` if needed:
```python
import logging

logging.getLogger('urlstd').setLevel(logging.ERROR)
```
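If the level should only change around a specific block of parsing code, the setting can be scoped with a context manager. This is a minimal sketch using only the standard library; `urlstd_log_level` is a hypothetical helper, and the only assumption about `urlstd` is the logger name `'urlstd'` used above.

```python
import logging
from contextlib import contextmanager


@contextmanager
def urlstd_log_level(level: int):
    """Temporarily set the 'urlstd' logger level, restoring it on exit."""
    logger = logging.getLogger("urlstd")
    previous = logger.level
    logger.setLevel(level)
    try:
        yield logger
    finally:
        # Restore the earlier level even if the body raised.
        logger.setLevel(previous)


before = logging.getLogger("urlstd").level
with urlstd_log_level(logging.ERROR) as scoped:
    inside = scoped.level  # parsing code would go here
after = logging.getLogger("urlstd").level
```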
## Dependencies
- [icupy](https://pypi.org/project/icupy/) >= 0.11.0 ([pre-built packages](https://github.com/miute/icupy/releases) are available)
- `icupy` requirements:
- [ICU4C](https://github.com/unicode-org/icu/releases) ([ICU - International Components for Unicode](https://icu.unicode.org/)) - latest version recommended
- C++17 compatible compiler (see [supported compilers](https://github.com/pybind/pybind11#supported-compilers))
- [CMake](https://cmake.org/) >= 3.7
## Installation
1. Configuring environment variables for icupy (ICU):
   - Windows:
     - Set the `ICU_ROOT` environment variable to the root of the ICU installation (default is `C:\icu`).
       For example, if ICU is located in `C:\icu4c`:
       ```bat
       set ICU_ROOT=C:\icu4c
       ```
       or in PowerShell:
       ```powershell
       $env:ICU_ROOT = "C:\icu4c"
       ```
     - To verify settings using *icuinfo (64 bit)*:
       ```bat
       %ICU_ROOT%\bin64\icuinfo
       ```
       or in PowerShell:
       ```powershell
       & $env:ICU_ROOT\bin64\icuinfo
       ```
   - Linux/POSIX:
     - If ICU is installed in a non-standard location, set the `PKG_CONFIG_PATH` and `LD_LIBRARY_PATH` environment variables.
       For example, if ICU is located in `/usr/local`:
       ```bash
       export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH
       export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
       ```
     - To verify settings using *pkg-config*:
       ```bash
       $ pkg-config --cflags --libs icu-uc
       -I/usr/local/include -L/usr/local/lib -licuuc -licudata
       ```
2. Installing from PyPI:
   ```bash
   pip install urlstd
   ```
## Running Tests
Install dependencies:
```bash
pipx install tox
# or
pip install --user tox
```
To run tests and generate a report:
```bash
git clone https://github.com/miute/urlstd.git
cd urlstd
tox -e wpt
```
See result: [tests/wpt/report.html](https://htmlpreview.github.io/?https://github.com/miute/urlstd/blob/main/tests/wpt/report.html)
## License
[MIT License](https://github.com/miute/urlstd/blob/main/LICENSE).