- Version: 2023.7.26.1 (uploaded to PyPI 2023-09-12 07:49:17)
- Summary: Python implementation of the WHATWG URL Standard
- Author: Tetsuya Miura
- License: MIT
- Requires Python: >=3.8,<4.0
- Home page: https://github.com/miute/urlstd
- Documentation: https://miute.github.io/urlstd/
- Keywords: url, whatwg-url, url-standard, url-parser, url-parsing

# urlstd

[![PyPI](https://img.shields.io/pypi/v/urlstd)](https://pypi.org/project/urlstd/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/urlstd)](https://pypi.org/project/urlstd/)
[![PyPI - License](https://img.shields.io/pypi/l/urlstd)](https://pypi.org/project/urlstd/)
[![CI](https://github.com/miute/urlstd/actions/workflows/main.yml/badge.svg)](https://github.com/miute/urlstd/actions/workflows/main.yml)
[![codecov](https://codecov.io/gh/miute/urlstd/branch/main/graph/badge.svg?token=XJGM09H5TS)](https://codecov.io/gh/miute/urlstd)

`urlstd` is a Python implementation of the WHATWG [URL Living Standard](https://url.spec.whatwg.org/).

This library provides the `URL` and `URLSearchParams` classes, along with low-level APIs that comply with the URL specification.

## Supported APIs

- [URL class](https://url.spec.whatwg.org/#url-class)
  - class urlstd.parse.`URL(url: str, base: Optional[str | URL] = None)`
    - [canParse](https://url.spec.whatwg.org/#dom-url-canparse): classmethod `can_parse(url: str, base: Optional[str | URL] = None) -> bool`
    - stringifier: `__str__() -> str`
    - [href](https://url.spec.whatwg.org/#dom-url-href): `readonly property href: str`
    - [origin](https://url.spec.whatwg.org/#dom-url-origin): `readonly property origin: str`
    - [protocol](https://url.spec.whatwg.org/#dom-url-protocol): `property protocol: str`
    - [username](https://url.spec.whatwg.org/#dom-url-username): `property username: str`
    - [password](https://url.spec.whatwg.org/#dom-url-password): `property password: str`
    - [host](https://url.spec.whatwg.org/#dom-url-host): `property host: str`
    - [hostname](https://url.spec.whatwg.org/#dom-url-hostname): `property hostname: str`
    - [port](https://url.spec.whatwg.org/#dom-url-port): `property port: str`
    - [pathname](https://url.spec.whatwg.org/#dom-url-pathname): `property pathname: str`
    - [search](https://url.spec.whatwg.org/#dom-url-search): `property search: str`
    - [searchParams](https://url.spec.whatwg.org/#dom-url-searchparams): `readonly property search_params: URLSearchParams`
    - [hash](https://url.spec.whatwg.org/#dom-url-hash): `property hash: str`
    - [URL equivalence](https://url.spec.whatwg.org/#url-equivalence): `__eq__(other: Any) -> bool` and `equals(other: URL, exclude_fragments: bool = False) -> bool`

- [URLSearchParams class](https://url.spec.whatwg.org/#interface-urlsearchparams)
  - class urlstd.parse.`URLSearchParams(init: Optional[str | Sequence[Sequence[str | int | float]] | dict[str, str | int | float] | URLRecord | URLSearchParams] = None)`
    - [size](https://url.spec.whatwg.org/#dom-urlsearchparams-size): `__len__() -> int`
    - [append](https://url.spec.whatwg.org/#dom-urlsearchparams-append): `append(name: str, value: str | int | float) -> None`
    - [delete](https://url.spec.whatwg.org/#dom-urlsearchparams-delete): `delete(name: str, value: Optional[str | int | float] = None) -> None`
    - [get](https://url.spec.whatwg.org/#dom-urlsearchparams-get): `get(name: str) -> str | None`
    - [getAll](https://url.spec.whatwg.org/#dom-urlsearchparams-getall): `get_all(name: str) -> tuple[str, ...]`
    - [has](https://url.spec.whatwg.org/#dom-urlsearchparams-has): `has(name: str, value: Optional[str | int | float] = None) -> bool`
    - [set](https://url.spec.whatwg.org/#dom-urlsearchparams-set): `set(name: str, value: str | int | float) -> None`
    - [sort](https://url.spec.whatwg.org/#dom-urlsearchparams-sort): `sort() -> None`
    - iterable<USVString, USVString>: `__iter__() -> Iterator[tuple[str, str]]`
    - [stringifier](https://url.spec.whatwg.org/#urlsearchparams-stringification-behavior): `__str__() -> str`

- Low-level APIs

  - [URL parser](https://url.spec.whatwg.org/#concept-url-parser)
    - urlstd.parse.`parse_url(urlstring: str, base: Optional[str | URLRecord] = None, encoding: str = "utf-8") -> URLRecord`

  - [basic URL parser](https://url.spec.whatwg.org/#concept-basic-url-parser)
    - class urlstd.parse.`BasicURLParser`
      - classmethod `parse(urlstring: str, base: Optional[URLRecord] = None, encoding: str = "utf-8", url: Optional[URLRecord] = None, state_override: Optional[URLParserState] = None) -> URLRecord`

  - [URL record](https://url.spec.whatwg.org/#concept-url)
    - class urlstd.parse.`URLRecord`
      - [scheme](https://url.spec.whatwg.org/#concept-url-scheme): `property scheme: str = ""`
      - [username](https://url.spec.whatwg.org/#concept-url-username): `property username: str = ""`
      - [password](https://url.spec.whatwg.org/#concept-url-password): `property password: str = ""`
      - [host](https://url.spec.whatwg.org/#concept-url-host): `property host: Optional[str | int | tuple[int, ...]] = None`
      - [port](https://url.spec.whatwg.org/#concept-url-port): `property port: Optional[int] = None`
      - [path](https://url.spec.whatwg.org/#concept-url-path): `property path: list[str] | str = []`
      - [query](https://url.spec.whatwg.org/#concept-url-query): `property query: Optional[str] = None`
      - [fragment](https://url.spec.whatwg.org/#concept-url-fragment): `property fragment: Optional[str] = None`
      - [origin](https://url.spec.whatwg.org/#concept-url-origin): `readonly property origin: Origin | None`
      - [is special](https://url.spec.whatwg.org/#is-special): `is_special() -> bool`
      - [is not special](https://url.spec.whatwg.org/#is-not-special): `is_not_special() -> bool`
      - [includes credentials](https://url.spec.whatwg.org/#include-credentials): `includes_credentials() -> bool`
      - [has an opaque path](https://url.spec.whatwg.org/#url-opaque-path): `has_opaque_path() -> bool`
      - [cannot have a username/password/port](https://url.spec.whatwg.org/#cannot-have-a-username-password-port): `cannot_have_username_password_port() -> bool`
      - [URL serializer](https://url.spec.whatwg.org/#concept-url-serializer): `serialize_url(exclude_fragment: bool = False) -> str`
      - [host serializer](https://url.spec.whatwg.org/#concept-host-serializer): `serialize_host() -> str`
      - [URL path serializer](https://url.spec.whatwg.org/#url-path-serializer): `serialize_path() -> str`
      - [URL equivalence](https://url.spec.whatwg.org/#url-equivalence): `__eq__(other: Any) -> bool` and `equals(other: URLRecord, exclude_fragments: bool = False) -> bool`

  - [Hosts (domains and IP addresses)](https://url.spec.whatwg.org/#hosts-(domains-and-ip-addresses))
    - class urlstd.parse.`IDNA`
      - [domain to ASCII](https://url.spec.whatwg.org/#concept-domain-to-ascii): classmethod `domain_to_ascii(domain: str, be_strict: bool = False) -> str`
      - [domain to Unicode](https://url.spec.whatwg.org/#concept-domain-to-unicode): classmethod `domain_to_unicode(domain: str, be_strict: bool = False) -> str`
    - class urlstd.parse.`Host`
      - [host parser](https://url.spec.whatwg.org/#concept-host-parser): classmethod `parse(host: str, is_not_special: bool = False) -> str | int | tuple[int, ...]`
      - [host serializer](https://url.spec.whatwg.org/#concept-host-serializer): classmethod `serialize(host: str | int | Sequence[int]) -> str`

  - [percent-decode a string](https://url.spec.whatwg.org/#string-percent-decode)
    - urlstd.parse.`string_percent_decode(s: str) -> bytes`

  - [percent-encode after encoding](https://url.spec.whatwg.org/#string-percent-encode-after-encoding)
    - urlstd.parse.`string_percent_encode(s: str, safe: str, encoding: str = "utf-8", space_as_plus: bool = False) -> str`

  - [application/x-www-form-urlencoded parser](https://url.spec.whatwg.org/#concept-urlencoded-parser)
    - urlstd.parse.`parse_qsl(query: bytes) -> list[tuple[str, str]]`

  - [application/x-www-form-urlencoded serializer](https://url.spec.whatwg.org/#concept-urlencoded-serializer)
    - urlstd.parse.`urlencode(query: Sequence[tuple[str, str]], encoding: str = "utf-8") -> str`

  - Validation
    - class urlstd.parse.`HostValidator`
      - [valid host string](https://url.spec.whatwg.org/#valid-host-string): classmethod `is_valid(host: str) -> bool`
      - [valid domain string](https://url.spec.whatwg.org/#valid-domain-string): classmethod `is_valid_domain(domain: str) -> bool`
      - [valid IPv4-address string](https://url.spec.whatwg.org/#valid-ipv4-address-string): classmethod `is_valid_ipv4_address(address: str) -> bool`
      - [valid IPv6-address string](https://url.spec.whatwg.org/#valid-ipv6-address-string): classmethod `is_valid_ipv6_address(address: str) -> bool`
    - class urlstd.parse.`URLValidator`
      - [valid URL string](https://url.spec.whatwg.org/#valid-url-string): classmethod `is_valid(urlstring: str, base: Optional[str | URLRecord] = None, encoding: str = "utf-8") -> bool`
      - valid [URL-scheme string](https://url.spec.whatwg.org/#url-scheme-string): classmethod `is_valid_url_scheme(value: str) -> bool`

- Compatibility with standard library `urllib`
  - urlstd.parse.`urlparse(urlstring: str, base: str = None, encoding: str = "utf-8", allow_fragments: bool = True) -> urllib.parse.ParseResult`

    `urlstd.parse.urlparse()` is an alternative to `urllib.parse.urlparse()`.
    It parses a URL string with the basic URL parser and returns a `urllib.parse.ParseResult`.
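
The `host` field of `URLRecord` is typed `str | int | tuple[int, ...]`: a domain or opaque host, an IPv4 address stored as a 32-bit integer, or an IPv6 address stored as eight 16-bit pieces. To illustrate what the host serializer does with those three shapes, here is a standalone sketch of the standard's algorithm in plain Python. It is not urlstd's implementation; the helper names are hypothetical, and `Host.serialize()` remains the library's actual entry point.

```python
from __future__ import annotations

from typing import Optional, Sequence, Union


def serialize_ipv4(address: int) -> str:
    """Serialize a 32-bit IPv4 address as four dot-separated decimal bytes."""
    return ".".join(str((address >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def _find_compress_index(pieces: Sequence[int]) -> Optional[int]:
    """Return the start of the first longest run of two or more zero pieces."""
    best_start: Optional[int] = None
    best_len = 1  # a single zero piece is never compressed
    i = 0
    while i < len(pieces):
        if pieces[i] != 0:
            i += 1
            continue
        j = i
        while j < len(pieces) and pieces[j] == 0:
            j += 1
        if j - i > best_len:  # strict ">" keeps the first of equally long runs
            best_start, best_len = i, j - i
        i = j
    return best_start


def serialize_ipv6(pieces: Sequence[int]) -> str:
    """Serialize eight 16-bit pieces in lowercase hex, compressing one zero run."""
    compress = _find_compress_index(pieces)
    out = []
    ignore_zero = False
    for index, piece in enumerate(pieces):
        if ignore_zero and piece == 0:
            continue  # still inside the compressed zero run
        ignore_zero = False
        if index == compress:
            out.append("::" if index == 0 else ":")
            ignore_zero = True
            continue
        out.append(format(piece, "x"))
        if index != len(pieces) - 1:
            out.append(":")
    return "".join(out)


def serialize_host(host: Union[str, int, Sequence[int]]) -> str:
    """Dispatch on the host's type, mirroring the documented URLRecord.host union."""
    if isinstance(host, int):
        return serialize_ipv4(host)
    if isinstance(host, str):
        return host  # domains, opaque hosts, and empty hosts serialize as-is
    return "[" + serialize_ipv6(host) + "]"


print(serialize_host(0xC0A80001))                         # 192.168.0.1
print(serialize_host((0x2001, 0xDB8, 0, 0, 0, 0, 0, 1)))  # [2001:db8::1]
print(serialize_host("example.org"))                      # example.org
```

The `str` check comes before the sequence branch on purpose, because a Python string is itself a sequence.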

## Basic Usage

To parse a string into a `URL`:

```python
from urlstd.parse import URL
URL('http://user:pass@foo:21/bar;par?b#c')
# → <URL(href='http://user:pass@foo:21/bar;par?b#c', origin='http://foo:21', protocol='http:', username='user', password='pass', host='foo:21', hostname='foo', port='21', pathname='/bar;par', search='?b', hash='#c')>
```
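
The `origin` in the output above keeps only the scheme, host, and port; the credentials, path, query, and fragment are dropped. Port 21 survives because it is not the default port of `http` (80), whereas the parser discards a port equal to its special scheme's default. A minimal sketch of that behaviour for tuple origins, assuming the host is already serialized; the function names are hypothetical and this is not how urlstd computes `URL.origin`.

```python
from __future__ import annotations

from typing import Optional

# Default ports of the special schemes ("file" has none).
DEFAULT_PORTS = {"ftp": 21, "http": 80, "https": 443, "ws": 80, "wss": 443}


def normalize_port(scheme: str, port: Optional[int]) -> Optional[int]:
    """Drop the port when it equals the scheme's default, as the URL parser does."""
    return None if port == DEFAULT_PORTS.get(scheme) else port


def serialize_tuple_origin(scheme: str, host: str, port: Optional[int]) -> str:
    """Serialize a tuple origin as scheme://host, plus :port when one remains."""
    port = normalize_port(scheme, port)
    return f"{scheme}://{host}" + ("" if port is None else f":{port}")


print(serialize_tuple_origin("http", "foo", 21))          # http://foo:21
print(serialize_tuple_origin("http", "example.org", 80))  # http://example.org
```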

To parse a string into a `URL` relative to a base URL:

```python
from urlstd.parse import URL
url = URL('?ﬃ&🌈', base='http://example.org')
url  # → <URL(href='http://example.org/?%EF%AC%83&%F0%9F%8C%88', origin='http://example.org', protocol='http:', username='', password='', host='example.org', hostname='example.org', port='', pathname='/', search='?%EF%AC%83&%F0%9F%8C%88', hash='')>
url.search  # → '?%EF%AC%83&%F0%9F%8C%88'
params = url.search_params
params  # → URLSearchParams([('ﬃ', ''), ('🌈', '')])
params.sort()
params  # → URLSearchParams([('🌈', ''), ('ﬃ', '')])
url.search  # → '?%F0%9F%8C%88=&%EF%AC%83='
str(url)  # → 'http://example.org/?%F0%9F%8C%88=&%EF%AC%83='
```
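
Why does `sort()` put 🌈 before ﬃ? The standard sorts name-value pairs by name comparing UTF-16 code units, not code points. U+1F308 (🌈) is encoded as the surrogate pair 0xD83C 0xDF08, and 0xD83C is less than 0xFB03 (ﬃ), even though ﬃ comes first by code point. The sketch below reproduces that sort and the application/x-www-form-urlencoded serialization in plain Python. It is an illustration, not urlstd's code; the helper names are hypothetical and only UTF-8 output is handled.

```python
from __future__ import annotations

from typing import List, Tuple


def utf16_code_units(s: str) -> List[int]:
    """Return the UTF-16 code units of s (non-BMP characters become surrogate pairs)."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def sort_pairs(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Stable sort by name, comparing UTF-16 code units rather than code points."""
    return sorted(pairs, key=lambda pair: utf16_code_units(pair[0]))


def form_urlencode(s: str) -> str:
    """Percent-encode the UTF-8 bytes of s with the form-urlencoded rules."""
    out = []
    for byte in s.encode("utf-8"):
        if byte == 0x20:
            out.append("+")  # space becomes "+"
        elif (0x30 <= byte <= 0x39 or 0x41 <= byte <= 0x5A
              or 0x61 <= byte <= 0x7A or byte in b"*-._"):
            out.append(chr(byte))  # ASCII alphanumerics and *-._ stay as-is
        else:
            out.append(f"%{byte:02X}")
    return "".join(out)


def serialize_pairs(pairs: List[Tuple[str, str]]) -> str:
    """Join the encoded name=value pairs with "&"."""
    return "&".join(f"{form_urlencode(n)}={form_urlencode(v)}" for n, v in pairs)


pairs = [("\ufb03", ""), ("\U0001F308", "")]      # ("ﬃ", ""), ("🌈", "")
print(sorted(pairs)[0][0] == "\ufb03")            # True: code-point order puts ﬃ first
print(sort_pairs(pairs)[0][0] == "\U0001F308")    # True: UTF-16 order puts 🌈 first
print(serialize_pairs(sort_pairs(pairs)))         # %F0%9F%8C%88=&%EF%AC%83=
```

The serialized result matches the `url.search` value shown after `params.sort()`, minus the leading `?`.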

To validate a URL string:

```python
from urlstd.parse import URL, URLValidator, ValidityState
URL.can_parse('https://user:password@example.org/')  # → True
URLValidator.is_valid('https://user:password@example.org/')  # → False
validity = ValidityState()
URLValidator.is_valid('https://user:password@example.org/', validity=validity)  # → False
validity.valid  # → False
validity.validation_errors  # → 1
validity.descriptions[0]  # → "invalid-credentials: input includes credentials: 'https://user:password@example.org/' at position 21"
```

```python
from urlstd.parse import URL, URLValidator, ValidityState
URL.can_parse('file:///C|/demo')  # → True
URLValidator.is_valid('file:///C|/demo')  # → False
validity = ValidityState()
URLValidator.is_valid('file:///C|/demo', validity=validity)  # → False
validity.valid  # → False
validity.validation_errors  # → 1
validity.descriptions[0]  # → "invalid-URL-unit: code point is found that is not a URL unit: U+007C (|) in 'file:///C|/demo' at position 9"
```
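
Both URLs parse, but `|` (U+007C) is not a URL unit. URL units are URL code points (ASCII alphanumerics, a fixed set of ASCII punctuation, and non-ASCII code points from U+00A0 to U+10FFFD other than surrogates and noncharacters) plus percent-encoded bytes. A rough sketch of that check follows. It scans the input as a flat string and ignores the parser's state machine, so delimiters such as `#` or the brackets of an IPv6 host would also be flagged; the helper names are hypothetical and this is not how urlstd reports the error.

```python
from __future__ import annotations

from typing import Optional, Tuple

_ASCII_URL_PUNCTUATION = frozenset("!$&'()*+,-./:;=?@_~")
_HEX_DIGITS = frozenset("0123456789abcdefABCDEF")


def is_url_code_point(ch: str) -> bool:
    """True if ch is a URL code point as listed in the URL Standard."""
    cp = ord(ch)
    if cp < 0x80:
        return ch.isalnum() or ch in _ASCII_URL_PUNCTUATION
    if cp < 0xA0 or cp > 0x10FFFD:
        return False
    if 0xD800 <= cp <= 0xDFFF:  # surrogates
        return False
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:  # noncharacters
        return False
    return True


def first_invalid_url_unit(s: str) -> Optional[Tuple[int, str]]:
    """Return (index, char) of the first code point that is not a URL unit, or None."""
    i = 0
    while i < len(s):
        if (s[i] == "%" and i + 2 < len(s)
                and s[i + 1] in _HEX_DIGITS and s[i + 2] in _HEX_DIGITS):
            i += 3  # a percent-encoded byte counts as one URL unit
            continue
        if not is_url_code_point(s[i]):
            return i, s[i]
        i += 1
    return None


print(first_invalid_url_unit("file:///C|/demo"))                # (9, '|')
print(first_invalid_url_unit("https://example.org/a%20b?q=1"))  # None
```

The reported index, 9, agrees with the position in the validity description above.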

To parse a string into a `urllib.parse.ParseResult` relative to a base URL:

```python
import html
from urllib.parse import unquote
from urlstd.parse import urlparse
pr = urlparse('?aÿb', base='http://example.org/foo/', encoding='utf-8')
pr  # → ParseResult(scheme='http', netloc='example.org', path='/foo/', params='', query='a%C3%BFb', fragment='')
unquote(pr.query)  # → 'aÿb'
pr = urlparse('?aÿb', base='http://example.org/foo/', encoding='windows-1251')
pr  # → ParseResult(scheme='http', netloc='example.org', path='/foo/', params='', query='a%26%23255%3Bb', fragment='')
unquote(pr.query, encoding='windows-1251')  # → 'a&#255;b'
html.unescape('a&#255;b')  # → 'aÿb'
pr = urlparse('?aÿb', base='http://example.org/foo/', encoding='windows-1252')
pr  # → ParseResult(scheme='http', netloc='example.org', path='/foo/', params='', query='a%FFb', fragment='')
unquote(pr.query, encoding='windows-1252')  # → 'aÿb'
```
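
The `windows-1251` case shows the standard's fallback for code points the target encoding cannot represent: ÿ (U+00FF) has no windows-1251 byte, so it is written as the HTML numeric character reference `&#255;`, and the reference itself is percent-encoded. A minimal sketch of that behaviour for the query of a special URL, assuming Python's codecs behave like the WHATWG Encoding Standard encoders for these inputs (they differ in some edge cases); the function name is hypothetical and this is not urlstd's code.

```python
def encode_special_query(query: str, encoding: str = "utf-8") -> str:
    """Percent-encode after encoding, using the special-query percent-encode set."""
    out = []
    for ch in query:
        try:
            data = ch.encode(encoding)
        except UnicodeEncodeError:
            # Unencodable code point: emit "&#<decimal>;", itself percent-encoded.
            out.append(f"%26%23{ord(ch)}%3B")
            continue
        for byte in data:
            # C0 controls, space, bytes above 0x7E, and " # < > ' are encoded.
            if byte <= 0x20 or byte >= 0x7F or byte in b"\"#<>'":
                out.append(f"%{byte:02X}")
            else:
                out.append(chr(byte))
    return "".join(out)


for enc in ("utf-8", "windows-1251", "windows-1252"):
    print(enc, encode_special_query("a\u00ffb", enc))
# utf-8 a%C3%BFb
# windows-1251 a%26%23255%3Bb
# windows-1252 a%FFb
```

The three results match the `query` fields of the `ParseResult` objects above.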

## Logging

`urlstd` uses the standard library [logging](https://docs.python.org/3/library/logging.html) module to report [validation errors](https://url.spec.whatwg.org/#validation-error).
Adjust the level of the `urlstd` logger if needed:

```python
import logging

logging.getLogger('urlstd').setLevel(logging.ERROR)
```

## Dependencies

- [icupy](https://pypi.org/project/icupy/) >= 0.11.0 ([pre-built packages](https://github.com/miute/icupy/releases) are available)
  - `icupy` requirements:
    - [ICU4C](https://github.com/unicode-org/icu/releases) ([ICU - International Components for Unicode](https://icu.unicode.org/)) - latest version recommended
    - C++17 compatible compiler (see [supported compilers](https://github.com/pybind/pybind11#supported-compilers))
    - [CMake](https://cmake.org/) >= 3.7

## Installation

1. Configure the environment variables for icupy (ICU):
    - Windows:
      - Set the `ICU_ROOT` environment variable to the root of the ICU installation (default is `C:\icu`).
        For example, if ICU is installed in `C:\icu4c`:

        ```bat
        set ICU_ROOT=C:\icu4c
        ```

        or in PowerShell:

        ```powershell
        $env:ICU_ROOT = "C:\icu4c"
        ```

      - To verify settings using *icuinfo (64 bit)*:

        ```bat
        %ICU_ROOT%\bin64\icuinfo
        ```

        or in PowerShell:

        ```powershell
        & $env:ICU_ROOT\bin64\icuinfo
        ```

    - Linux/POSIX:
      - If ICU is installed in a non-standard location, set the `PKG_CONFIG_PATH` and `LD_LIBRARY_PATH` environment variables.
        For example, if ICU is installed under `/usr/local`:

        ```bash
        export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH
        export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
        ```

      - To verify settings using *pkg-config*:

        ```bash
        $ pkg-config --cflags --libs icu-uc
        -I/usr/local/include -L/usr/local/lib -licuuc -licudata
        ```

2. Install from PyPI:

    ```bash
    pip install urlstd
    ```

## Running Tests

Install tox:

```bash
pipx install tox
# or
pip install --user tox
```

To run tests and generate a report:

```bash
git clone https://github.com/miute/urlstd.git
cd urlstd
tox -e wpt
```

See the results: [tests/wpt/report.html](https://htmlpreview.github.io/?https://github.com/miute/urlstd/blob/main/tests/wpt/report.html)

## License

[MIT License](https://github.com/miute/urlstd/blob/main/LICENSE).
