dataclass-binder

- Name: dataclass-binder
- Version: 0.3.3
- Home page: https://github.com/ProtixIT/dataclass-binder
- Summary: Library to bind TOML data to Python dataclasses in a type-safe way.
- Upload time: 2023-07-31 13:46:55
- Author: Maarten ter Huurne
- Requires Python: >=3.10,<4.0
- License: MIT
- Keywords: dataclass, toml, bind, binding

# Dataclass Binder

Library to bind TOML data to Python dataclasses in a type-safe way.


## Features

Currently it has the following properties that might set it apart from other data binding libraries:

- requires Python 3.10+
- relies only on dataclasses from the Python standard library
- detailed error messages which mention location, expected data and actual data
- strict parsing which considers unknown keys to be errors
- support for durations (`timedelta`)
- support for immutable (frozen) dataclasses
- can bind data from files, I/O streams or pre-parsed dictionaries
- can generate configuration templates from dataclass definitions

This library was originally designed for parsing configuration files.
As TOML's data model is very similar to JSON's, adding support for JSON in the future would be an option and would make the library useful for binding HTTP API requests.


## Maturity

This library is fully type-checked, has unit tests which provide 100% branch coverage and is used in production, so it should be reliable.

The API might still change in incompatible ways until the 1.0 release.
In particular the following aspects are subject to change:

- use of key suffixes for `timedelta`: this mechanism doesn't work for arrays
- the handling of separators in keys: currently `-` in TOML is mapped to `_` in Python and `_` is forbidden in TOML; most applications seem to accept both `-` and `_` in TOML instead


## Why Dataclasses?

A typical TOML, JSON or YAML parser returns the parse results as a nested dictionary.
You might wonder why you would want to use a data binding library rather than just getting the values directly from that dictionary.

Let's take the following example code for a service that connects to a database using a connection URL configured in a TOML file:

```py
import tomllib  # or 'tomli' on Python <3.11


def read_config() -> dict:
    with open("config.toml", "rb") as f:
        config = tomllib.load(f)
    return config

def handle_request(config: dict) -> None:
    url = config["database-url"]
    print("connect to database:", url)

config = read_config()
...
handle_request(config)
```

If the configuration is missing a `database-url` key or its value is not a string, this service would start up without complaints and then fail when the first request comes in.
It would be better to instead check the configuration on startup, so let's add code for that:

```py
def read_config():
    with open("config.toml", "rb") as f:
        config = tomllib.load(f)

    url = config["database-url"]
    if not isinstance(url, str):
        raise TypeError(
            f"Value for 'database-url' has type '{type(url).__name__}', expected 'str'"
        )

    return config
```

Imagine you have 20 different configurable options: you'd need this code 20 times.

Now let's assume that you use a type checker like `mypy`.
Inside `read_config()`, the type checker will know that `url` is a `str`, but if you fetch the same value elsewhere in the code, that information is lost:

```py
def handle_request(config: dict) -> None:
    url = config["database-url"]
    reveal_type(url)
    print("connect to database:", url)
```

When you run `mypy` on this code, it will output 'Revealed type is "Any"'.
Falling back to `Any` means type checking will not be able to find type mismatches and autocomplete in an IDE will not work well either.

Declaring the desired type in a dataclass solves both these issues:
- the type can be verified at runtime before instantiating the dataclass
- tooling knows the type when you read the value from the dataclass

Having the dataclass as a central and formal place for defining the configuration format is also an advantage.
For example, it enables automatic generation of a documented configuration file template.
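
To make these two points concrete, here is a minimal sketch of the kind of checking that a binder automates. It is not how `dataclass_binder` is implemented: the helper `bind_flat()` is hypothetical, only handles flat dataclasses whose annotations are plain classes, and exists purely for illustration.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Any, TypeVar, get_type_hints

T = TypeVar("T")


def bind_flat(cls: type[T], data: dict[str, Any]) -> T:
    """Hypothetical helper: check a flat dict against a dataclass and instantiate it."""
    hints = get_type_hints(cls)
    remaining = dict(data)
    kwargs: dict[str, Any] = {}
    for field in fields(cls):
        key = field.name.replace("_", "-")  # TOML keys use dashes
        if key not in remaining:
            if field.default is MISSING and field.default_factory is MISSING:
                raise KeyError(f"Missing mandatory key '{key}'")
            continue  # let the dataclass apply its own default
        value = remaining.pop(key)
        expected = hints[field.name]
        if not isinstance(value, expected):
            raise TypeError(
                f"Value for '{key}' has type '{type(value).__name__}', "
                f"expected '{expected.__name__}'"
            )
        kwargs[field.name] = value
    if remaining:
        # Strict parsing: keys that match no field are errors.
        raise KeyError(f"Unknown keys: {sorted(remaining)}")
    return cls(**kwargs)


@dataclass
class Config:
    database_url: str
    port: int = 8080


config = bind_flat(Config, {"database-url": "postgresql://user:password@host/db"})
print(config.database_url, config.port)
```

The check is written once and driven by the annotations, so adding a twentieth option means adding one field rather than another block of `isinstance()` code. A type checker also knows that `config.port` is an `int` wherever it is read.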


## Usage

The `dataclass_binder` module contains the `Binder` class which makes it easy to bind TOML data, such as a configuration file, to Python [dataclasses](https://docs.python.org/3/library/dataclasses.html).

The binding is a two-step process:
- instantiate the `Binder` class by passing your top-level dataclass as an argument
- call the `parse_toml()` method, providing the path of the configuration file as its argument

Put together, the code looks like this:

```py
import logging
import sys
from pathlib import Path

from dataclass_binder import Binder


logger = logging.getLogger(__name__)

if __name__ == "__main__":
    config_file = Path("config.toml")
    try:
        config = Binder(Config).parse_toml(config_file)
    except Exception as ex:
        logger.critical("Error reading configuration file '%s': %s", config_file, ex)
        sys.exit(1)
```

### Binding a Pre-parsed Dictionary

If you don't want to bind the contents of a full file, there is also the option to bind a pre-parsed dictionary instead.
For this, you can use the `bind()` method on the `Binder` object.

For example, the following service is configured by one table within a larger TOML configuration file:

```py
import tomllib  # or 'tomli' on Python <3.11
from dataclass_binder import Binder


with open("config.toml", "rb") as f:
    config = tomllib.load(f)
service_config = Binder(ServiceConfig).bind(config["service"])
```

To keep these examples short, from now on `import` statements will only be included the first time a particular imported name is used.

### Basic Types

Dataclass fields correspond to TOML keys. In the dataclass, underscores are used as word separators, while dashes are used in the TOML file. Let's configure a service that listens on a TCP port for requests and stores its data in a database, using the following TOML fragment:

```toml
database-url = 'postgresql://user:password@host/db'
port = 8080
```

This configuration can be bound to the following dataclass:

```py
from dataclasses import dataclass

@dataclass
class Config:
    database_url: str
    port: int
```

The `float` type can be used to bind floating point numbers.
Support for `Decimal` is not there at the moment but would be relatively easy to add, as `tomllib`/`tomli` has an option for that.

### Defaults

Fields can be made optional by assigning a default value. Using `None` as a default value is allowed too:

```py
@dataclass
class Config:
    verbose: bool = False
    webhook_url: str | None = None
```

If you want to mix fields with and without defaults in any order, mark the fields as keyword-only:

```py
@dataclass(kw_only=True)
class Config:
    database_url: str
    verbose: bool = False
    port: int
```

### Dates and Times

TOML handles dates and timestamps as first-class values.
Date, time and date+time TOML values are bound to `datetime.date`, `datetime.time` and `datetime.datetime` Python objects respectively.

There is also support for time intervals using `datetime.timedelta`:

```py
from datetime import timedelta

@dataclass
class Config:
    retry_after: timedelta
    delete_after: timedelta
```

Intervals shorter than a day can be specified using a TOML time value.
Longer intervals are supported by adding an `-hours`, `-days`, or `-weeks` suffix.
Other supported suffixes are `-minutes`, `-seconds`, `-milliseconds` and `-microseconds`, but these are there for completeness' sake and less likely to be useful.
Here is an example TOML fragment corresponding to the dataclass above:

```toml
retry-after = 00:02:30
delete-after-days = 30
```
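
As a rough illustration of the mapping described above, the following sketch converts one key/value pair into a field name and a `timedelta`. The helper `to_field()` is hypothetical and not part of the library; reading a TOML time value as the interval since midnight is an assumption that matches the `retry-after` example.

```python
from datetime import time, timedelta

# Units taken from the suffix list above; each is also a timedelta keyword argument.
UNITS = ("weeks", "days", "hours", "minutes", "seconds", "milliseconds", "microseconds")


def to_field(key: str, value: object) -> tuple[str, timedelta]:
    """Hypothetical helper: map one TOML key/value pair to a field name and interval."""
    for unit in UNITS:
        suffix = f"-{unit}"
        if key.endswith(suffix) and isinstance(value, (int, float)):
            name = key[: -len(suffix)]
            return name.replace("-", "_"), timedelta(**{unit: value})
    if isinstance(value, time):
        # A TOML time value is read as the interval since midnight.
        interval = timedelta(
            hours=value.hour,
            minutes=value.minute,
            seconds=value.second,
            microseconds=value.microsecond,
        )
        return key.replace("-", "_"), interval
    raise TypeError(f"Cannot interpret '{key}' as a time interval")


# tomllib parses '00:02:30' into datetime.time(0, 2, 30).
print(to_field("retry-after", time(0, 2, 30)))
print(to_field("delete-after-days", 30))
```

Because the unit is part of the key, this scheme cannot express an array of intervals, which is the limitation mentioned under Maturity.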

### Collections

Lists and dictionaries can be used to bind TOML arrays and tables.
If you want to make a `list` or `dict` optional, you need to provide a default value via the `default_factory` mechanism as usual; see the [dataclasses documentation](https://docs.python.org/3/library/dataclasses.html#mutable-default-values) for details.

```py
from dataclasses import dataclass, field

@dataclass
class Config:
    tags: list[str] = field(default_factory=list)
    limits: dict[str, int] = field(default_factory=dict)
```

The dataclass above can be used to bind the following TOML fragment:

```toml
tags = ["production", "development"]
limits = {ram-gb = 1, disk-gb = 100}
```

An alternative to `default_factory` is to use a homogeneous (single element type) tuple:

```py
@dataclass
class Config:
    limits: dict[str, int]
    tags: tuple[str, ...] = ()
```

Heterogeneous tuples are supported too: for example `tuple[str, bool]` binds a TOML array that must always have a string as its first element and a Boolean as its second and last element.
It is generally clearer though to define a separate dataclass when you need more than one value to configure something:

```py
@dataclass
class Webhook:
    url: str
    token: str

@dataclass
class Config:
    webhooks: tuple[Webhook, ...] = ()
```

The extra keys (`url` and `token` in this example) provide the clarity:

```toml
webhooks = [
    {url = "https://host1/notify", token = "12345"},
    {url = "https://host2/hook", token = "frperg"}
]
```

TOML's array-of-tables syntax can make this configuration a bit easier on the eyes:

```toml
[[webhooks]]
url = "https://host1/notify"
token = "12345"

[[webhooks]]
url = "https://host2/hook"
token = "frperg"
```

Always define additional dataclasses at the module level in your Python code: if the class is for example defined inside a function, the `Binder` constructor will not be able to find it.
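
One plausible reason for this restriction, assuming the binder resolves annotations by name through the module namespace, can be reproduced with the standard `typing.get_type_hints()` function. The sketch below does not use the library itself, and `make_config_class()` is a hypothetical function written only to create a function-local class:

```python
from dataclasses import dataclass
from typing import get_type_hints


def make_config_class() -> type:
    @dataclass
    class Webhook:
        url: str
        token: str

    @dataclass
    class Config:
        # String annotation, as produced by 'from __future__ import annotations'.
        webhooks: "tuple[Webhook, ...]" = ()

    return Config


try:
    get_type_hints(make_config_class())
except NameError as ex:
    # 'Webhook' only exists as a local variable of make_config_class(),
    # so it cannot be looked up from the module namespace.
    print("cannot resolve annotation:", ex)
```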

### Untyped Data

Sometimes the full structure of the data you want to bind is either too complex or too much in flux to be worth fully annotating.
In such a situation, you can use `typing.Any` as the annotation to simply capture the output of Python's TOML parser without type-checking it.

In the following example, a service uses the Python standard library logging implementation, configured using the [configuration dictionary schema](https://docs.python.org/3/library/logging.config.html#logging-config-dictschema):

```py
import logging.config
from dataclasses import dataclass
from typing import Any

from dataclass_binder import Binder


@dataclass
class Config:
    database_url: str
    logging: Any


def run(url: str) -> None:
    logging.info("Service starting")


if __name__ == "__main__":
    config = Binder(Config).parse_toml("service.toml")
    logging.config.dictConfig(config.logging)
    run(config.database_url)
```

The `service.toml` configuration file for this service could look like this:

```toml
database-url = 'postgresql://user:password@host/db'

[logging]
version = 1

[logging.root]
level = 'INFO'
handlers = ['file']

[logging.handlers.file]
class = 'logging.handlers.RotatingFileHandler'
filename = 'service.log'
formatter = 'simple'

[logging.formatters.simple]
format = '%(asctime)s %(name)s %(levelname)s %(message)s'
```

### Plugins

To select plugins to activate, you can bind Python classes or modules using `type[BaseClass]` and `types.ModuleType` annotations respectively:

```py
from dataclasses import dataclass, field
from types import ModuleType

from supertool.plugins import PostProcessor


@dataclass
class PluginConfig:
    postprocessors: tuple[type[PostProcessor], ...] = ()
    modules: dict[str, ModuleType] = field(default_factory=dict)
```

In the TOML, you specify Python classes or modules using their fully qualified names:

```toml
postprocessors = ["supertool_addons.reporters.JSONReport"]
modules = {lint = "supertool_addons.linter"}
```

There is no mechanism yet to add configuration to be used by the plugins.
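
For readers unfamiliar with fully qualified names, the following sketch shows how such a name can be turned into a class or module using `importlib` from the standard library. The helper `resolve_class()` is hypothetical and is not the library's implementation; standard library names stand in for the `supertool_addons` package used above.

```python
import logging
from importlib import import_module
from types import ModuleType


def resolve_class(qualified_name: str, base: type) -> type:
    """Hypothetical helper: import 'package.module.ClassName' and check its base class."""
    module_name, _, class_name = qualified_name.rpartition(".")
    obj = getattr(import_module(module_name), class_name)
    if not (isinstance(obj, type) and issubclass(obj, base)):
        raise TypeError(f"'{qualified_name}' is not a subclass of '{base.__name__}'")
    return obj


# A class is named by its module path followed by the class name.
handler_class = resolve_class("logging.handlers.RotatingFileHandler", logging.Handler)
print(handler_class.__name__)

# A module is named by its import path.
module: ModuleType = import_module("json")
print(module.__name__)
```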

### Immutable

If you prefer immutable configuration objects, you can achieve that using the `frozen` argument of the `dataclass` decorator and using abstract collection types in the annotations. For example, the following dataclass will be instantiated with a `tuple` object for `tags` and an immutable dictionary view for `limits`:

```py
from collections.abc import Mapping, Sequence


@dataclass(frozen=True)
class Config:
    limits: Mapping[str, int]
    tags: Sequence[str] = ()
```
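
The effect of these choices can be tried out without the binder. In the sketch below the instance is built by hand, and `types.MappingProxyType` is an assumed stand-in for the immutable dictionary view; the library may use a different read-only mapping type.

```python
from collections.abc import Mapping, Sequence
from dataclasses import FrozenInstanceError, dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class Config:
    limits: Mapping[str, int]
    tags: Sequence[str] = ()


# Built by hand here; normally the binder would create the instance.
config = Config(limits=MappingProxyType({"ram-gb": 1}), tags=("production",))

try:
    config.tags = ("development",)  # type: ignore[misc]
except FrozenInstanceError:
    print("fields cannot be reassigned")

try:
    config.limits["disk-gb"] = 100  # type: ignore[index]
except TypeError:
    print("the mapping cannot be modified either")
```

Note that `frozen=True` alone only protects the fields from reassignment; the abstract `Sequence` and `Mapping` annotations are what keep the collections themselves read-only for type checkers.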

### Layered Binding

`Binder` can be instantiated from a dataclass object rather than the dataclass itself.
The dataclass object will provide new default values when binding data to it.
This can be used to implement a layered configuration parsing mechanism, where there is a default configuration that can be customized using a system-wide configuration file and/or a per-user configuration file:

```py
config = Config()
if system_config_path.exists():
    config = Binder(config).parse_toml(system_config_path)
if user_config_path.exists():
    config = Binder(config).parse_toml(user_config_path)
```

Later layers can override individual fields in nested dataclasses, allowing fine-grained configuration merging, but collections are replaced whole instead of merged.
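
The merging semantics for top-level fields can be mimicked with `dataclasses.replace()`. This is only an analogy under simplifying assumptions: the hypothetical `apply_layer()` stands in for `Binder(config).bind(layer)`, performs no validation, expects field names rather than TOML keys, and does not handle nested dataclasses.

```python
from dataclasses import dataclass, field, replace
from typing import Any


@dataclass
class Config:
    port: int = 8080
    verbose: bool = False
    tags: list[str] = field(default_factory=list)


def apply_layer(config: Config, layer: dict[str, Any]) -> Config:
    """Hypothetical stand-in for binding one layer on top of existing values."""
    return replace(config, **layer)


config = Config(tags=["default"])
# System-wide layer: overrides 'verbose' and replaces 'tags' as a whole.
config = apply_layer(config, {"verbose": True, "tags": ["system"]})
# Per-user layer: only overrides 'port'; everything else is kept.
config = apply_layer(config, {"port": 9000})
print(config)
```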

### Generating a Configuration Template

To provide users with a starting point for configuring your application/service, you can automatically generate a configuration template from the information in the dataclass.

For example, when the following dataclass defines your configuration:

```py
@dataclass
class Config:
    database_url: str
    """The URL of the database to connect to."""

    port: int = 12345
    """TCP port on which to accept connections."""
```

You can generate a template configuration file using:

```py
from dataclass_binder import Binder


for line in Binder(Config).format_toml_template():
    print(line)
```

Which will print:

```toml
# The URL of the database to connect to.
# Mandatory.
database-url = '???'

# TCP port on which to accept connections.
# Default:
# port = 12345
```

It is also possible to provide placeholder values by instantiating `Binder` with a dataclass instance rather than the dataclass itself before calling `format_toml_template()`:

```py
TEMPLATE = Config(
    database_url="postgresql://<username>:<password>@<hostname>/<database name>",
    port=8080,
)

for line in Binder(TEMPLATE).format_toml_template():
    print(line)
```

Which will print:

```toml
# The URL of the database to connect to.
# Mandatory.
database-url = 'postgresql://<username>:<password>@<hostname>/<database name>'

# TCP port on which to accept connections.
# Default:
# port = 12345
port = 8080
```

### Generating a Compact Configuration File

If you want to generate a fully populated configuration, you can use the `format_toml()` method.
Compared to the template formatting, this leaves out optional parts for which no data has been provided.

For example, when the following dataclass defines your configuration:

```py
@dataclass
class Config:
    path: str
    """Path of input file."""

    verbose: bool = False
    """Be extra verbose in logging."""

    options: dict[str, Any] = field(default_factory=dict)
    """Various named options that are passed on to tool XYZ."""
```

This code generates a populated configuration file:

```py
config = Config(path="./data")

with open("config.toml", "w") as out:
    for line in Binder(config).format_toml():
        print(line, file=out)
```

The resulting `config.toml` contains only the `path` field:

```toml
# Path of input file.
path = './data'
```

### Troubleshooting

Finally, a troubleshooting tip: instead of the full `Binder(Config).parse_toml()`, first try to execute only `Binder(Config)`.
If that fails, the problem is in the dataclass definitions.
If that succeeds, but the `parse_toml()` call fails, the problem is that the TOML file does not match the format defined in the dataclasses.


## Development Environment

[Poetry](https://python-poetry.org/) is used to set up a virtual environment with all the dependencies and development tools that you need:

    $ cd dataclass-binder
    $ poetry install

You can activate a shell which contains the development tools in its search path:

    $ poetry shell

We recommend setting up pre-commit hooks for Git in the `dataclass-binder` work area.
These hooks automatically run a few simple checks and cleanups when you create a new commit.
After you first set up your virtual environment with Poetry, run this command to install the pre-commit hooks:

    $ pre-commit install


## Release Procedure

- Verify that CI passes on the branch that you want to release (typically `main`)
- Create a release on the GitHub web interface; name the tag `v<major>.<minor>.<patchlevel>`
- After publishing the release on GitHub, the package will be built and published on PyPI automatically via Actions


## Deprecations

### Binder Specialization

Prior to version 0.2.0, the `Binder` class was specialized using a type argument (`Binder[Config]`) rather than instantiation (`Binder(config)`). The old syntax is still supported for now, but the backwards compatibility might be removed in a minor release prior to 1.0 if it becomes a maintenance burden, so please update your code.

### Template Formatting

In version 0.3.0, the function `format_template()` has been replaced by the method `Binder.format_toml_template()`. The old function is still available for now.

## Changelog

### 0.1.0 - 2023-02-21:

- First open source release; thanks to my employer [Protix](https://protix.eu/) for making this possible

### 0.1.1 - 2023-02-22:

- Relax `Binder.bind()` argument type to `Mapping` ([#3](https://github.com/ProtixIT/dataclass-binder/issues/3))

### 0.1.2 - 2023-03-03:

- Fix `get()` and `[]` on object bound to read-only mapping ([#6](https://github.com/ProtixIT/dataclass-binder/issues/6))

### 0.1.3 - 2023-03-05:

- Ignore dataclass fields with `init=False` ([#2](https://github.com/ProtixIT/dataclass-binder/issues/2))

### 0.2.0 - 2023-06-26:

- Instantiate `Binder` instead of specializing it ([#14](https://github.com/ProtixIT/dataclass-binder/pull/14))
- Support `typing.Any` as a field annotation ([#10](https://github.com/ProtixIT/dataclass-binder/issues/10))
- Fix crash in `format_template()` on optional fields with non-string annotations ([#16](https://github.com/ProtixIT/dataclass-binder/pull/16))

### 0.3.0 - 2023-07-13:

- Replace `format_template()` function by `Binder.format_toml_template()` method ([#23](https://github.com/ProtixIT/dataclass-binder/pull/23))
- Format nested dataclasses as TOML tables ([#25](https://github.com/ProtixIT/dataclass-binder/pull/25))
- Format untyped mappings and sequences as TOML tables ([#27](https://github.com/ProtixIT/dataclass-binder/pull/27))
- Fix formatting of `init=False` field in nested dataclasses ([#22](https://github.com/ProtixIT/dataclass-binder/pull/22))
- Fix annotation evaluation on inherited dataclasses ([#21](https://github.com/ProtixIT/dataclass-binder/pull/21))

### 0.3.1 - 2023-07-17:

- Generate template in depth-first order ([#28](https://github.com/ProtixIT/dataclass-binder/pull/28))
- Fix binder creation and formatting for recursive dataclasses ([#28](https://github.com/ProtixIT/dataclass-binder/pull/28))

### 0.3.2 - 2023-07-27:

- Document fields with a `default_factory` as optional in template ([#35](https://github.com/ProtixIT/dataclass-binder/pull/35))
- Omit values from template that are formatted equally to the default ([#36](https://github.com/ProtixIT/dataclass-binder/pull/36))
- Require fields with `None` in their annotation to have `None` as their default ([#37](https://github.com/ProtixIT/dataclass-binder/pull/37))

### 0.3.3 - 2023-07-31:

- Add `Binder.format_toml()` method to generate more compact TOML that excludes unused optional parts ([#38](https://github.com/ProtixIT/dataclass-binder/pull/38))

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/ProtixIT/dataclass-binder",
    "name": "dataclass-binder",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.10,<4.0",
    "maintainer_email": "",
    "keywords": "dataclass,toml,bind,binding",
    "author": "Maarten ter Huurne",
    "author_email": "maarten.terhuurne@protix.eu",
    "download_url": "https://files.pythonhosted.org/packages/1f/c8/29e7370e257af7acf845eb21bc42015c713af1a7c2f1e996ab9616fab494/dataclass_binder-0.3.3.tar.gz",
    "platform": null,
    "description": "# Dataclass Binder\n\nLibrary to bind TOML data to Python dataclasses in a type-safe way.\n\n\n## Features\n\nCurrently it has the following properties that might set it apart from other data binding libraries:\n\n- requires Python 3.10+\n- relies only on dataclasses from the Python standard library\n- detailed error messages which mention location, expected data and actual data\n- strict parsing which considers unknown keys to be errors\n- support for durations (`timedelta`)\n- support for immutable (frozen) dataclasses\n- can bind data from files, I/O streams or pre-parsed dictionaries\n- can generate configuration templates from dataclass definitions\n\nThis library was originally designed for parsing configuration files.\nAs TOML's data model is very similar to JSON's, adding support for JSON in the future would be an option and would make the library useful for binding HTTP API requests.\n\n\n## Maturity\n\nThis library is fully type-checked, has unit tests which provide 100% branch coverage and is used in production, so it should be reliable.\n\nThe API might still change in incompatible ways until the 1.0 release.\nIn particular the following aspects are subject to change:\n\n- use of key suffixes for `timedelta`: this mechanism doesn't work for arrays\n- the handling of separators in keys: currently `-` in TOML is mapped to `_` in Python and `_` is forbidden in TOML; most applications seem to accept both `-` and `_` in TOML instead\n\n\n## Why Dataclasses?\n\nA typical TOML, JSON or YAML parser returns the parse results as a nested dictionary.\nYou might wonder why you would want to use a data binding library rather than just getting the values directly from that dictionary.\n\nLet's take the following example code for a service that connects to a database using a connection URL configured in a TOML file:\n\n```py\nimport tomllib  # or 'tomli' on Python <3.11\n\n\ndef read_config() -> dict:\n    with open(\"config.toml\", \"rb\") as f:\n 
       config = tomllib.load(f)\n    return config\n\ndef handle_request(config: dict) -> None:\n    url = config[\"database-url\"]\n    print(\"connect to database:\", url)\n\nconfig = read_config()\n...\nhandle_request(config)\n```\n\nIf the configuration is missing a `database-url` key or its value is not a string, this service would start up without complaints and then fail when the first requests comes in.\nIt would be better to instead check the configuration on startup, so let's add code for that:\n\n```py\ndef read_config():\n    with open(\"config.toml\", \"rb\") as f:\n        config = tomllib.load(f)\n\n    url = config[\"database-url\"]\n    if not isinstance(url, str):\n        raise TypeError(\n            f\"Value for 'database-url' has type '{type(url).__name__}', expected 'str'\"\n        )\n\n    return config\n```\n\nImagine you have 20 different configurable options: you'd need this code 20 times.\n\nNow let's assume that you use a type checker like `mypy`.\nInside `read_config()`, the type checker will know that `url` is a `str`, but if you fetch the same value elsewhere in the code, that information is lost:\n\n```py\ndef handle_request(config: dict) -> None:\n    url = config[\"database-url\"]\n    reveal_type(url)\n    print(\"connect to database:\", url)\n```\n\nWhen you run `mypy` on this code, it will output 'Revealed type is \"Any\"'.\nFalling back to `Any` means type checking will not be able to find type mismatches and autocomplete in an IDE will not work well either.\n\nDeclaring the desired type in a dataclass solves both these issues:\n- the type can be verified at runtime before instantiating the dataclass\n- tooling knows the type when you read the value from the dataclass\n\nHaving the dataclass as a central and formal place for defining the configuration format is also an advantage.\nFor example, it enables automatic generation of a documented configuration file template.\n\n\n## Usage\n\nThe `dataclass_binder` module contains 
the `Binder` class which makes it easy to bind TOML data, such as a configuration file, to Python [dataclasses](https://docs.python.org/3/library/dataclasses.html).\n\nThe binding is a two-step process:\n- instantiate the `Binder` class by passing your top-level dataclass as an argument\n- call the `parse_toml()` method, providing the path of the configuration file as its argument\n\nPut together, the code looks like this:\n\n```py\nimport logging\nimport sys\nfrom pathlib import Path\n\nfrom dataclass_binder import Binder\n\n\nlogger = logging.getLogger(__name__)\n\nif __name__ == \"__main__\":\n    config_file = Path(\"config.toml\")\n    try:\n        config = Binder(Config).parse_toml(config_file)\n    except Exception as ex:\n        logger.critical(\"Error reading configuration file '%s': %s\", config_file, ex)\n        sys.exit(1)\n```\n\n### Binding a Pre-parsed Dictionary\n\nIf you don't want to bind the contents of a full file, there is also the option to bind a pre-parsed dictionary instead.\nFor this, you can use the `bind()` method on the `Binder` object.\n\nFor example, the following service is configured by one table within a larger TOML configuration file:\n\n```py\nimport tomllib  # or 'tomli' on Python <3.11\nfrom dataclass_binder import Binder\n\n\nwith open(\"config.toml\", \"rb\") as f:\n    config = tomllib.load(f)\nservice_config = Binder(ServiceConfig).bind(config[\"service\"])\n```\n\nTo keep these examples short, from now on `import` statements will only be included the first time a particular imported name is used.\n\n### Basic Types\n\nDataclass fields correspond to TOML keys. In the dataclass, underscores are used as word separators, while dashes are used in the TOML file. 
Let's configure a service that listens on a TCP port for requests and stores its data in a database, using the following TOML fragment:\n\n```toml\ndatabase-url = 'postgresql://user:password@host/db'\nport = 8080\n```\n\nThis configuration can be bound to the following dataclass:\n\n```py\nfrom dataclasses import dataclass\n\n@dataclass\nclass Config:\n    database_url: str\n    port: int\n    verbose: bool\n```\n\nThe `float` type can be used to bind floating point numbers.\nSupport for `Decimal` is not there at the moment but would be relatively easy to add, as `tomllib`/`tomli` has an option for that.\n\n### Defaults\n\nFields can be made optional by assigning a default value. Using `None` as a default value is allowed too:\n\n```py\n@dataclass\nclass Config:\n    verbose: bool = False\n    webhook_url: str | None = None\n```\n\nIf you want to mix fields with and without defaults in any order, mark the fields as keyword-only:\n\n```py\n@dataclass(kw_only=True)\nclass Config:\n    database_url: str\n    verbose: bool = False\n    port: int\n```\n\n### Dates and Times\n\nTOML handles dates and timestamps as first-class values.\nDate, time and date+time TOML values are bound to `datetime.date`, `datetime.time` and `datetime.datetime` Python objects respectively.\n\nThere is also support for time intervals using `datetime.timedelta`:\n\n```py\nfrom datetime import timedelta\n\n@dataclass\nclass Config:\n    retry_after: timedelta\n    delete_after: timedelta\n```\n\nIntervals shorter than a day can be specified using a TOML time value.\nLonger intervals are supported by adding an `-hours`, `-days`, or `-weeks` suffix.\nOther supported suffixes are `-minutes`, `-seconds`, `-milliseconds` and `-microseconds`, but these are there for completeness sake and less likely to be useful.\nHere is an example TOML fragment corresponding to the dataclass above:\n\n```toml\nretry-after = 00:02:30\ndelete-after-days = 30\n```\n\n### Collections\n\nLists and dictionaries can be 
used to bind TOML arrays and tables.\nIf you want to make a `list` or `dict` optional, you need to provide a default value via the `default_factory` mechanism as usual, see the [dataclasses documentation](https://docs.python.org/3/library/dataclasses.html#mutable-default-values) for details.\n\n```py\nfrom dataclasses import dataclass, field\n\n@dataclass\nclass Config:\n    tags: list[str] = field(default_factory=list)\n    limits: dict[str, int]\n```\n\nThe dataclass above can be used to bind the following TOML fragment:\n\n```toml\ntags = [\"production\", \"development\"]\nlimits = {ram-gb = 1, disk-gb = 100}\n```\n\nAn alternative to `default_factory` is to use a homogeneous (single element type) tuple:\n\n```py\n@dataclass\nclass Config:\n    tags: tuple[str, ...] = ()\n    limits: dict[str, int]\n```\n\nHeterogeneous tuples are supported too: for example `tuple[str, bool]` binds a TOML array that must always have a string as its first element and a Boolean as its second and last element.\nIt is generally clearer though to define a separate dataclass when you need more than one value to configure something:\n\n```py\n@dataclass\nclass Webhook:\n    url: str\n    token: str\n\n@dataclass\nclass Config:\n    webhooks: tuple[Webhook, ...] 
= ()\n```\n\nThe extra keys (`url` and `token` in this example) provide the clarity:\n\n```toml\nwebhooks = [\n    {url = \"https://host1/notify\", token = \"12345\"},\n    {url = \"https://host2/hook\", token = \"frperg\"}\n]\n```\n\nTOML's array-of-tables syntax can make this configuration a bit easier on the eyes:\n\n```toml\n[[webhooks]]\nurl = \"https://host1/notify\"\ntoken = \"12345\"\n\n[[webhooks]]\nurl = \"https://host2/hook\"\ntoken = \"frperg\"\n```\n\nAlways define additional dataclasses at the module level in your Python code: if the class is for example defined inside a function, the `Binder` constructor will not be able to find it.\n\n### Untyped Data\n\nSometimes the full structure of the data you want to bind is either too complex or too much in flux to be worth fully annotating.\nIn such a situation, you can use `typing.Any` as the annotation to simply capture the output of Python's TOML parser without type-checking it.\n\nIn the following example, a service uses the Python standard library logging implementation, configured using the [configuration dictionary schema](https://docs.python.org/3/library/logging.config.html#logging-config-dictschema):\n\n```py\nimport logging.config\nfrom dataclasses import dataclass\nfrom typing import Any\n\nfrom dataclass_binder import Binder\n\n\n@dataclass\nclass Config:\n    database_url: str\n    logging: Any\n\n\ndef run(url: str) -> None:\n    logging.info(\"Service starting\")\n\n\nif __name__ == \"__main__\":\n    config = Binder[Config].parse_toml(\"service.toml\")\n    logging.config.dictConfig(config.logging)\n    run(config.database_url)\n```\n\nThe `service.toml` configuration file for this service could look like this:\n\n```toml\ndatabase-url = 'postgresql://user:password@host/db'\n\n[logging]\nversion = 1\n\n[logging.root]\nlevel = 'INFO'\nhandlers = ['file']\n\n[logging.handlers.file]\nclass = 'logging.handlers.RotatingFileHandler'\nfilename = 'service.log'\nformatter = 
'simple'\n\n[logging.formatters.simple]\nformat = '%(asctime)s %(name)s %(levelname)s %(message)s'\n```\n\n### Plugins\n\nTo select plugins to activate, you can bind Python classes or modules using `type[BaseClass]` and `types.ModuleType` annotations respectively:\n\n```py\nfrom dataclasses import dataclass, field\nfrom types import ModuleType\n\nfrom supertool.plugins import PostProcessor\n\n\n@dataclass\nclass PluginConfig:\n    postprocessors = tuple[type[PostProcessor], ...] = ()\n    modules: dict[str, ModuleType] = field(default_factory=dict)\n```\n\nIn the TOML, you specify Python classes or modules using their fully qualified names:\n\n```toml\npostprocessors = [\"supertool_addons.reporters.JSONReport\"]\nmodules = {lint = \"supertool_addons.linter\"}\n```\n\nThere is no mechanism yet to add configuration to be used by the plugins.\n\n### Immutable\n\nIf you prefer immutable configuration objects, you can achieve that using the `frozen` argument of the `dataclass` decorator and using abstract collection types in the annotations. 
For example, the following dataclass will be instantiated with a `tuple` object for `tags` and an immutable dictionary view for `limits`:\n\n```py\nfrom collections.abc import Mapping, Sequence\nfrom dataclasses import dataclass\n\n\n@dataclass(frozen=True)\nclass Config:\n    limits: Mapping[str, int]\n    tags: Sequence[str] = ()\n```\n\n### Layered Binding\n\n`Binder` can be instantiated from a dataclass object rather than the dataclass itself.\nThe dataclass object will provide new default values when binding data to it.\nThis can be used to implement a layered configuration parsing mechanism, where there is a default configuration that can be customized using a system-wide configuration file and/or a per-user configuration file:\n\n```py\nconfig = Config()\nif system_config_path.exists():\n    config = Binder(config).parse_toml(system_config_path)\nif user_config_path.exists():\n    config = Binder(config).parse_toml(user_config_path)\n```\n\nLater layers can override individual fields in nested dataclasses, allowing fine-grained configuration merging, but collections are replaced whole instead of merged.\n\n### Generating a Configuration Template\n\nTo provide users with a starting point for configuring your application/service, you can automatically generate a configuration template from the information in the dataclass.\n\nFor example, when the following dataclass defines your configuration:\n\n```py\n@dataclass\nclass Config:\n    database_url: str\n    \"\"\"The URL of the database to connect to.\"\"\"\n\n    port: int = 12345\n    \"\"\"TCP port on which to accept connections.\"\"\"\n```\n\nYou can generate a template configuration file using:\n\n```py\nfrom dataclass_binder import Binder\n\n\nfor line in Binder(Config).format_toml_template():\n    print(line)\n```\n\nWhich will print:\n\n```toml\n# The URL of the database to connect to.\n# Mandatory.\ndatabase-url = '???'\n\n# TCP port on which to accept connections.\n# Default:\n# port = 12345\n```\n\nIt is also possible to provide 
placeholder values by passing a dataclass instance rather than the dataclass itself to `format_toml_template()`:\n\n```py\nTEMPLATE = Config(\n    database_url=\"postgresql://<username>:<password>@<hostname>/<database name>\",\n    port=8080,\n)\n\nfor line in Binder(TEMPLATE).format_toml_template():\n    print(line)\n```\n\nWhich will print:\n\n```toml\n# The URL of the database to connect to.\n# Mandatory.\ndatabase-url = 'postgresql://<username>:<password>@<hostname>/<database name>'\n\n# TCP port on which to accept connections.\n# Default:\n# port = 12345\nport = 8080\n```\n\n### Generating a Compact Configuration File\n\nIf you want to generate a fully populated configuration, you can use the `format_toml()` method.\nCompared to the template formatting, this leaves out optional parts for which no data has been provided.\n\nFor example, when the following dataclass defines your configuration:\n\n```py\n@dataclass\nclass Config:\n    path: str\n    \"\"\"Path of input file.\"\"\"\n\n    verbose: bool = False\n    \"\"\"Be extra verbose in logging.\"\"\"\n\n    options: dict[str, Any] = field(default_factory=dict)\n    \"\"\"Various named options that are passed on to tool XYZ.\"\"\"\n```\n\nThis code generates a populated configuration file:\n\n```py\nconfig = Config(path=\"./data\")\n\nwith open(\"config.toml\", \"w\") as out:\n    for line in Binder(config).format_toml():\n        print(line, file=out)\n```\n\nWith the contents of `config.toml` containing only the `path` field:\n\n```toml\n# Path of input file.\npath = './data'\n```\n\n### Troubleshooting\n\nFinally, a troubleshooting tip: instead of the full `Binder(Config).parse_toml()`, first try to execute only `Binder(Config)`.\nIf that fails, the problem is in the dataclass definitions.\nIf that succeeds, but the `parse_toml()` call fails, the problem is that the TOML file does not match the format defined in the dataclasses.\n\n\n## Development Environment\n\n[Poetry](https://python-poetry.org/) is used 
to set up a virtual environment with all the dependencies and development tools that you need:\n\n    $ cd dataclass-binder\n    $ poetry install\n\nYou can activate a shell which contains the development tools in its search path:\n\n    $ poetry shell\n\nWe recommend setting up pre-commit hooks for Git in the `dataclass-binder` work area.\nThese hooks automatically run a few simple checks and cleanups when you create a new commit.\nAfter you first set up your virtual environment with Poetry, run this command to install the pre-commit hooks:\n\n    $ pre-commit install\n\n\n## Release Procedure\n\n- Verify that CI passes on the branch that you want to release (typically `main`)\n- Create a release on the GitHub web interface; name the tag `v<major>.<minor>.<patchlevel>`\n- After publishing the release on GitHub, the package will be built and published on PyPI automatically via Actions\n\n\n## Deprecations\n\n### Binder Specialization\n\nPrior to version 0.2.0, the `Binder` class was specialized using a type argument (`Binder[Config]`) rather than instantiation (`Binder(config)`). The old syntax is still supported for now, but the backwards compatibility might be removed in a minor release prior to 1.0 if it becomes a maintenance burden, so please update your code.\n\n### Template Formatting\n\nIn version 0.3.0, the function `format_template()` has been replaced by the method `Binder.format_toml_template()`. 
The old function is still available for now.\n\n## Changelog\n\n### 0.1.0 - 2023-02-21:\n\n- First open source release; thanks to my employer [Protix](https://protix.eu/) for making this possible\n\n### 0.1.1 - 2023-02-22:\n\n- Relax `Binder.bind()` argument type to `Mapping` ([#3](https://github.com/ProtixIT/dataclass-binder/issues/3))\n\n### 0.1.2 - 2023-03-03:\n\n- Fix `get()` and `[]` on object bound to read-only mapping ([#6](https://github.com/ProtixIT/dataclass-binder/issues/6))\n\n### 0.1.3 - 2023-03-05:\n\n- Ignore dataclass fields with `init=False` ([#2](https://github.com/ProtixIT/dataclass-binder/issues/2))\n\n### 0.2.0 - 2023-06-26:\n\n- Instantiate `Binder` instead of specializing it ([#14](https://github.com/ProtixIT/dataclass-binder/pull/14))\n- Support `typing.Any` as a field annotation ([#10](https://github.com/ProtixIT/dataclass-binder/issues/10))\n- Fix crash in `format_template()` on optional fields with non-string annotations ([#16](https://github.com/ProtixIT/dataclass-binder/pull/16))\n\n### 0.3.0 - 2023-07-13:\n\n- Replace `format_template()` function by `Binder.format_toml_template()` method ([#23](https://github.com/ProtixIT/dataclass-binder/pull/23))\n- Format nested dataclasses as TOML tables ([#25](https://github.com/ProtixIT/dataclass-binder/pull/25))\n- Format untyped mappings and sequences as TOML tables ([#27](https://github.com/ProtixIT/dataclass-binder/pull/27))\n- Fix formatting of `init=False` field in nested dataclasses ([#22](https://github.com/ProtixIT/dataclass-binder/pull/22))\n- Fix annotation evaluation on inherited dataclasses ([#21](https://github.com/ProtixIT/dataclass-binder/pull/21))\n\n### 0.3.1 - 2023-07-17:\n\n- Generate template in depth-first order ([#28](https://github.com/ProtixIT/dataclass-binder/pull/28))\n- Fix binder creation and formatting for recursive dataclasses ([#28](https://github.com/ProtixIT/dataclass-binder/pull/28))\n\n### 0.3.2 - 2023-07-27:\n\n- Document fields with a `default_factory` as 
optional in template ([#35](https://github.com/ProtixIT/dataclass-binder/pull/35))\n- Omit values from template that are formatted equally to the default ([#36](https://github.com/ProtixIT/dataclass-binder/pull/36))\n- Require fields with `None` in their annotation to have `None` as their default ([#37](https://github.com/ProtixIT/dataclass-binder/pull/37))\n\n### 0.3.3 - 2023-07-31:\n\n- Add `Binder.format_toml()` method to generate more compact TOML that excludes unused optional parts ([#38](https://github.com/ProtixIT/dataclass-binder/pull/38))\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Library to bind TOML data to Python dataclasses in a type-safe way.",
    "version": "0.3.3",
    "project_urls": {
        "Homepage": "https://github.com/ProtixIT/dataclass-binder",
        "Issue Tracker": "https://github.com/ProtixIT/dataclass-binder/issues",
        "Repository": "https://github.com/ProtixIT/dataclass-binder"
    },
    "split_keywords": [
        "dataclass",
        "toml",
        "bind",
        "binding"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "aab9f93bb87287ab24e15b3ddc05bc681999fb5d239bfba83b725688bc705541",
                "md5": "d38500fd5f97f6109ff9b3bd410f87bb",
                "sha256": "5b6b6273a963a36d2af96cbed2661da8446071b83860c8fa09c15030ba0a577e"
            },
            "downloads": -1,
            "filename": "dataclass_binder-0.3.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "d38500fd5f97f6109ff9b3bd410f87bb",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10,<4.0",
            "size": 18546,
            "upload_time": "2023-07-31T13:46:54",
            "upload_time_iso_8601": "2023-07-31T13:46:54.197019Z",
            "url": "https://files.pythonhosted.org/packages/aa/b9/f93bb87287ab24e15b3ddc05bc681999fb5d239bfba83b725688bc705541/dataclass_binder-0.3.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1fc829e7370e257af7acf845eb21bc42015c713af1a7c2f1e996ab9616fab494",
                "md5": "78860c5c0b14e1328d3d05a87153927b",
                "sha256": "ebc7cf940b6fe1dbacb73b71ce0eebc1ef4da5db3f31d8468bc7ebfba6840726"
            },
            "downloads": -1,
            "filename": "dataclass_binder-0.3.3.tar.gz",
            "has_sig": false,
            "md5_digest": "78860c5c0b14e1328d3d05a87153927b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10,<4.0",
            "size": 26319,
            "upload_time": "2023-07-31T13:46:55",
            "upload_time_iso_8601": "2023-07-31T13:46:55.797430Z",
            "url": "https://files.pythonhosted.org/packages/1f/c8/29e7370e257af7acf845eb21bc42015c713af1a7c2f1e996ab9616fab494/dataclass_binder-0.3.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-07-31 13:46:55",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ProtixIT",
    "github_project": "dataclass-binder",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "dataclass-binder"
}
        