# pyyaml-include

- Version: 2.2
- Summary: An extending constructor of PyYAML: include other YAML files into current YAML document
- Upload time: 2024-11-09 09:36:16
- Requires Python: >=3.8
- License: GPLv3+
- Keywords: yaml, pyyaml, include, yml
- Author email: liu xue yan <liu_xue_yan@foxmail.com>

[![GitHub tag](https://img.shields.io/github/tag/tanbro/pyyaml-include.svg)](https://github.com/tanbro/pyyaml-include)
[![Python Package](https://github.com/tanbro/pyyaml-include/workflows/Python%20package/badge.svg)](https://github.com/tanbro/pyyaml-include/actions?query=workflow%3A%22Python+package%22)
[![Documentation Status](https://readthedocs.org/projects/pyyaml-include/badge/?version=latest)](https://pyyaml-include.readthedocs.io/en/latest/)
[![PyPI](https://img.shields.io/pypi/v/pyyaml-include.svg)](https://pypi.org/project/pyyaml-include/)
[![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=tanbro_pyyaml-include&metric=alert_status)](https://sonarcloud.io/dashboard?id=tanbro_pyyaml-include)

An extending constructor of [PyYAML][]: include other [YAML][] files into the current [YAML][] document.

In version `2.0`, [fsspec][] was introduced. With it, we can even include files by HTTP, SFTP, S3 ...

> ⚠️ **Warning** \
> “pyyaml-include” `2.0` is **NOT compatible** with `1.0`

## Install

```bash
pip install "pyyaml-include"
```

Since v2.0, [fsspec][] is used to open the included files. To include remote files, install the matching [fsspec][] extra as well:

- for files on website:

  ```bash
  pip install "pyyaml-include" "fsspec[http]"
  ```

- for files on S3:

  ```bash
  pip install "pyyaml-include" "fsspec[s3]"
  ```

- see [fsspec][]'s documentation for more

> 🔖 **Tip** \
> “pyyaml-include” depends on [fsspec][], so it is always installed, whether you include local or remote files.

## Basic usages

Suppose we have the following [YAML][] files:

```
├── 0.yml
└── include.d
    ├── 1.yml
    └── 2.yml
```

- `1.yml` 's content:

  ```yaml
  name: "1"
  ```

- `2.yml` 's content:

  ```yaml
  name: "2"
  ```

To include `1.yml` and `2.yml` in `0.yml`, we shall:

1. Register a `yaml_include.Constructor` on [PyYAML][]'s loader class, with `!inc` (or any other tag that starts with the `!` character) as its tag:

   ```python
   import yaml
   import yaml_include

   # add the tag
   yaml.add_constructor("!inc", yaml_include.Constructor(base_dir='/your/conf/dir'))
   ```

1. Use `!inc` tag(s) in `0.yml`:

   ```yaml
   file1: !inc include.d/1.yml
   file2: !inc include.d/2.yml
   ```

1. Load `0.yml` in your Python program:

   ```python
   with open('0.yml') as f:
       data = yaml.full_load(f)
   print(data)
   ```

   we'll get:

   ```python
   {'file1': {'name': '1'}, 'file2': {'name': '2'}}
   ```

1. (optional) the constructor can be unregistered:

   ```python
   del yaml.Loader.yaml_constructors["!inc"]
   del yaml.UnsafeLoader.yaml_constructors["!inc"]
   del yaml.FullLoader.yaml_constructors["!inc"]
   ```

### Include in Mapping

If `0.yml` was:

```yaml
file1: !inc include.d/1.yml
file2: !inc include.d/2.yml
```

We'll get:

```yaml
file1:
  name: "1"
file2:
  name: "2"
```

### Include in Sequence

If `0.yml` was:

```yaml
files:
  - !inc include.d/1.yml
  - !inc include.d/2.yml
```

We'll get:

```yaml
files:
  - name: "1"
  - name: "2"
```

## Advanced usages

### Wildcards

File names can contain shell-style wildcards. Data loaded from the file(s) matched by the wildcards is collected in a sequence.

That is, a list is returned whenever the included file name contains wildcards.
The length of the returned list equals the number of matched files:

If `0.yml` was:

```yaml
files: !inc include.d/*.yml
```

We'll get:

```yaml
files:
  - name: "1"
  - name: "2"
```

- when only 1 file matched, length of list will be 1
- when there are no files matched, an empty list will be returned

We support `**`, `?` and `[..]`. We do not support `^` for pattern negation.
The `maxdepth` option is applied on the first `**` found in the path.

> ❗ **Important**
>
> - Using the `**` pattern in large directory trees or on remote file systems (S3, HTTP ...) may consume an inordinate amount of time.
> - There is no lazy loading or iteration: the data of every matched file is fully loaded into memory when it is returned to the YAML doc-tree, so a large amount of memory may be needed if there are many or big files.
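
The list-shaped result can be tried out without any YAML at all. The sketch below is only an approximation: pyyaml-include delegates matching to [fsspec][], while the hypothetical `match_files` helper here uses Python's standard `glob` module, whose treatment of `*`, `?`, `[..]` and `**` is similar but may differ in detail.

```python
import glob
import os
import tempfile


def match_files(base_dir, pattern):
    """Return the paths matched by pattern, relative to base_dir, in sorted order."""
    # Escape base_dir so only `pattern` is interpreted as a wildcard expression.
    full_pattern = os.path.join(glob.escape(base_dir), pattern)
    matches = glob.glob(full_pattern, recursive=True)
    return sorted(os.path.relpath(m, base_dir) for m in matches)


with tempfile.TemporaryDirectory() as root:
    # Recreate the layout used in this README: include.d/1.yml and include.d/2.yml
    os.makedirs(os.path.join(root, "include.d"))
    for name in ("1.yml", "2.yml"):
        with open(os.path.join(root, "include.d", name), "w") as f:
            f.write(f'name: "{name[0]}"\n')

    two = match_files(root, "include.d/*.yml")     # both files match
    one = match_files(root, "include.d/[1].yml")   # a single match still gives a list
    none = match_files(root, "include.d/*.json")   # no match gives an empty list

print(len(two), len(one), len(none))
```

Whatever the number of matches, the result is always a list, which mirrors the rule stated above.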

### Work with fsspec

In `v2.0`, we use [fsspec][] to open the included files, so we can include files from many different sources, such as the local file system, S3, HTTP, SFTP ...

For example, we can include a file from a website in YAML:

```yaml
conf:
  logging: !inc http://domain/etc/app/conf.d/logging.yml
```

In such situations, a [fsspec][] filesystem object shall be passed as the `fs` argument when creating a `Constructor`.

For example, to include files from a website, we shall:

1. create a `Constructor` with a [fsspec][] HTTP filesystem object as its `fs`:

   ```python
   import yaml
   import fsspec
   import yaml_include

   http_fs = fsspec.filesystem("http", client_kwargs={"base_url": f"http://{HOST}:{PORT}"})

   ctor = yaml_include.Constructor(fs=http_fs, base_dir="/foo/baz")
   yaml.add_constructor("!inc", ctor, yaml.Loader)
   ```

1. then, write a [YAML][] document to include files from `http://${HOST}:${PORT}`:

   ```yaml
   key1: !inc doc1.yml    # relative to "base_dir"
   key2: !inc ./doc2.yml  # also relative to "base_dir"
   key3: !inc /doc3.yml   # absolute path, "base_dir" has no effect
   key4: !inc ../doc4.yml # relative path, one level above "base_dir"
   ```

1. load it with [PyYAML][]:

   ```python
   yaml.load(yaml_string, yaml.Loader)
   ```

The above [YAML][] snippet will be loaded like:

- `key1`: parsed YAML of `http://${HOST}:${PORT}/foo/baz/doc1.yml`
- `key2`: parsed YAML of `http://${HOST}:${PORT}/foo/baz/doc2.yml`
- `key3`: parsed YAML of `http://${HOST}:${PORT}/doc3.yml`
- `key4`: parsed YAML of `http://${HOST}:${PORT}/foo/doc4.yml`

> 🔖 **Tip** \
> Check [fsspec][]'s documentation for more

---

> ℹ️ **Note** \
> If the `fs` argument is omitted, a `"file"`/`"local"` [fsspec][] filesystem object will be used automatically. That is to say:
>
> ```yaml
> data: !inc foo/baz.yaml
> ```
>
> is equivalent to (if no `base_dir` was set in `Constructor()`):
>
> ```yaml
> data: !inc file://foo/baz.yaml
> ```
>
> and
>
> ```python
> yaml.add_constructor("!inc", Constructor())
> ```
>
> is equivalent to:
>
> ```python
> yaml.add_constructor("!inc", Constructor(fs=fsspec.filesystem("file")))
> ```

### Parameters in YAML

As a callable object, `Constructor` passes YAML tag parameters to [fsspec][] for more detailed operations.

The first argument is `urlpath`. It is required and can be given either positionally or by name.
Normally, we put it as a string after the tag (e.g. `!inc`), just like the examples above.

However, there are more parameters.

- in sequence form, parameters are passed to Python as positional arguments, like `*args` in a Python function. For example:

  ```yaml
  files: !inc [include.d/**/*.yaml, {maxdepth: 1}, {encoding: utf16}]
  ```

- in mapping form, parameters are passed to Python as named arguments, like `**kwargs` in a Python function. For example:

  ```yaml
  files: !inc {urlpath: /foo/baz.yaml, encoding: utf16}
  ```
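
The mapping from YAML form to Python call can be pictured with a few lines of plain Python. This is a sketch of the idea only, not the library's code: `call_with_params` and `fake_open` are hypothetical names, and `fake_open` merely echoes its arguments instead of opening anything.

```python
def call_with_params(func, params):
    """Call func with parameters written in scalar, sequence or mapping form."""
    if isinstance(params, dict):
        return func(**params)    # mapping  -> named arguments
    if isinstance(params, (list, tuple)):
        return func(*params)     # sequence -> positional arguments
    return func(params)          # scalar   -> the single urlpath argument


def fake_open(urlpath, mode="r", encoding=None):
    """Stand-in for an open call; it only reports what it received."""
    return (urlpath, mode, encoding)


from_scalar = call_with_params(fake_open, "/foo/baz.yaml")
from_sequence = call_with_params(fake_open, ["/foo/baz.yaml", "r"])
from_mapping = call_with_params(fake_open, {"urlpath": "/foo/baz.yaml", "encoding": "utf16"})

print(from_scalar)
print(from_sequence)
print(from_mapping)
```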

But the parameter format has several cases, and it differs between [fsspec][] implementation backends.

- If a scheme/protocol (“`http://`”, “`sftp://`”, “`file://`”, etc.) is defined, and there is no wildcard in `urlpath`, `Constructor` will invoke [`fsspec.open`](https://filesystem-spec.readthedocs.io/en/stable/api.html#fsspec.open) directly to open it. This means `Constructor`'s `fs` will be ignored, and a new standalone `fs` will be created implicitly.

  In this situation, `urlpath` will be passed as `fsspec.open`'s first argument, and all other parameters will also be passed to the function.

  For example,

  - the [YAML][] snippet

    ```yaml
    files: !inc [file:///foo/baz.yaml, r]
    ```

    will cause python code like

    ```python
    with fsspec.open("file:///foo/baz.yaml", "r") as f:
        yaml.load(f, Loader)
    ```

  - and the [YAML][] snippet

    ```yaml
    files: !inc {urlpath: file:///foo/baz.yaml, encoding: utf16}
    ```

    will cause python code like

    ```python
    with fsspec.open("file:///foo/baz.yaml", encoding="utf16") as f:
        yaml.load(f, Loader)
    ```

- If `urlpath` has wildcard, and also scheme in it, `Constructor` will:

  Invoke [fsspec][]'s [`open_files`](https://filesystem-spec.readthedocs.io/en/stable/api.html#fsspec.open_files) function to search, open and load files, and return the results in a list. [YAML][] include statement's parameters are passed to `open_files` function.

- If `urlpath` has wildcard, and no scheme in it, `Constructor` will:

  1. invoke corresponding [fsspec][] implementation backend's [`glob`](https://filesystem-spec.readthedocs.io/en/stable/api.html#fsspec.spec.AbstractFileSystem.glob) method to search files,
  1. then call [`open`](https://filesystem-spec.readthedocs.io/en/stable/api.html#fsspec.spec.AbstractFileSystem.open) method to open each found file(s).

  `urlpath` will be passed as the first argument to both `glob` and `open` method of the corresponding [fsspec][] implementation backend, and other parameters will also be passed to `glob` and `open` method as their following arguments.

  In the case of wildcards, pay special attention to the fact that there are **two separate parameters** after `urlpath`: the first is for the `glob` method, and the second is for the `open` method. Each of them can be a scalar, a sequence or a mapping, corresponding to a single, positional or named argument(s) in Python, respectively. For example:

  - If we want to include every `.yml` file in directory `etc/app` recursively with max depth at 2, and open them in utf-16 codec, we shall write the [YAML][] as below:

    ```yaml
    files: !inc ["etc/app/**/*.yml", {maxdepth: !!int "2"}, {encoding: utf16}]
    ```

    it will cause python code like:

    ```python
    for file in local_fs.glob("etc/app/**/*.yml", maxdepth=2):
        with local_fs.open(file, encoding="utf16") as f:
            yaml.load(f, Loader)
    ```

  - Since `maxdepth` is the second argument after `path` in the `glob` method, we can also write the [YAML][] like this:

    ```yaml
    files: !inc ["etc/app/**/*.yml", [!!int "2"]]
    ```

    The parameters for `open` are omitted, which means no arguments other than `urlpath` are passed.

    It will cause Python code like:

    ```python
    for file in local_fs.glob("etc/app/**/*.yml", 2):
        with local_fs.open(file) as f:
            yaml.load(f, Loader)
    ```

  - The two parameters can also be given in mapping form, with `"glob"` and `"open"` as the key names. For example:

    ```yaml
    files: !inc {urlpath: "etc/app/**/*.yml", glob: [!!int "2"], open: {encoding: utf16}}
    ```

  > ❗ **Important** \
  > [PyYAML][] sometimes treats a scalar parameter of a custom constructor as a string. We can use a ‘Standard YAML tag’ to ensure a non-string data type in that situation.
  >
  > For example, the following [YAML][] snippet may cause an error:
  >
  > ```yaml
  > files: !inc ["etc/app/**/*.yml", open: {intParam: 1}]
  > ```
  >
  > That is because [PyYAML][] treats `{"intParam": 1}` as `{"intParam": "1"}`, which results in Python code like `fs.open(path, intParam="1")`. To prevent this, we shall write the [YAML][] like:
  >
  > ```yaml
  > files: !inc ["etc/app/**/*.yml", open: {intParam: !!int 1}]
  > ```
  >
  > where `!!int` is a ‘Standard YAML tag’ that forces the integer type of the `intParam` argument.
  >
  > > ℹ️ **Note** \
  > > `BaseLoader`, `SafeLoader`, `CBaseLoader`, `CSafeLoader` do **NOT** support ‘Standard YAML tag’.
  > ---
  > > 🔖 **Tip** \
  > > `maxdepth` argument of [fsspec][] `glob` method is already force converted by `Constructor`, no need to write a `!!int` tag on it.

- Otherwise, `Constructor` will invoke the corresponding [fsspec][] implementation backend's [`open`](https://filesystem-spec.readthedocs.io/en/stable/api.html#fsspec.spec.AbstractFileSystem.open) method to open the file. Parameters besides `urlpath` will be passed to the method.
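
The four cases above, and the two-slot parameter layout used with wildcards, can be summarised in a short decision sketch. Everything here is an assumption made for illustration: the function names are hypothetical, and the scheme and wildcard checks are simple regular expressions that may not match how the library detects them (for instance, a `?` in a URL query string would count as a wildcard here).

```python
import re

_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*://")  # e.g. "http://", "file://"
_WILDCARD = re.compile(r"[*?\[]")                      # "*", "?" or "["


def choose_strategy(urlpath):
    """Name the fsspec entry point that the rules above would select."""
    has_scheme = bool(_SCHEME.match(urlpath))
    has_wildcard = bool(_WILDCARD.search(urlpath))
    if has_scheme and not has_wildcard:
        return "fsspec.open"
    if has_scheme and has_wildcard:
        return "fsspec.open_files"
    if has_wildcard:
        return "fs.glob + fs.open"
    return "fs.open"


def split_wildcard_params(params):
    """Split wildcard-form parameters into (urlpath, glob_params, open_params)."""
    if isinstance(params, dict):
        return params["urlpath"], params.get("glob"), params.get("open")
    if isinstance(params, (list, tuple)):
        padded = list(params) + [None, None]  # missing slots become None
        return padded[0], padded[1], padded[2]
    return params, None, None


print(choose_strategy("file:///foo/baz.yaml"))
print(choose_strategy("etc/app/**/*.yml"))
print(split_wildcard_params(["etc/app/**/*.yml", {"maxdepth": 2}, {"encoding": "utf16"}]))
```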

### Absolute and Relative URL/Path

When the path after the include tag (e.g. `!inc`) is not a full protocol/scheme URL and does not start with `"/"`, `Constructor` tries to join the path with `base_dir`, which is an argument of `Constructor.__init__()`.
If `base_dir` is omitted or `None`, the path actually included is the path defined in [YAML][], unchanged, and different [fsspec][] filesystems will treat it differently. In the local filesystem, it is relative to `cwd`.

For a remote filesystem, `HTTP` for example, `base_dir` can not be `None` and is usually set to `"/"`.

Relative paths do not apply to full protocol/scheme URLs; `base_dir` has no effect on them.

For example, if we register such a `Constructor` to [PyYAML][]:

```python
import yaml
import fsspec
import yaml_include

yaml.add_constructor(
    "!http-include",
    yaml_include.Constructor(
        fsspec.filesystem("http", client_kwargs={"base_url": f"http://{HOST}:{PORT}"}),
        base_dir="/sub_1/sub_1_1"
    )
)
```

then, load following [YAML][]:

```yaml
xyz: !http-include xyz.yml
```

the actual URL to access is `http://$HOST:$PORT/sub_1/sub_1_1/xyz.yml`
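
The joining rule can be written down as a small pure function. This is a sketch under stated assumptions, not the library's implementation: `resolve_include_path` is a hypothetical name, the scheme check is a simple regular expression, and normalising `.` and `..` with `posixpath.normpath` is our own choice to make the result easy to read.

```python
import posixpath
import re

_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*://")


def resolve_include_path(urlpath, base_dir=None):
    """Return the path that would be opened for `urlpath` under the rules above."""
    if _SCHEME.match(urlpath):
        return urlpath   # full URL: base_dir has no effect
    if urlpath.startswith("/"):
        return urlpath   # absolute path: base_dir has no effect
    if base_dir is None:
        return urlpath   # unchanged; the filesystem decides (cwd for local files)
    return posixpath.normpath(posixpath.join(base_dir, urlpath))


print(resolve_include_path("xyz.yml", "/sub_1/sub_1_1"))
print(resolve_include_path("../doc4.yml", "/foo/baz"))
print(resolve_include_path("/doc3.yml", "/foo/baz"))
```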

### Flatten sequence object in multiple matched files

Suppose we have a YAML like this:

```yaml
items: !include "*.yaml"
```

If every file that matches `*.yaml` contains a sequence object at its top level, what is parsed and loaded will be:

```yaml
items: [
    [item 0 of 1st file, item 1 of 1st file, ... , item n of 1st file, ...],
    [item 0 of 2nd file, item 1 of 2nd file, ... , item n of 2nd file, ...],
    # ....
    [item 0 of nth file, item 1 of nth file, ... , item n of nth file, ...],
    # ...
]
```

It's a 2-dim array, because YAML content of each matched file is treated as a member of the list(sequence).

But if `flatten` parameter was set to `true`, like:

```yaml
items: !include {urlpath: "*.yaml", flatten: true}
```

we'll get:

```yaml
items: [
    item 0 of 1st file, item 1 of 1st file, ... , item n of 1st file,  # ...
    item 0 of 2nd file, item 1 of 2nd file, ... , item n of 2nd file,  # ...
    # ....
    item 0 of n-th file, item 1 of n-th file, ... , item n of n-th file,  # ...
    # ...
]
```

> ℹ️ **Note**
>
> - Only available when multiple files were matched.
> - **Every matched file should have a Sequence object in its top level**, or a `TypeError` exception may be thrown.
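
The difference between the nested and the flattened result is the same as chaining lists in plain Python. The sketch below models it with a hypothetical `combine` function; the explicit type check is our own way of showing why a non-sequence top level is a problem, and the library's actual error message and check may differ.

```python
from itertools import chain


def combine(loaded_files, flatten=False):
    """Combine the parsed content of several files, optionally flattening one level."""
    if not flatten:
        return list(loaded_files)  # one list member per matched file
    for index, content in enumerate(loaded_files):
        if not isinstance(content, (list, tuple)):
            raise TypeError(f"file #{index} does not hold a sequence at its top level")
    return list(chain.from_iterable(loaded_files))


first = ["a0", "a1"]         # parsed content of the 1st matched file
second = ["b0", "b1", "b2"]  # parsed content of the 2nd matched file

nested = combine([first, second])
flat = combine([first, second], flatten=True)

print(nested)
print(flat)
```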

### Serialization

When loading a [YAML][] string with include statements, the included files are parsed into Python objects by default. That is, if we call `yaml.dump()` on the object, what is dumped is the parsed Python object, and the include statement itself can not be serialized.

To serialize the statement, we shall first create a `yaml_include.Constructor` object whose **`autoload` attribute is `False`**:

```python
import yaml
import yaml_include

ctor = yaml_include.Constructor(autoload=False)
```

then add both Constructor for Loader and Representer for Dumper:

```python
yaml.add_constructor("!inc", ctor)

rpr = yaml_include.Representer("inc")
yaml.add_representer(yaml_include.Data, rpr)
```

Now, the included files will not be loaded when calling `yaml.load()`, and `yaml_include.Data` objects will be placed at the positions of the include statements.

Continuing the above code:

```python
yaml_str = """
- !inc include.d/1.yaml
- !inc include.d/2.yaml
"""

d0 = yaml.load(yaml_str, yaml.Loader)
# Here, "include.d/1.yaml" and "include.d/2.yaml" are not opened or loaded.
# d0 is like:
# [Data(urlpath="include.d/1.yaml"), Data(urlpath="include.d/2.yaml")]

# serialize d0
s = yaml.dump(d0)
print(s)
# `s` will be:
# - !inc 'include.d/1.yaml'
# - !inc 'include.d/2.yaml'

# de-serialization
ctor.autoload = True  # turn auto load back on
# then load: the files "include.d/1.yaml" and "include.d/2.yaml" will be opened and loaded.
d1 = yaml.load(s, yaml.Loader)

# Or perform a recursive opening / parsing on the object:
d2 = yaml_include.load(d0) # d2 is equal to d1
```
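
To picture what "recursive opening / parsing on the object" means, here is a toy model in plain Python. It is not how `yaml_include.load` is implemented: `IncludeRef`, `resolve_includes` and the in-memory `fake_files` mapping are all hypothetical stand-ins for the placeholder objects and for real file access.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IncludeRef:
    """Hypothetical placeholder for an include statement that was not loaded."""
    urlpath: str


def resolve_includes(obj, load_one):
    """Walk lists and dicts, replacing each placeholder with its loaded content."""
    if isinstance(obj, IncludeRef):
        return load_one(obj.urlpath)
    if isinstance(obj, list):
        return [resolve_includes(item, load_one) for item in obj]
    if isinstance(obj, dict):
        return {key: resolve_includes(value, load_one) for key, value in obj.items()}
    return obj


# In-memory stand-in for the files on disk.
fake_files = {
    "include.d/1.yaml": {"name": "1"},
    "include.d/2.yaml": {"name": "2"},
}

d0 = [IncludeRef("include.d/1.yaml"), IncludeRef("include.d/2.yaml")]
d2 = resolve_includes(d0, fake_files.__getitem__)
print(d2)
```

The original `d0` is left untouched, so it can still be dumped with its include statements intact.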

`autoload` can be used in a `with` statement:

```python
ctor = yaml_include.Constructor()
# autoload is True here

with ctor.managed_autoload(False):
    # temporarily set autoload to False
    yaml.full_load(YAML_TEXT)
# autoload is automatically restored to True
```
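
The set-then-restore behaviour is the usual context-manager pattern. The sketch below shows that pattern on a hypothetical `Toggle` class, which stands in for the constructor; it only illustrates why the flag is restored even when the body raises, and makes no claim about the library's internals.

```python
from contextlib import contextmanager


class Toggle:
    """Hypothetical object with an `autoload` flag."""

    def __init__(self, autoload=True):
        self.autoload = autoload

    @contextmanager
    def managed_autoload(self, value):
        previous = self.autoload
        self.autoload = value
        try:
            yield self
        finally:
            # Runs on normal exit and on exceptions alike.
            self.autoload = previous


ctor = Toggle()
with ctor.managed_autoload(False):
    inside = ctor.autoload  # False while the block runs
after = ctor.autoload       # back to True afterwards

print(inside, after)
```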

### Include JSON or TOML

We can include files in different format other than [YAML][], like [JSON][] or [TOML][] -- ``custom_loader`` is for that.

> 📑 **Example** \
> A custom loader that picks the parser by file extension:
>
> ```python
> import json
> import tomllib as toml  # in the standard library since Python 3.11
> import yaml
> import yaml_include
>
> # Define loader function
> def my_loader(urlpath, file, Loader):
>     if urlpath.endswith(".json"):
>         return json.load(file)
>     if urlpath.endswith(".toml"):
>         return toml.load(file)
>     return yaml.load(file, Loader)
>
> # Create the include constructor, with the custom loader
> ctor = yaml_include.Constructor(custom_loader=my_loader)
>
> # Add the constructor to YAML Loader
> yaml.add_constructor("!inc", ctor, yaml.Loader)
>
> # Now JSON files are loaded by the standard library's json module, and likewise for TOML files.
> s = """
> json: !inc "*.json"
> toml: !inc "*.toml"
> yaml: !inc "*.yaml"
> """
>
> yaml.load(s, yaml.Loader)
> ```
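
When more formats are involved, the chain of `if` statements can be replaced by a lookup table. The following is one possible design, not something the library provides: `make_loader` is a hypothetical helper that builds a function with the same `(urlpath, file, Loader)` signature as `my_loader` above. To keep the sketch free of third-party imports, the fallback here just reads the text; in real use it would typically be `yaml.load`.

```python
import io
import json
import os


def make_loader(parsers_by_suffix, fallback):
    """Build a custom loader that picks a parser from the file suffix."""

    def loader(urlpath, file, Loader):
        suffix = os.path.splitext(urlpath)[1].lower()
        parse = parsers_by_suffix.get(suffix)
        if parse is None:
            return fallback(file, Loader)  # e.g. yaml.load(file, Loader)
        return parse(file)

    return loader


my_loader = make_loader({".json": json.load}, fallback=lambda file, Loader: file.read())

# Call the loader directly with in-memory files; the paths are made up.
parsed = my_loader("conf/app.json", io.StringIO('{"debug": true}'), None)
raw = my_loader("conf/app.yaml", io.StringIO("debug: true\n"), None)
print(parsed, repr(raw))
```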

## Develop

1. clone the repo:

   ```bash
   git clone https://github.com/tanbro/pyyaml-include.git
   cd pyyaml-include
   ```

1. create then activate a python virtual-env:

   ```bash
   python -m venv .venv
   source .venv/bin/activate
   ```

1. install development requirements and the project itself in editable mode:

   ```bash
   pip install -r requirements.txt
   ```

Now you can work on it.

## Test

See `tests/README.md`.

[YAML]: http://yaml.org/ "YAML: YAML Ain't Markup Language™"
[PyYaml]: https://pypi.org/project/PyYAML/ "PyYAML is a full-featured YAML framework for the Python programming language."
[fsspec]: https://github.com/fsspec/filesystem_spec/ "Filesystem Spec (fsspec) is a project to provide a unified pythonic interface to local, remote and embedded file systems and bytes storage."
[JSON]: https://json.io/ "JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write"
[TOML]: https://toml.io/ "TOML aims to be a minimal configuration file format that's easy to read due to obvious semantics."

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "pyyaml-include",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "yaml, PyYAML, include, yml",
    "author": null,
    "author_email": "liu xue yan <liu_xue_yan@foxmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/0f/8c/4bdc1bd9676e9eb49237b3750562e9794b7585281448909fa1837c92ca27/pyyaml_include-2.2.tar.gz",
    "platform": null,
    "description": "# pyyaml-include\n\n[![GitHub tag](https://img.shields.io/github/tag/tanbro/pyyaml-include.svg)](https://github.com/tanbro/pyyaml-include)\n[![Python Package](https://github.com/tanbro/pyyaml-include/workflows/Python%20package/badge.svg)](https://github.com/tanbro/pyyaml-include/actions?query=workflow%3A%22Python+package%22)\n[![Documentation Status](https://readthedocs.org/projects/pyyaml-include/badge/?version=latest)](https://pyyaml-include.readthedocs.io/en/latest/)\n[![PyPI](https://img.shields.io/pypi/v/pyyaml-include.svg)](https://pypi.org/project/pyyaml-include/)\n[![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=tanbro_pyyaml-include&metric=alert_status)](https://sonarcloud.io/dashboard?id=tanbro_pyyaml-include)\n\nAn extending constructor of [PyYAML][]: include other [YAML][] files into current [YAML][] document.\n\nIn version `2.0`, [fsspec][] was introduced. With it, we can even include files by HTTP, SFTP, S3 ...\n\n> \u26a0\ufe0f **Warning** \\\n> \u201cpyyaml-include\u201d `2.0` is **NOT compatible** with `1.0`\n\n## Install\n\n```bash\npip install \"pyyaml-include\"\n```\n\nBecause [fsspec][] was introduced to open the including files since v2.0, an installation can be performed like below, if want to open remote files:\n\n- for files on website:\n\n  ```bash\n  pip install \"pyyaml-include\" fsspec[http]\n  ```\n\n- for files on S3:\n\n  ```bash\n  pip install \"pyyaml-include\" fsspec[s3]\n  ```\n\n- see [fsspec][]'s documentation for more\n\n> \ud83d\udd16 **Tip** \\\n> \u201cpyyaml-include\u201d depends on [fsspec][], it will be installed no matter including local or remote files.\n\n## Basic usages\n\nConsider we have such [YAML][] files:\n\n```\n\u251c\u2500\u2500 0.yml\n\u2514\u2500\u2500 include.d\n    \u251c\u2500\u2500 1.yml\n    \u2514\u2500\u2500 2.yml\n```\n\n- `1.yml` 's content:\n\n  ```yaml\n  name: \"1\"\n  ```\n\n- `2.yml` 's content:\n\n  ```yaml\n  name: \"2\"\n  ```\n\nTo 
include `1.yml`, `2.yml` in `0.yml`, we shall:\n\n1. Register a `yaml_include.Constructor` to [PyYAML][]'s loader class, with `!inc`(or any other tags start with `!` character) as it's tag:\n\n   ```python\n   import yaml\n   import yaml_include\n\n   # add the tag\n   yaml.add_constructor(\"!inc\", yaml_include.Constructor(base_dir='/your/conf/dir'))\n   ```\n\n1. Use `!inc` tag(s) in `0.yaml`:\n\n   ```yaml\n   file1: !inc include.d/1.yml\n   file2: !inc include.d/2.yml\n   ```\n\n1. Load `0.yaml` in your Python program\n\n   ```python\n   with open('0.yml') as f:\n      data = yaml.full_load(f)\n   print(data)\n   ```\n\n   we'll get:\n\n   ```python\n   {'file1': {'name': '1'}, 'file2': {'name': '2'}}\n   ```\n\n1. (optional) the constructor can be unregistered:\n\n   ```python\n   del yaml.Loader.yaml_constructors[\"!inc\"]\n   del yaml.UnSafeLoader.yaml_constructors[\"!inc\"]\n   del yaml.FullLoader.yaml_constructors[\"!inc\"]\n   ```\n\n### Include in Mapping\n\nIf `0.yml` was:\n\n```yaml\nfile1: !inc include.d/1.yml\nfile2: !inc include.d/2.yml\n```\n\nWe'll get:\n\n```yaml\nfile1:\n  name: \"1\"\nfile2:\n  name: \"2\"\n```\n\n### Include in Sequence\n\nIf `0.yml` was:\n\n```yaml\nfiles:\n  - !inc include.d/1.yml\n  - !inc include.d/2.yml\n```\n\nWe'll get:\n\n```yaml\nfiles:\n  - name: \"1\"\n  - name: \"2\"\n```\n\n## Advanced usages\n\n### Wildcards\n\nFile name can contain shell-style wildcards. Data loaded from the file(s) found by wildcards will be set in a sequence.\n\nThat is, a list will be returned when including file name contains wildcards.\nLength of the returned list equals number of matched files:\n\nIf `0.yml` was:\n\n```yaml\nfiles: !inc include.d/*.yml\n```\n\nWe'll get:\n\n```yaml\nfiles:\n  - name: \"1\"\n  - name: \"2\"\n```\n\n- when only 1 file matched, length of list will be 1\n- when there are no files matched, an empty list will be returned\n\nWe support `**`, `?` and `[..]`. 
We do not support `^` for pattern negation.\nThe `maxdepth` option is applied on the first `**` found in the path.\n\n> \u2757 **Important**\n>\n> - Using the `**` pattern in large directory trees or remote file system (S3, HTTP ...) may consume an inordinate amount of time.\n> - There is no method like lazy-load or iteration, all data of found files returned to the YAML doc-tree are fully loaded in memory, large amount of memory may be needed if there were many or big files.\n\n### Work with fsspec\n\nIn `v2.0`, we use [fsspec][] to open including files, thus we can include files from many different sources, such as local file system, S3, HTTP, SFTP ...\n\nFor example, we can include a file from website in YAML:\n\n```yaml\nconf:\n  logging: !inc http://domain/etc/app/conf.d/logging.yml\n```\n\nIn such situations, when creating a `Constructor` constructor, a [fsspec][] filesystem object shall be set to `fs` argument.\n\nFor example, if want to include files from website, we shall:\n\n1. create a `Constructor` with a [fsspec][] HTTP filesystem object as it's `fs`:\n\n   ```python\n   import yaml\n   import fsspec\n   import yaml_include\n\n   http_fs = fsspec.filesystem(\"http\", client_kwargs={\"base_url\": f\"http://{HOST}:{PORT}\"})\n\n   ctor = yaml_include.Constructor(fs=http_fs, base_dir=\"/foo/baz\")\n   yaml.add_constructor(\"!inc\", ctor, yaml.Loader)\n   ```\n\n1. then, write a [YAML][] document to include files from `http://${HOST}:${PORT}`:\n\n   ```yaml\n   key1: !inc doc1.yml    # relative path to \"base_dir\"\n   key2: !inc ./doc2.yml  # relative path to \"base_dir\" also\n   key3: !inc /doc3.yml   # absolute path, \"base_dir\" does not affect\n   key3: !inc ../doc4.yml # relative path one level upper to \"base_dir\"\n   ```\n\n1. 
load it with [PyYAML][]:\n\n   ```python\n   yaml.load(yaml_string, yaml.Loader)\n   ```\n\nAbove [YAML][] snippet will be loaded like:\n\n- `key1`: pared YAML of `http://${HOST}:${PORT}/foo/baz/doc1.yml`\n- `key2`: pared YAML of `http://${HOST}:${PORT}/foo/baz/doc2.yml`\n- `key3`: pared YAML of `http://${HOST}:${PORT}/doc3.yml`\n- `key4`: pared YAML of `http://${HOST}:${PORT}/foo/doc4.yml`\n\n> \ud83d\udd16 **Tip** \\\n> Check [fsspec][]'s documentation for more\n\n---\n\n> \u2139\ufe0f **Note** \\\n> If `fs` argument is omitted, a `\"file\"`/`\"local\"` [fsspec][] filesystem object will be used automatically. That is to say:\n>\n> ```yaml\n> data: !inc: foo/baz.yaml\n> ```\n>\n> is equivalent to (if no `base_dir` was set in `Constructor()`):\n>\n> ```yaml\n> data: !inc: file://foo/baz.yaml\n> ```\n>\n> and\n>\n> ```python\n> yaml.add_constructor(\"!inc\", Constructor())\n> ```\n>\n> is equivalent to:\n>\n> ```python\n> yaml.add_constructor(\"!inc\", Constructor(fs=fsspec.filesystem(\"file\")))\n> ```\n\n### Parameters in YAML\n\nAs a callable object, `Constructor` passes YAML tag parameters to [fsspec][] for more detailed operations.\n\nThe first argument is `urlpath`, it's fixed and must-required, either positional or named.\nNormally, we put it as a string after the tag(eg: `!inc`), just like examples above.\n\nHowever, there are more parameters.\n\n- in a sequence way, parameters will be passed to python as positional arguments, like `*args` in python function. eg:\n\n  ```yaml\n  files: !inc [include.d/**/*.yaml, {maxdepth: 1}, {encoding: utf16}]\n  ```\n\n- in a mapping way, parameters will be passed to python as named arguments, like `**kwargs` in python function. 
eg:\n\n  ```yaml\n  files: !inc {urlpath: /foo/baz.yaml, encoding: utf16}\n  ```\n\nBut the format of parameters has multiple cases, and differs variably in different [fsspec][] implementation backends.\n\n- If a scheme/protocol(\u201c`http://`\u201d, \u201c`sftp://`\u201d, \u201c`file://`\u201d, etc.) is defined, and there is no wildcard in `urlpath`, `Constructor` will invoke [`fsspec.open`](https://filesystem-spec.readthedocs.io/en/stable/api.html#fsspec.open) directly to open it. Which means `Constructor`'s `fs` will be ignored, and a new standalone `fs` will be created implicitly.\n\n  In this situation, `urlpath` will be passed to `fsspec.open`'s first argument, and all other parameters will also be passed to the function.\n\n  For example,\n\n  - the [YAML][] snippet\n\n    ```yaml\n    files: !inc [file:///foo/baz.yaml, r]\n    ```\n\n    will cause python code like\n\n    ```python\n    with fsspec.open(\"file:///foo/baz.yaml\", \"r\") as f:\n        yaml.load(f, Loader)\n    ```\n\n  - and the [YAML][] snippet\n\n    ```yaml\n    files: !inc {urlpath: file:///foo/baz.yaml, encoding: utf16}\n    ```\n\n    will cause python code like\n\n    ```python\n    with fsspec.open(\"file:///foo/baz.yaml\", encoding=\"utf16\") as f:\n        yaml.load(f, Loader)\n    ```\n\n- If `urlpath` has wildcard, and also scheme in it, `Constructor` will:\n\n  Invoke [fsspec][]'s [`open_files`](https://filesystem-spec.readthedocs.io/en/stable/api.html#fsspec.open_files) function to search, open and load files, and return the results in a list. [YAML][] include statement's parameters are passed to `open_files` function.\n\n- If `urlpath` has wildcard, and no scheme in it, `Constructor` will:\n\n  1. invoke corresponding [fsspec][] implementation backend's [`glob`](https://filesystem-spec.readthedocs.io/en/stable/api.html#fsspec.spec.AbstractFileSystem.glob) method to search files,\n  1. 
then call [`open`](https://filesystem-spec.readthedocs.io/en/stable/api.html#fsspec.spec.AbstractFileSystem.open) method to open each found file(s).\n\n  `urlpath` will be passed as the first argument to both `glob` and `open` method of the corresponding [fsspec][] implementation backend, and other parameters will also be passed to `glob` and `open` method as their following arguments.\n\n  In the case of wildcards, what need to pay special attention to is that there are **two separated parameters** after `urlpath`, the first is for `glob` method, and the second is for `open` method. Each of them could be either sequence, mapping or scalar, corresponds single, positional and named argument(s) in python. For example:\n\n  - If we want to include every `.yml` file in directory `etc/app` recursively with max depth at 2, and open them in utf-16 codec, we shall write the [YAML][] as below:\n\n    ```yaml\n    files: !inc [\"etc/app/**/*.yml\", {maxdepth: !!int \"2\"}, {encoding: utf16}]\n    ```\n\n    it will cause python code like:\n\n    ```python\n    for file in local_fs.glob(\"etc/app/**/*.yml\", maxdepth=2):\n        with local_fs.open(file, encoding=\"utf16\") as f:\n            yaml.load(f, Loader)\n    ```\n\n  - Since `maxdepth` is the seconde argument after `path` in `glob` method, we can also write the [YAML][] like this:\n\n    ```yaml\n    files: !inc [\"etc/app/**/*.yml\", [!!int \"2\"]]\n    ```\n\n    The parameters for `open` is omitted, means no more arguments except `urlpath` is passed.\n\n    it will cause python code like:\n\n    ```python\n    for file in local_fs.glob(\"etc/app/**/*.yml\", 2):\n        with local_fs.open(file) as f:\n            yaml.load(f, Loader)\n    ```\n\n  - The two parameters can be in a mapping form, and name of the keys are `\"glob\"` and `\"open\"`. 
for example:\n\n    ```yaml\n    files: !inc {urlpath: \"etc/app/**/*.yml\", glob: [!!int \"2\"], open: {encoding: utf16}}\n    ```\n\n  > \u2757 **Important** \\\n  > [PyYAML][] sometimes takes scalar parameter of custom constructor as string, we can use a \u2018Standard YAML tag\u2019 to ensure non-string data type in the situation.\n  >\n  > For example, following [YAML][] snippet may cause an error:\n  >\n  > ```yaml\n  > files: !inc [\"etc/app/**/*.yml\", open: {intParam: 1}]\n  > ```\n  >\n  > Because [PyYAML][] treats `{\"intParam\": 1}` as `{\"intParam\": \"1\"}`, which makes python code like `fs.open(path, intParam=\"1\")`. To prevent this, we shall write the [YAML][] like:\n  >\n  > ```yaml\n  > files: !inc [\"etc/app/**/*.yml\", open: {intParam: !!int 1}]\n  > ```\n  >\n  > where `!!int` is a \u2018Standard YAML tag\u2019 to force integer type of `maxdepth` argument.\n  >\n  > > \u2139\ufe0f **Note** \\\n  > > `BaseLoader`, `SafeLoader`, `CBaseLoader`, `CSafeLoader` do **NOT** support \u2018Standard YAML tag\u2019.\n  > ---\n  > > \ud83d\udd16 **Tip** \\\n  > > `maxdepth` argument of [fsspec][] `glob` method is already force converted by `Constructor`, no need to write a `!!int` tag on it.\n\n- Else, `Constructor` will invoke corresponding [fsspec][] implementation backend's [`open`](https://filesystem-spec.readthedocs.io/en/stable/api.html#fsspec.spec.AbstractFileSystem.open) method to open the file, parameters beside `urlpath` will be passed to the method.\n\n### Absolute and Relative URL/Path\n\nWhen the path after include tag (eg: `!inc`) is not a full protocol/scheme URL and not starts with `\"/\"`, `Constructor` tries to join the path with `base_dir`, which is a argument of `Constructor.__init__()`.\nIf `base_dir` is omitted or `None`, the actually including file path is the path in defined in [YAML][] without a change, and different [fsspec][] filesystem will treat them differently. 
On the local filesystem, it is resolved relative to `cwd`.\n\nFor a remote filesystem, `HTTP` for example, `base_dir` cannot be `None` and is usually set to `\"/\"`.\n\nA full protocol/scheme URL is never treated as a relative path; `base_dir` has no effect on it.\n\nFor example, if we register such a `Constructor` to [PyYAML][]:\n\n```python\nimport yaml\nimport fsspec\nimport yaml_include\n\n# HOST and PORT are placeholders for the address of the HTTP server\nyaml.add_constructor(\n    \"!http-include\",\n    yaml_include.Constructor(\n        fsspec.filesystem(\"http\", client_kwargs={\"base_url\": f\"http://{HOST}:{PORT}\"}),\n        base_dir=\"/sub_1/sub_1_1\"\n    )\n)\n```\n\nthen load the following [YAML][]:\n\n```yaml\nxyz: !http-include xyz.yml\n```\n\nthe actual URL accessed is `http://$HOST:$PORT/sub_1/sub_1_1/xyz.yml`\n\n### Flatten sequence object in multiple matched files\n\nSuppose we have the following YAML:\n\n```yaml\nitems: !include \"*.yaml\"\n```\n\nIf every file matching `*.yaml` contains a sequence object at its top level, what is parsed and loaded will be:\n\n```yaml\nitems: [\n    [item 0 of 1st file, item 1 of 1st file, ... , item n of 1st file, ...],\n    [item 0 of 2nd file, item 1 of 2nd file, ... , item n of 2nd file, ...],\n    # ....\n    [item 0 of nth file, item 1 of nth file, ... , item n of nth file, ...],\n    # ...\n]\n```\n\nIt is a two-dimensional array, because the YAML content of each matched file is treated as one member of the list (sequence).\n\nBut if the `flatten` parameter is set to `true`, like:\n\n```yaml\nitems: !include {urlpath: \"*.yaml\", flatten: true}\n```\n\nwe'll get:\n\n```yaml\nitems: [\n    item 0 of 1st file, item 1 of 1st file, ... , item n of 1st file,  # ...\n    item 0 of 2nd file, item 1 of 2nd file, ... , item n of 2nd file,  # ...\n    # ....\n    item 0 of n-th file, item 1 of n-th file, ... 
, item n of n-th file,  # ...\n    # ...\n]\n```\n\n> \u2139\ufe0f **Note**\n>\n> - Only available when multiple files are matched.\n> - **Every matched file should have a Sequence object at its top level**, or a `TypeError` exception may be raised.\n\n### Serialization\n\nWhen loading a [YAML][] string with include statements, the included files are parsed into Python objects by default. That is, if we call `yaml.dump()` on the result, what is dumped is the parsed Python object, and the include statement itself cannot be serialized.\n\nTo serialize the statement, we first create a `yaml_include.Constructor` object whose **`autoload` attribute is `False`**:\n\n```python\nimport yaml\nimport yaml_include\n\nctor = yaml_include.Constructor(autoload=False)\n```\n\nthen add both the Constructor for the Loader and the Representer for the Dumper:\n\n```python\nyaml.add_constructor(\"!inc\", ctor)\n\nrpr = yaml_include.Representer(\"inc\")\nyaml.add_representer(yaml_include.Data, rpr)\n```\n\nNow the included files will not be loaded when calling `yaml.load()`, and `yaml_include.Data` objects will be placed at the positions where the include statements are.\n\nContinuing the code above:\n\n```python\nyaml_str = \"\"\"\n- !inc include.d/1.yaml\n- !inc include.d/2.yaml\n\"\"\"\n\nd0 = yaml.load(yaml_str, yaml.Loader)\n# Here, \"include.d/1.yaml\" and \"include.d/2.yaml\" are not opened or loaded.\n# d0 is like:\n# [Data(urlpath=\"include.d/1.yaml\"), Data(urlpath=\"include.d/2.yaml\")]\n\n# serialize d0\ns = yaml.dump(d0)\nprint(s)\n# \u2018s\u2019 will be:\n# - !inc 'include.d/1.yaml'\n# - !inc 'include.d/2.yaml'\n\n# de-serialization\nctor.autoload = True  # turn auto load back on\n# then load: the files \"include.d/1.yaml\" and \"include.d/2.yaml\" will be opened and loaded.\nd1 = yaml.load(s, yaml.Loader)\n\n# Or perform a recursive opening / parsing on the object:\nd2 = yaml_include.load(d0)  # d2 is equal to d1\n```\n\n`autoload` can be used in a `with` statement:\n\n```python\nctor = 
yaml_include.Constructor()\n# autoload is True here\n\nwith ctor.managed_autoload(False):\n    # temporarily set autoload to False\n    yaml.full_load(YAML_TEXT)\n# autoload is restored to True automatically\n```\n\n### Include JSON or TOML\n\nWe can include files in formats other than [YAML][], like [JSON][] or [TOML][] -- ``custom_loader`` is for that.\n\n> \ud83d\udcd1 **Example** \\\n> For example:\n>\n> ```python\n> import json\n> import tomllib as toml\n> import yaml\n> import yaml_include\n>\n> # Define loader function\n> def my_loader(urlpath, file, Loader):\n>     if urlpath.endswith(\".json\"):\n>         return json.load(file)\n>     if urlpath.endswith(\".toml\"):\n>         return toml.load(file)\n>     return yaml.load(file, Loader)\n>\n> # Create the include constructor, with the custom loader\n> ctor = yaml_include.Constructor(custom_loader=my_loader)\n>\n> # Add the constructor to YAML Loader\n> yaml.add_constructor(\"!inc\", ctor, yaml.Loader)\n>\n> # Now JSON files are loaded by the std-lib json module, and TOML files by tomllib.\n> s = \"\"\"\n> json: !inc \"*.json\"\n> toml: !inc \"*.toml\"\n> yaml: !inc \"*.yaml\"\n> \"\"\"\n>\n> yaml.load(s, yaml.Loader)\n> ```\n\n## Develop\n\n1. clone the repo:\n\n   ```bash\n   git clone https://github.com/tanbro/pyyaml-include.git\n   cd pyyaml-include\n   ```\n\n1. create then activate a Python virtual environment:\n\n   ```bash\n   python -m venv .venv\n   source .venv/bin/activate\n   ```\n\n1. 
install development requirements and the project itself in editable mode:\n\n   ```bash\n   pip install -r requirements.txt\n   ```\n\nNow you can work on it.\n\n## Test\n\nread: `tests/README.md`\n\n[YAML]: http://yaml.org/ \"YAML: YAML Ain't Markup Language\u2122\"\n[PyYaml]: https://pypi.org/project/PyYAML/ \"PyYAML is a full-featured YAML framework for the Python programming language.\"\n[fsspec]: https://github.com/fsspec/filesystem_spec/ \"Filesystem Spec (fsspec) is a project to provide a unified pythonic interface to local, remote and embedded file systems and bytes storage.\"\n[JSON]: https://json.io/ \"JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write\"\n[TOML]: https://toml.io/ \"TOML aims to be a minimal configuration file format that's easy to read due to obvious semantics.\"\n",
    "bugtrack_url": null,
    "license": "GPLv3+",
    "summary": "An extending constructor of PyYAML: include other YAML files into current YAML document",
    "version": "2.2",
    "project_urls": {
        "Changelog": "https://github.com/tanbro/pyyaml-include/blob/main/CHANGELOG.md",
        "Documentation": "https://pyyaml-include.readthedocs.io/en/latest/",
        "Homepage": "https://github.com/tanbro/pyyaml-include",
        "Issues": "https://github.com/tanbro/pyyaml-include/issues",
        "Repository": "https://github.com/tanbro/pyyaml-include.git"
    },
    "split_keywords": [
        "yaml",
        " pyyaml",
        " include",
        " yml"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "561fae83ff547e70cd3aeefe2ebaeeef41c4e1fd5ed2453396d249d17c3a7ead",
                "md5": "66d47fe13ce971f01e4a387952067dfa",
                "sha256": "489fff69f78bad8b9509d006297a0140fd91382a66775b8b1da0ce7e126c1815"
            },
            "downloads": -1,
            "filename": "pyyaml_include-2.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "66d47fe13ce971f01e4a387952067dfa",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 29565,
            "upload_time": "2024-11-09T09:36:15",
            "upload_time_iso_8601": "2024-11-09T09:36:15.241685Z",
            "url": "https://files.pythonhosted.org/packages/56/1f/ae83ff547e70cd3aeefe2ebaeeef41c4e1fd5ed2453396d249d17c3a7ead/pyyaml_include-2.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0f8c4bdc1bd9676e9eb49237b3750562e9794b7585281448909fa1837c92ca27",
                "md5": "04fe338d67b76f4a07f97f10c34fcfcf",
                "sha256": "6f0c7e2ac56cdd9cc305b04122817b55514e6ce8584869fae2bc2a4ef2e0d40f"
            },
            "downloads": -1,
            "filename": "pyyaml_include-2.2.tar.gz",
            "has_sig": false,
            "md5_digest": "04fe338d67b76f4a07f97f10c34fcfcf",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 29854,
            "upload_time": "2024-11-09T09:36:16",
            "upload_time_iso_8601": "2024-11-09T09:36:16.915851Z",
            "url": "https://files.pythonhosted.org/packages/0f/8c/4bdc1bd9676e9eb49237b3750562e9794b7585281448909fa1837c92ca27/pyyaml_include-2.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-11-09 09:36:16",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "tanbro",
    "github_project": "pyyaml-include",
    "travis_ci": false,
    "coveralls": true,
    "github_actions": true,
    "requirements": [],
    "lcname": "pyyaml-include"
}
        