tsdat-tools

* **Version**: 0.3.1
* **Summary**: Utility scripts and tools for tsdat.
* **Upload time**: 2024-11-25 17:41:17
* **Requires Python**: >=3.10
* **License**: BSD 3-Clause License
* **Keywords**: tsdat, utils, config

# Tsdat Tools

This repository contains helpful scripts and notes for several tsdat-related tools.

Some tools are available as Jupyter notebooks, and others are available as command-line utilities.

To get access to the command-line utilities, just run:

```shell
pip install tsdat-tools
```

To use all the other tools, we recommend cloning this repository.

## Data to Yaml

The goal of this tool is to reduce the tedium of writing tsdat configuration files for data that you can already
read and convert into an `xr.Dataset` object in tsdat. It generates two output files: `dataset.yaml` and
`retriever.yaml`, which are used by `tsdat` to define metadata and how the input variables should be mapped to output
variables.

The tool can read files in the following formats out of the box:

* **`netCDF`**: Files ending with `.nc` or `.cdf` will use the `tsdat.NetCDFReader` class
* **`csv`**: Files ending with `.csv` will use the `tsdat.CSVReader` class
* **`parquet`**: Files ending with `.parquet`, `.pq`, or `.pqt` will use the `tsdat.ParquetReader` class
* **`zarr`**: Files/folders ending with `.zarr` will use the `tsdat.ZarrReader` class
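
The extension-to-reader mapping above can be pictured as a small lookup table. This is only an illustrative sketch: `READERS_BY_SUFFIX` and `reader_for` are hypothetical names, not part of `tsdat-tools`, and the tool itself may resolve readers differently. The reader classes are given as strings so the example runs without tsdat installed.

```python
from pathlib import Path

# Hypothetical lookup table built from the list above.
READERS_BY_SUFFIX = {
    ".nc": "tsdat.NetCDFReader",
    ".cdf": "tsdat.NetCDFReader",
    ".csv": "tsdat.CSVReader",
    ".parquet": "tsdat.ParquetReader",
    ".pq": "tsdat.ParquetReader",
    ".pqt": "tsdat.ParquetReader",
    ".zarr": "tsdat.ZarrReader",
}


def reader_for(datapath: str) -> str:
    """Return the reader class name for a path, based on its final suffix."""
    suffix = Path(datapath).suffix
    try:
        return READERS_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"No built-in reader for files ending in {suffix!r}") from None


print(reader_for("test/data/input/buoy.csv"))
print(reader_for("archive/store.zarr"))
```

A path with any other suffix raises `ValueError` in this sketch, which mirrors the idea that only the listed formats are handled out of the box.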

### Usage

After installing the package, you can run the tool with:

```shell
tsdat-tools data2yaml path/to/data/file --input-config path/to/current/dataset.yaml
```

Full usage instructions can be obtained using the `--help` flag:

```shell
$ tsdat-tools data2yaml --help

Usage: tsdat-tools data2yaml [OPTIONS] DATAPATH

╭─ Arguments ─────────────────────────────────────────────────────────────────────────────────────────────╮
│ *    datapath   PATH  Path to the input data file that should be used to generate tsdat configurations. │
│                       [default: None]                                                                   │
│                       [required]                                                                        │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────╮
│ --outdir                               DIRECTORY                      The path to the directory where   │
│                                                                       the 'dataset.yaml' and            │
│                                                                       'retriever.yaml' files should be  │
│                                                                       written.                          │
│                                                                       [default: .]                      │
│ --input-config                         PATH                           Path to a dataset.yaml file to be │
│                                                                       used in addition to               │
│                                                                       configurations derived from the   │
│                                                                       input data file. Configurations   │
│                                                                       defined here take priority over   │
│                                                                       auto-detected properties in the   │
│                                                                       input file.                       │
│                                                                       [default: None]                   │
│ --help                                                                Show this message and exit.       │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯
```
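
The `--input-config` help text says that configurations in the supplied `dataset.yaml` take priority over properties auto-detected from the input file. One way to picture that rule is a recursive dictionary merge, sketched below with plain Python dicts standing in for parsed YAML. The `merge_config` function and the sample keys and values are assumptions made for illustration; the tool's actual merge logic may differ.

```python
from copy import deepcopy


def merge_config(detected: dict, user: dict) -> dict:
    """Merge two nested dicts; values in `user` win over values in `detected`."""
    merged = deepcopy(detected)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Otherwise the user-supplied value replaces the detected one.
            merged[key] = deepcopy(value)
    return merged


# Made-up stand-in for properties detected from the data file.
detected = {
    "data_vars": {
        "temp": {"dtype": "float32", "attrs": {"units": "degC"}},
        "rh": {"dtype": "float32", "attrs": {"units": "1"}},
    }
}

# Made-up stand-in for a hand-cleaned dataset.yaml.
user = {
    "attrs": {"title": "Example dataset"},
    "data_vars": {"temp": {"attrs": {"units": "K"}}},
}

merged = merge_config(detected, user)
print(merged["data_vars"]["temp"])
```

In this sketch the hand-written `units` value replaces the detected one, while detected entries that the user file does not mention (such as `rh`) are carried through unchanged.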

This tool is designed to be run in the following workflow:

1. Generate a new ingest/pipeline from the cookiecutter template (e.g., with the `make cookies` command).
2. Put an example data file for your pipeline in the `test/data/input` folder.
3. Clean up the autogenerated `dataset.yaml` file.
    * Add metadata and remove any unused variables
    * Don't add additional variables yet; just make sure that the info in the current file is accurate
4. Commit your changes in `git` or back up your changes so you can compare before & after the script runs.
5. Run this script, passing it the path to your input data file and using the `--input-config` option to tell it where
your cleaned `dataset.yaml` file is. By default this writes new `dataset.yaml` and `retriever.yaml` files to the current
working directory (the directory printed by `pwd`), but you can use the `--outdir` option to write them elsewhere.
6. Review the changes the script made to each file. Note that it is not capable of standardizing units or other
metadata, so you will still need to clean those up manually.
7. Continue with the rest of the ingest/pipeline development steps.
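
For the review in step 6, `git diff` is the obvious choice when the files are tracked. If you made a manual backup instead, Python's standard `difflib` module can produce a similar comparison. The sketch below uses inline strings in place of real files; the YAML content and the file labels are made up for illustration.

```python
import difflib

# Stand-in for the backed-up, hand-cleaned file.
before = """\
attrs:
  title: Example dataset
data_vars:
  temp:
    dtype: float32
"""

# Stand-in for the file after the script has run.
after = """\
attrs:
  title: Example dataset
data_vars:
  temp:
    dtype: float32
  rh:
    dtype: float32
"""

diff = list(
    difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile="dataset.yaml (backup)",
        tofile="dataset.yaml (generated)",
    )
)
print("".join(diff), end="")
```

Lines prefixed with `+` are the ones added to the file, which are the entries most worth checking for units and other metadata.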

## Excel to Yaml

Please consult the documentation in the [excel2yaml/README.md](./excel2yaml/README.md) file for more information about
this tool.

## NetCDF to CSV

Please consult the documentation in the [netcdf2csv/README.md](./netcdf2csv/README.md) file for more information about
this tool.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "tsdat-tools",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "tsdat, utils, config",
    "author": null,
    "author_email": "Maxwell Levin <24307537+maxwelllevin@users.noreply.github.com>, James McVey <53623232+jmcvey3@users.noreply.github.com>",
    "download_url": "https://files.pythonhosted.org/packages/ef/ea/185256702507896aed715dcd29f8171a1534532fc6060e9f714a8e0ec951/tsdat_tools-0.3.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "BSD 3-Clause License",
    "summary": "Utility scripts and tools for tsdat.",
    "version": "0.3.1",
    "project_urls": null,
    "split_keywords": [
        "tsdat",
        " utils",
        " config"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "cc9c8b2e9b576981fcfe9b457e8a7bd37d000e20dd8b88ddb564c49d5e62b135",
                "md5": "fa97b24c12a4a99464724bd3bd461b10",
                "sha256": "f5a01575982a9b713e9693c0ae8018063c344e7b1e77f0483c66032fe0712197"
            },
            "downloads": -1,
            "filename": "tsdat_tools-0.3.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "fa97b24c12a4a99464724bd3bd461b10",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 31650,
            "upload_time": "2024-11-25T17:41:15",
            "upload_time_iso_8601": "2024-11-25T17:41:15.972553Z",
            "url": "https://files.pythonhosted.org/packages/cc/9c/8b2e9b576981fcfe9b457e8a7bd37d000e20dd8b88ddb564c49d5e62b135/tsdat_tools-0.3.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "efea185256702507896aed715dcd29f8171a1534532fc6060e9f714a8e0ec951",
                "md5": "2f64f162a16d4931d5ce3acdfdbd03eb",
                "sha256": "80a28d9f565024cdce2d6f676a06ee305cf11ccf71793e8179dff5b3b96657fd"
            },
            "downloads": -1,
            "filename": "tsdat_tools-0.3.1.tar.gz",
            "has_sig": false,
            "md5_digest": "2f64f162a16d4931d5ce3acdfdbd03eb",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 50635,
            "upload_time": "2024-11-25T17:41:17",
            "upload_time_iso_8601": "2024-11-25T17:41:17.547929Z",
            "url": "https://files.pythonhosted.org/packages/ef/ea/185256702507896aed715dcd29f8171a1534532fc6060e9f714a8e0ec951/tsdat_tools-0.3.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-11-25 17:41:17",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "tsdat-tools"
}
        