**PyPI package metadata**

* Name : ncdata
* Version : 0.1.0
* Summary : Abstract NetCDF data objects, providing fast data transfer between analysis packages.
* Author email : Patrick Peglar <patrick.peglar@metoffice.gov.uk>
* License : BSD-3-Clause
* Requires Python : >=3.7
* Keywords : cf-metadata, data-analysis, netcdf, iris, xarray
* Upload time : 2024-01-24 13:02:19
* Code : https://github.com/pp-mo/ncdata
* Documentation : https://ncdata.readthedocs.io
* Issues : https://github.com/pp-mo/ncdata/issues
* Discussions : https://github.com/pp-mo/ncdata/discussions
            # ncdata
Generic NetCDF data in Python.

Provides fast data exchange between analysis packages, and full control of storage
formatting.

Especially : Ncdata **exchanges data between Xarray and Iris** as efficiently as possible  
> "lossless, copy-free and lazy-preserving".

This enables the user to freely mix+match operations from both projects, getting the
"best of both worlds".
``` python
import xarray
import ncdata.iris_xarray as nci
import iris.quickplot as qplt

ds = xarray.open_dataset(filepath)
ds_resample = ds.rolling(time=3).mean()
cubes = nci.cubes_from_xarray(ds_resample)
temp_cube = cubes.extract_cube("air_temperature")
qplt.contourf(temp_cube[0])
```

## Contents
  * [Motivation](#motivation)
    * [Primary Use](#primary-use)
    * [Secondary Uses](#secondary-uses)
  * [Principles](#principles)
  * [Working Usage Examples](#code-examples) 
  * [API documentation](#api-documentation)
  * [Installation](#installation)
  * [Project Status](#project-status)
    * [Code stability](#code-stability)
    * [Iris and Xarray version compatibility](#iris-and-xarray-compatibility)
    * [Current Limitations](#known-limitations)
    * [Known Problems](#known-problems)
  * [References](#references)
  * [Developer Notes](#developer-notes)

# Motivation
## Primary Use
Fast and efficient translation of data between Xarray and Iris objects.

This allows the user to mix+match features from either package in code. 

For example:
``` python
import iris
import iris.analysis
import xarray

from ncdata.iris_xarray import cubes_to_xarray, cubes_from_xarray

# Apply an Iris regridder to xarray data
# (here "grid_cube" is a pre-existing cube defining the target grid)
dataset = xarray.open_dataset("file1.nc", chunks="auto")
(cube,) = cubes_from_xarray(dataset)
cube2 = cube.regrid(grid_cube, iris.analysis.PointInCell())
dataset2 = cubes_to_xarray(cube2)

# Apply an Xarray statistic to Iris data
cubes = iris.load("file1.nc")
dataset = cubes_to_xarray(cubes)
dataset2 = dataset.groupby("time.dayofyear").argmin()
cubes2 = cubes_from_xarray(dataset2)
``` 
  * data conversion is equivalent to writing to a file with one library, and reading it
    back with the other ..
    * .. except that no actual files are written
  * both real (numpy) and lazy (dask) variable data arrays are transferred directly, 
    without copying or computing
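The "lazy-preserving" part of that claim can be illustrated with plain Dask, independently of ncdata: a lazy array stays lazy however it is passed around, and nothing is computed until explicitly requested.

``` python
import dask.array as da

# A lazy array: constructing it computes nothing
x = da.arange(10, chunks=5)

# Operations stay lazy too -- this only extends the task graph,
# which is the object ncdata's conversions pass through untouched
y = (x * 2).sum()

# Data is only touched at an explicit compute()
result = int(y.compute())
print(result)  # 90
```

Because the conversions hand over these task graphs rather than realised arrays, deferred computation survives a round-trip between Xarray and Iris.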


## Secondary Uses
### Exact control of file formatting
Ncdata can also be used as a transfer layer between Iris or Xarray file i/o and the
exact format of data stored in files.  
I.e. adjustments can be made to file data before loading it into Iris/Xarray, or
Iris/Xarray saved output can be adjusted before writing to a file.

This allows the user to work around any package limitations in controlling storage
aspects such as : data chunking, reserved attributes, missing-value processing, or
dimension control.

For example:
``` python
import xarray as xr

from ncdata.xarray import from_xarray
from ncdata.iris import to_iris
from ncdata.netcdf4 import to_nc4, from_nc4

# Rename a dimension in xarray output
dataset = xr.open_dataset("file1.nc")
xr_ncdata = from_xarray(dataset)
dim = xr_ncdata.dimensions.pop("dim0")
dim.name = "newdim"
xr_ncdata.dimensions["newdim"] = dim
for var in xr_ncdata.variables.values():
    var.dimensions = ["newdim" if dim == "dim0" else dim for dim in var.dimensions]
to_nc4(xr_ncdata, "file_2a.nc")

# Fix chunking in Iris input
ncdata = from_nc4("file1.nc")
for var in ncdata.variables.values():
    # custom chunking() mimics the file chunks we want
    # (bind var.dimensions now, to avoid the late-binding closure pitfall)
    var.chunking = lambda dims=var.dimensions: [
        100.0e6 if dim == "dim0" else -1 for dim in dims
    ]
cubes = to_iris(ncdata)
``` 

### Manipulation of data
ncdata can also be used for data extraction and modification, similar to the scope of
CDO and NCO command-line operators but without file operations.  
However, this type of usage is as yet undeveloped : there is no inbuilt support
for data consistency checking, or for obviously useful operations such as indexing
by dimension. 
This could be added in future, but it is also true that many such operations (like
indexing) may be better done using Iris/Xarray.


# Principles
  * ncdata represents NetCDF data as Python objects
  * ncdata objects can be freely manipulated, independent of any data file
  * ncdata variables can contain either real (numpy) or lazy (Dask) arrays
  * ncdata can be losslessly converted to and from actual NetCDF files
  * Iris or Xarray objects can be converted to and from ncdata, in the same way that 
    they are read from and saved to NetCDF files
  * **_translation_** between Xarray and Iris is based on conversion to ncdata, which
    is in turn equivalent to file i/o
     * thus, Iris/Xarray translation is equivalent to _saving_ from one
       package into a file, then _loading_ the file in the other package
  * ncdata exchanges variable data directly with Iris/Xarray, with no copying of real
    data or computing of lazy data
  * ncdata exchanges lazy arrays with files using Dask 'streaming', thus allowing
    transfer of arrays larger than memory  
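The last principle above — streaming transfer of arrays larger than memory — rests on Dask's chunked evaluation, which can be sketched without ncdata at all:

``` python
import dask.array as da

# ~80 MB of float64, held as 10 lazy chunks of ~8 MB each
big = da.ones((10_000, 1_000), chunks=(1_000, 1_000))

# A reduction is evaluated chunk-by-chunk, so peak memory is on the
# order of a few chunks, not the whole array -- the same mechanism
# lets ncdata stream file reads/writes of arrays larger than RAM.
total = float(big.sum())
print(total)  # 10000000.0
```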


# Code Examples
  * mostly TBD
  * proof-of-concept script for
    [netCDF4 file i/o](https://github.com/pp-mo/ncdata/blob/main/tests/integration/example_scripts/ex_ncdata_netcdf_conversion.py)
  * proof-of-concept script for
    [iris-xarray conversions](https://github.com/pp-mo/ncdata/blob/main/tests/integration/example_scripts/ex_iris_xarray_conversion.py)    


# API documentation
  * see the [ReadTheDocs build](https://ncdata.readthedocs.io/en/latest/index.html)


# Installation
Install from conda-forge with conda
```
conda install -c conda-forge ncdata
```

Or from PyPI with pip
```
pip install ncdata
```

# Project Status
## Code Stability
We intend to follow [PEP 440](https://peps.python.org/pep-0440/) or (older) [SemVer](https://semver.org/) versioning principles.

Release version is at **"v0.1"**.  
This is a first complete implementation, with functional operation of all public APIs.  

The code is however still experimental, and APIs are not stable (hence no major version yet).  

## Iris and Xarray Compatibility
* CI tests run on GitHub PRs and merges, against the latest releases of Iris and Xarray
* compatible with iris >= v3.7.0
  * see : [support added in v3.7.0](https://scitools-iris.readthedocs.io/en/stable/whatsnew/3.7.html#internal)

## Known limitations
Unsupported features : _not planned_ 
 * user-defined datatypes are not supported
   * this includes compound and variable-length types

Unsupported features : _planned for future release_ 
 * groups (not yet fully supported ?)
 * file output chunking control

## Known problems
As of v0.1
 * in conversion from iris cubes with [`from_iris`](https://ncdata.readthedocs.io/en/latest/api/ncdata.iris.html#ncdata.iris.from_iris),
   use of an `unlimited_dims` key currently causes an exception
   * https://github.com/pp-mo/ncdata/issues/43

# References
  * Iris issue : https://github.com/SciTools/iris/issues/4994
  * planning presentation : https://github.com/SciTools/iris/files/10499677/Xarray-Iris.bridge.proposal.--.NcData.pdf
  * in-Iris code workings : https://github.com/pp-mo/iris/pull/75


# Developer Notes
## Documentation build
  * For a full docs-build, a simple `make html` will do for now.  
    * The ``docs/Makefile`` wipes the API docs and invokes sphinx-apidoc for a full rebuild
    * Results are then available at ``docs/_build/html/index.html``
  * The above is just for _local testing_ if required :
    We have automatic builds for releases and PRs via [ReadTheDocs](https://readthedocs.org/projects/ncdata/) 

## Release actions
   1. Cut a release on GitHub : this triggers a new docs version on [ReadTheDocs](https://readthedocs.org/projects/ncdata/) 
   1. Build the distribution
      1. if needed, get [build](https://github.com/pypa/build)
      2. run `python -m build`
   2. Push to PyPI
      1. if needed, get [twine](https://github.com/pypa/twine)
      2. run `python -m twine upload --repository testpypi dist/*`
         * this uploads to TestPyPI
      3. if that checks OK, _remove_ `--repository testpypi` _and repeat_
         * --> uploads to "real" PyPI
      4. check that `pip install ncdata` can now find the new version
   3. Update conda to source the new version from PyPI
      1. create a PR on the [ncdata feedstock](https://github.com/conda-forge/ncdata-feedstock)
      1. update :
         * [version number](https://github.com/conda-forge/ncdata-feedstock/blob/3f6b35cbdffd2ee894821500f76f2b0b66f55939/recipe/meta.yaml#L2) 
         * [SHA](https://github.com/conda-forge/ncdata-feedstock/blob/3f6b35cbdffd2ee894821500f76f2b0b66f55939/recipe/meta.yaml#L10)
         * Note : the [PyPI reference](https://github.com/conda-forge/ncdata-feedstock/blob/3f6b35cbdffd2ee894821500f76f2b0b66f55939/recipe/meta.yaml#L9) will normally look after itself
         * Also : make any required changes to [dependencies](https://github.com/conda-forge/ncdata-feedstock/blob/3f6b35cbdffd2ee894821500f76f2b0b66f55939/recipe/meta.yaml#L17-L29) -- normally _no change required_
      1. get PR merged ; wait a few hours ; check the new version appears in `conda search ncdata`
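The build-and-upload steps above amount to the following command sequence (a sketch only : the upload commands need TestPyPI/PyPI credentials, and the conda-forge steps happen via a feedstock PR on GitHub) :

``` shell
# Get the build/upload tools, then build sdist + wheel into dist/
python -m pip install build twine
python -m build

# Upload to TestPyPI first; if that checks OK, upload to the real PyPI
python -m twine upload --repository testpypi dist/*
python -m twine upload dist/*

# Confirm the new version is visible
pip install --upgrade ncdata
```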

            
