- Name: icoscp_core
- Version: 0.3.3
- Summary: icoscp_core
- Upload time: 2024-03-20 14:40:46
- Requires Python: >=3.10
- Keywords: environment, research, infrastructure, data access

# icoscp_core

A foundational ICOS Carbon Portal (CP) core products Python library for metadata and data access, designed to work with multiple data repositories that use the ICOS Carbon Portal core server software stack to host and serve their data. At the moment, three repositories are supported: [ICOS](https://data.icos-cp.eu/portal/), [SITES](https://data.fieldsites.se/portal/), and [ICOS Cities](https://citydata.icos-cp.eu/portal/).

## Design goals

- offer basic functionality with good performance
- good alignment with the server APIs and ICOS metadata model
- minimise dependencies (only depend on `numpy` and `dacite`)
- aim for good integration with `pandas` without depending on this package
- provide a solid foundation for future versions of [icoscp](https://pypi.org/project/icoscp/)—an ICOS-specific meta- and data access library developed by the CP Elaborated Products team
- extensive use of type annotations and Python data classes, to safeguard against preventable bugs, both in the library itself, and in the tools and apps written on top of it; a goal is to satisfy the typechecker in strict mode
- usage of autogenerated data classes produced from Scala back end code representing various metadata entities (e.g. data objects, stations) and their parts
- simultaneous support for three cross-cutting concerns:
	- multiple repositories (ICOS, SITES, ICOS Cities)
	- multiple ways of authentication
	- data access through the HTTP API (on an arbitrary machine) and through file system (on a Jupyter notebook with "backdoor" data access); in the latter case the library is responsible for reporting the data usage event.

## Getting started

The library is available on PyPI and can be installed with `pip`:
```Bash
$ pip install icoscp_core
```

**The code examples below are usually provided for ICOS. For other Repositories (SITES or ICOS Cities), in the import directives, use `icoscp_core.sites` or `icoscp_core.cities`, respectively, instead of `icoscp_core.icos`.**

## Authentication

Metadata access does not require authentication, and is achieved by a simple import:
```Python
from icoscp_core.icos import meta
```
Additionally, when using the library on an accordingly configured Jupyter notebook service hosted by the ICOS Carbon Portal, authentication is not required when using two of the data access methods:
- `get_columns_as_arrays`
- `batch_get_columns_as_arrays`

Both are available on the `data` object imported from the `icoscp_core.icos` package.

When using other data access methods, or when running the code outside the ICOS Jupyter environment, or if the Jupyter environment has not been provisioned with file access to your Repository, authentication is required for data access.

Authentication can be initialized in a number of ways.

### Credentials and token cache file (default)

This approach should only be used on machines the developer trusts.

A username/password account with the respective authentication service (links for: [ICOS](https://cpauth.icos-cp.eu/), [SITES](https://auth.fieldsites.se/), [ICOS Cities](https://cityauth.icos-cp.eu/)) is required for this. An obfuscated (not human-readable) password is stored in a file on the local machine in a default user-specific folder. To initialize this file, run the following code interactively (this only needs to be done once per machine):

```Python
from icoscp_core.icos import auth

auth.init_config_file()
```

After the initialization step is done, access to the metadata and data services is achieved by a simple import:
```Python
from icoscp_core.icos import meta, data
```

As an alternative, the developer may choose to use a specific file to store the credentials and token cache. In this scenario, the `data` service needs to be initialized as follows:

```Python
from icoscp_core.icos import bootstrap
auth, meta, data = bootstrap.fromPasswordFile("<desired path to the file>")

# the next line needs to be run interactively (only once per file)
auth.init_config_file()
```

### Static authentication token (prototyping)

This option is good for testing, on a public machine or in general. Its only disadvantage is that the tokens have a limited period of validity (100000 seconds, less than 28 hours), but this is precisely what makes it acceptable to include them directly in the Python source code.
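
As a quick check of the figure quoted above, the validity period can be converted with the standard library. This is plain arithmetic and does not call the library; `TOKEN_VALIDITY` is a name chosen for this illustration only.

```Python
from datetime import timedelta

# validity period of a static token, as stated in the text above
TOKEN_VALIDITY = timedelta(seconds=100_000)

validity_hours = TOKEN_VALIDITY.total_seconds() / 3600
print(TOKEN_VALIDITY)            # 1 day, 3:46:40
print(round(validity_hours, 2))  # 27.78
```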

The token can be obtained from the "My Account" page (links for: [ICOS](https://cpauth.icos-cp.eu/), [SITES](https://auth.fieldsites.se/), [ICOS Cities](https://cityauth.icos-cp.eu/)), which can be accessed by logging in using one of the supported authentication mechanisms (username/password, university sign-in, OAuth sign-in). After this, the bootstrapping can be done as follows:

```Python
from icoscp_core.icos import bootstrap
cookie_token = 'cpauthToken=WzE2OTY2NzQ5OD...'
meta, data = bootstrap.fromCookieToken(cookie_token)
```

### Explicit credentials (advanced option)

The user may choose to use their own mechanism of providing the credentials to initialize the authentication. This should be considered an advanced option. **(Please do not put your password as clear text in your Python code!)** This can be achieved as follows:

```Python
from icoscp_core.icos import bootstrap
meta, data = bootstrap.fromCredentials(username_variable, password_containing_variable)
```
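
One possible way to keep the password out of the source code is to read the credentials from environment variables. The sketch below is only an illustration of that idea: the variable names `ICOSCP_USERNAME` and `ICOSCP_PASSWORD` and the helper `read_credentials` are hypothetical and are not part of the library.

```Python
import os
from typing import Mapping

# Hypothetical variable names, chosen for this example only
USER_VAR = "ICOSCP_USERNAME"
PASSWORD_VAR = "ICOSCP_PASSWORD"

def read_credentials(env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Return (username, password) from the environment, failing loudly if unset."""
    missing = [name for name in (USER_VAR, PASSWORD_VAR) if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return env[USER_VAR], env[PASSWORD_VAR]

# demonstration with a stand-in mapping instead of the real environment
username_variable, password_containing_variable = read_credentials(
    {USER_VAR: "jane.doe@example.org", PASSWORD_VAR: "not-a-real-password"}
)
```

The two resulting variables play the role of the arguments passed to `bootstrap.fromCredentials` in the example above.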

---

## Metadata access

Metadata access requires no authentication, and is performed using an instance of the `MetadataClient` class, easily obtained through an import:
```Python
from icoscp_core.icos import meta
```

An important piece of background information is that all the metadata-represented entities (data objects, data types, documents, collections, measurement stations, people, etc.) are identified by URIs. The metadata-access methods usually accept these URIs as input arguments, and the returned values tend to be instances of [Python dataclasses](https://peps.python.org/pep-0557/), which brings:
 - better syntax in comparison with generic dictionaries (dot-notation attribute access instead of dictionary value access, for example `dobj_meta.specification.project.self.uri` instead of `dobj_meta["specification"]["project"]["self"]["uri"]`)
 - autocomplete of the dataclass attributes (works even in Jupyter notebooks)
 - type checking, when developing with type annotations and a type checker (typically available from an IDE, but not from Jupyter)
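
The difference between the two access styles can be seen in a small self-contained sketch. The classes below are hypothetical, heavily simplified stand-ins that only mimic the attribute path quoted above; they are not the library's autogenerated classes, and the URI is a placeholder.

```Python
from dataclasses import dataclass, asdict

# Hypothetical, heavily simplified stand-ins for the library's metadata classes
@dataclass
class UriResource:
    uri: str
    label: str

@dataclass
class Project:
    self: UriResource

@dataclass
class Specification:
    project: Project

@dataclass
class DobjMeta:
    specification: Specification

dobj_meta = DobjMeta(Specification(Project(
    UriResource(uri="http://example.org/projects/demo", label="Demo project")
)))

# attribute access: names are known to type checkers and editor autocomplete
uri_via_attributes = dobj_meta.specification.project.self.uri

# equivalent dictionary access: keys are plain strings, typos surface only at run time
dobj_dict = asdict(dobj_meta)
uri_via_keys = dobj_dict["specification"]["project"]["self"]["uri"]
```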

The following code showcases the main metadata access methods.

### Discover data types
```Python
# fetches the list of known data types, including metadata associated with them
all_datatypes = meta.list_datatypes()

# data types with structured data access
previewable_datatypes = [dt for dt in all_datatypes if dt.has_data_access]
```

### Discover stations
```Python
from icoscp_core.icos import meta, ATMO_STATION

# fetch lists of stations, with basic metadata
icos_stations = meta.list_stations()
atmo_stations = meta.list_stations(ATMO_STATION)
all_known_stations = meta.list_stations(False)

# get detailed metadata for a station
htm_uri = 'http://meta.icos-cp.eu/resources/stations/AS_HTM'
htm_station_meta = meta.get_station_meta(htm_uri)
```

### Find data objects

```Python
from icoscp_core.metaclient import TimeFilter, SizeFilter, SamplingHeightFilter

# list data objects with basic metadata
# a contrived, complicated example to demonstrate the possibilities
# all the arguments are optional
# see the Python help for the method for more details
filtered_atc_co2 = meta.list_data_objects(
	datatype = [
		"http://meta.icos-cp.eu/resources/cpmeta/atcCo2L2DataObject",
		"http://meta.icos-cp.eu/resources/cpmeta/atcCo2NrtGrowingDataObject"
	],
	station = "http://meta.icos-cp.eu/resources/stations/AS_GAT",
	filters = [
		TimeFilter("submTime", ">", "2023-07-01T12:00:00Z"),
		TimeFilter("submTime", "<", "2023-07-10T12:00:00Z"),
		SizeFilter(">", 50000),
		SamplingHeightFilter("=", 216)
	],
	include_deprecated = True,
	order_by = "fileName",
	limit = 50
)
```
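
The timestamps in the example above are ISO 8601 strings in UTC with a `Z` suffix. When the time window is computed rather than typed by hand, such strings can be produced with the standard library. This is a minimal sketch; `to_utc_iso` is a hypothetical helper, not a library function.

```Python
from datetime import datetime, timedelta, timezone

def to_utc_iso(moment: datetime) -> str:
    """Format a timezone-aware datetime as an ISO 8601 UTC string with a 'Z' suffix."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

window_start = datetime(2023, 7, 1, 12, 0, tzinfo=timezone.utc)
window_end = window_start + timedelta(days=9)

print(to_utc_iso(window_start))  # 2023-07-01T12:00:00Z
print(to_utc_iso(window_end))    # 2023-07-10T12:00:00Z
```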

### Geospatial filtering of data objects
Similarly to `TimeFilter` and `SizeFilter`, `GeoIntersectFilter` is available to filter the data objects by their geospatial coverage, specifically by selecting the objects whose geospatial coverage intersects a region of interest, represented by a polygon. `GeoIntersectFilter` takes a list of `Point`s as its only constructor argument, `polygon`.

```Python
from icoscp_core.metaclient import Point, GeoIntersectFilter

la_reunion_co2 = meta.list_data_objects(
	datatype="http://meta.icos-cp.eu/resources/cpmeta/atcCo2Product",
	filters=[
		GeoIntersectFilter([
			Point(-21.46555, 54.90857),
			Point(-20.65176, 55.423563),
			Point(-21.408027, 56.231058)
		])
	]
)
```

For convenient creation of standard rectangular lat/lon bounding boxes, there is a helper function `box_intersect` that takes two points as arguments (the south-western and north-eastern corners of the box):

```Python
from icoscp_core.metaclient import Point, box_intersect

sydney_model_data_archives = meta.list_data_objects(
	datatype="http://meta.icos-cp.eu/resources/cpmeta/modelDataArchive",
	filters=[box_intersect(Point(-40, 145), Point(-25, 155))]
)
```
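
To make the corner convention concrete, the sketch below expands a south-western and a north-eastern corner into the four corners of the box. `LatLon` and `box_corners` are hypothetical names used for illustration; this is not the library's implementation of `box_intersect`, and the (latitude, longitude) argument order is inferred from the examples above.

```Python
from typing import NamedTuple

class LatLon(NamedTuple):
    lat: float
    lon: float

def box_corners(sw: LatLon, ne: LatLon) -> list[LatLon]:
    """Corners of a lat/lon box, counter-clockwise from the south-western one."""
    # this simple check does not handle boxes that cross the 180° meridian
    if sw.lat > ne.lat or sw.lon > ne.lon:
        raise ValueError("expected south-western corner first, north-eastern second")
    return [sw, LatLon(sw.lat, ne.lon), ne, LatLon(ne.lat, sw.lon)]

corners = box_corners(LatLon(-40, 145), LatLon(-25, 155))
```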

### Fetch detailed metadata for a single data object
```Python
dobj_uri = 'https://meta.icos-cp.eu/objects/BbEO5i3rDLhS_vR-eNNLjp3Q'
dobj_meta = meta.get_dobj_meta(dobj_uri)
```

### Fetch metadata for a collection
Some data objects belong to collections. Collections can also contain other collections. Collections can be discovered on the data portal app, or from individual data object metadata (as parent collections), for example:
```Python
dobj = meta.get_dobj_meta('https://meta.icos-cp.eu/objects/hujSGCfmNIRdxtOcEvEJLxGM')
coll_uri = dobj.parentCollections[0].uri
coll_meta = meta.get_collection_meta(coll_uri)
```

### Note

Detailed help on the available metadata access methods can be obtained by calling `help(meta)`.

## Repository-specific functionality

The majority of functionality of the library is common to all the supported data Repositories. However, in some cases Repository-specific reusable code may be useful. Such code is planned to be placed into corresponding packages. There is only one example of such code at the moment:
```Python
from icoscp_core.icos import station_class_lookup
htm_uri = 'http://meta.icos-cp.eu/resources/stations/AS_HTM'
htm_class = station_class_lookup()[htm_uri]
```

---

## Data access

After having identified an interesting data object or a list of objects in the previous step, one can access their data content in a few ways. Data access is provided by an instance of the `DataClient` class, most easily obtained by import:
```Python
from icoscp_core.icos import data
```

The following are code examples showcasing the main data access methods.

### Downloading original data object content
Given basic data object metadata (or just the URI id) one can download the original data to a folder like so:
```Python
filename = data.save_to_folder(dobj_uri, '/myhome/icosdata/')
```
The method requires authentication, even on ICOS Jupyter instances. It works on all data objects (all kinds, and regardless of variable metadata availability).

### Station-specific time series
Station-specific time series that have variable metadata associated with them enjoy a higher level of support. The variables with metadata representation (which may be only a subset of the variables present in the original data) can be efficiently accessed using this library. For single-object access, complete data object metadata is required. The output can be readily converted to a pandas `DataFrame`, but can also be used as is (a dictionary of numpy arrays). It is possible to explicitly limit the variables for access, and to slice the time series.

Authentication may be optional on ICOS Jupyter instances.

```Python
import pandas as pd
# get dataset columns as typed arrays, ready to be imported into pandas
dobj_arrays = data.get_columns_as_arrays(dobj_meta, ['TIMESTAMP', 'co2'])
df = pd.DataFrame(dobj_arrays)
```
One way to distinguish the objects with structured data access is that their data types (used for filtering the data objects, see the metadata access section) have the `has_data_access` property equal to `True`.

### Batch data access
In many scripting scenarios, data objects are processed in batches of uniform data types. In this case, rather than using `get_columns_as_arrays` method in a loop, it is much more efficient to use a special batch-access method. This will significantly reduce the number of round trips to the HTTP metadata service, greatly speeding up the operation:
```Python
multi_dobjs = data.batch_get_columns_as_arrays(filtered_atc_co2, ['TIMESTAMP', 'co2'])
```
where `filtered_atc_co2` is either a list from the metadata examples above, or just a list of plain data object URI IDs. The returned value is a generator of pairs, where the first value is the basic data object metadata (or just a plain URI ID, depending on what was used as the argument), and the second value is the same as the return value from the `get_columns_as_arrays` method (a dictionary of numpy arrays, with variable names as keys).
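
Because the returned value is a generator, it can be iterated only once. The sketch below illustrates this with a hypothetical stand-in, `fake_batch`, that yields pairs of the same general shape; it uses plain lists with made-up values instead of numpy arrays to stay dependency-free, and it does not call the library.

```Python
from typing import Iterator

Columns = dict[str, list[float]]

def fake_batch(uris: list[str]) -> Iterator[tuple[str, Columns]]:
    """Hypothetical stand-in for a batch call: yields (object id, columns) pairs lazily."""
    for i, uri in enumerate(uris):
        yield uri, {"TIMESTAMP": [0.0, 3600.0], "co2": [410.0 + i, 411.0 + i]}

pairs = fake_batch(["https://example.org/objects/a", "https://example.org/objects/b"])

# a generator can be iterated only once, so keep what is needed while looping
mean_co2 = {uri: sum(cols["co2"]) / len(cols["co2"]) for uri, cols in pairs}

# a second pass over the same generator yields nothing
leftover = list(pairs)
```

If the pairs are needed more than once, they can be collected into a list or dictionary during the first pass, at the cost of holding all the data in memory.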

If it is desirable to convert the data to pandas `DataFrame`s, it can be done like so:

```Python
import pandas as pd
multi_df = ( (dobj, pd.DataFrame(arrs)) for dobj, arrs in multi_dobjs)
```

### CSV representation access
The data server offers (partial) CSV representations for fully-supported time series datasets. That service can be used from this library as follows:
```Python
import pandas as pd
csv_stream = data.get_csv_byte_stream(dobj_uri)
df = pd.read_csv(csv_stream)
```
However, using `get_columns_as_arrays` and `batch_get_columns_as_arrays` is preferred for performance reasons, especially on ICOS Jupyter instances. Authentication is always required to use the CSV method.
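
Where pandas is not installed, a CSV byte stream can also be parsed with the standard library alone. The sketch below assumes the stream behaves like a standard binary file object; the `io.BytesIO` object and its made-up contents are a stand-in for a real response, not actual data from the server.

```Python
import csv
import io

# Hypothetical stand-in for a byte stream returned by the data service
csv_stream = io.BytesIO(
    b"TIMESTAMP,co2\n"
    b"2023-07-01T00:00:00Z,415.2\n"
    b"2023-07-01T01:00:00Z,415.9\n"
)

# decode bytes to text lazily, then parse row by row using the header line as keys
text_stream = io.TextIOWrapper(csv_stream, encoding="utf-8", newline="")
rows = [
    {"TIMESTAMP": row["TIMESTAMP"], "co2": float(row["co2"])}
    for row in csv.DictReader(text_stream)
]
```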

## Advanced metadata access (SPARQL)

For general metadata enquiries not offered by the API explicitly, it is often possible to design a SPARQL query that would provide the required information. The query can be run with the `sparql_select` method of `MetadataClient`, and its output can be parsed using the "`as_<rdf_datatype>`"-named functions in the `icoscp_core.sparql` module. For example:

```Python
from icoscp_core.icos import meta
from icoscp_core.sparql import as_string, as_uri

query = """prefix cpmeta: <http://meta.icos-cp.eu/ontologies/cpmeta/>
	select *
	from <http://meta.icos-cp.eu/documents/>
	where{
		?doc a cpmeta:DocumentObject .
		FILTER NOT EXISTS {[] cpmeta:isNextVersionOf ?doc}
		?doc cpmeta:hasDoi ?doi .
		?doc cpmeta:hasName ?filename .
	}"""
latest_docs_with_dois = [
	{
		"uri": as_uri("doc", row),
		"filename": as_string("filename", row),
		"doi": as_string("doi", row)
	}
	for row in meta.sparql_select(query).bindings
]
```
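
For background, SPARQL endpoints commonly return results in the W3C "SPARQL 1.1 Query Results JSON Format", where each row maps a variable name to an object with a `type` and a `value`. The sketch below parses a made-up response of that shape. Whether the library's rows have exactly this structure is an assumption, and `uri_value` and `literal_value` are hypothetical helpers, not the library's `as_uri` and `as_string`.

```Python
import json

# A made-up response in the W3C "SPARQL 1.1 Query Results JSON Format"
raw = """{
  "head": {"vars": ["doc", "doi", "filename"]},
  "results": {"bindings": [
    {
      "doc": {"type": "uri", "value": "https://example.org/objects/abc"},
      "doi": {"type": "literal", "value": "10.1234/example"},
      "filename": {"type": "literal", "value": "report.pdf"}
    }
  ]}
}"""

def uri_value(var: str, row: dict) -> str:
    """Hypothetical helper: value of a variable that must be bound to a URI."""
    bound = row[var]
    if bound["type"] != "uri":
        raise ValueError(f"variable {var!r} is not bound to a URI")
    return bound["value"]

def literal_value(var: str, row: dict) -> str:
    """Hypothetical helper: lexical value of a variable bound to a literal."""
    bound = row[var]
    if bound["type"] != "literal":
        raise ValueError(f"variable {var!r} is not bound to a literal")
    return bound["value"]

docs = [
    {"uri": uri_value("doc", row), "doi": literal_value("doi", row)}
    for row in json.loads(raw)["results"]["bindings"]
]
```

Checking the binding type before reading the value turns an unexpected query result into an early, descriptive error instead of a silently wrong string.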
            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "icoscp_core",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": "Klara Broman <klara.broman@nateko.lu.se>, Jonathan Schenk <jonathan.schenk@nateko.lu.se>",
    "keywords": "environment, research, infrastructure, data access",
    "author": null,
    "author_email": "Oleg Mirzov <oleg.mirzov@nateko.lu.se>",
    "download_url": "https://files.pythonhosted.org/packages/2b/9a/d8bdfa6c05a50e71b174e279c38a99a65c3d6d28bef1f3e37715fbc11d70/icoscp_core-0.3.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "icoscp_core",
    "version": "0.3.3",
    "project_urls": {
        "Source": "https://github.com/ICOS-Carbon-Portal/data/tree/master/src/main/python/icoscp_core"
    },
    "split_keywords": [
        "environment",
        " research",
        " infrastructure",
        " data access"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "0601069d8b562117883a23a02eb3810fa1c5cedc59d214c6ba86b972f7a85d97",
                "md5": "eb0b6c2a4c2740c45fe0df0d301b0f93",
                "sha256": "04dcda9ed7906ae30a5ebb86863f97bbf71c1c3188b2c3d877a52d11edefbfa9"
            },
            "downloads": -1,
            "filename": "icoscp_core-0.3.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "eb0b6c2a4c2740c45fe0df0d301b0f93",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 45845,
            "upload_time": "2024-03-20T14:40:44",
            "upload_time_iso_8601": "2024-03-20T14:40:44.306130Z",
            "url": "https://files.pythonhosted.org/packages/06/01/069d8b562117883a23a02eb3810fa1c5cedc59d214c6ba86b972f7a85d97/icoscp_core-0.3.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "2b9ad8bdfa6c05a50e71b174e279c38a99a65c3d6d28bef1f3e37715fbc11d70",
                "md5": "b7b8157b961282b68ed57d2ec4ae800d",
                "sha256": "cbaef9bade8797dc3643752f971a0d06c9270a51804a43b58fcc84ebc6956ed7"
            },
            "downloads": -1,
            "filename": "icoscp_core-0.3.3.tar.gz",
            "has_sig": false,
            "md5_digest": "b7b8157b961282b68ed57d2ec4ae800d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 43738,
            "upload_time": "2024-03-20T14:40:46",
            "upload_time_iso_8601": "2024-03-20T14:40:46.152221Z",
            "url": "https://files.pythonhosted.org/packages/2b/9a/d8bdfa6c05a50e71b174e279c38a99a65c3d6d28bef1f3e37715fbc11d70/icoscp_core-0.3.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-20 14:40:46",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ICOS-Carbon-Portal",
    "github_project": "data",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "icoscp_core"
}