# `tsg-xr`: A tool for loading TSG datasets into Xarray
The file format associated with [The Spectral Geologist™](https://research.csiro.au/thespectralgeologist/)
(and specifically [Hylogger™](https://corescan.com.au/products/hylogger/) datasets which
have been processed with the software) consists of an ensemble of files:
* Binary data files containing spectra, high resolution imagery and profilometer data
* Configuration files (principally text, similar in format to TOML)
* Low resolution core imagery exports (hole overview, per-tray imagery; as JPEG images with associated markup)
`tsg-xr` relies heavily on the file reader of [`pytsg`](https://github.com/FractalGeoAnalytics/pytsg) to
provide access to these data, and presents them as an [Xarray](https://xarray.pydata.org) dataset to condense the
otherwise complex arrangement. Here `pytsg` provides an efficient interface to the
binary components of the TSG file format, and `tsg-xr` largely just arranges these into a condensed
data structure which allows easier subsequent use (and serialization to indexable formats, e.g.
[Zarr](https://zarr.readthedocs.io)).
## Usage
`tsg-xr` is intended to be used to read directories containing ensembles of TSG files;
to do so, just point the `load_tsg` function at the appropriate directory:
```python
from tsgxr import load_tsg
ds = load_tsg("./Hylogger_Hole_42")
```
---
Key array-based data can be accessed directly from this `xarray.Dataset` object:
```python
ds.Spectra
ds.Image
ds.Lidar
```
For example, to extract and plot the first metre of core imagery:
```python
import matplotlib.pyplot as plt
ds.Image.sel(depth=slice(0, 1)).plot.imshow(yincrease=False)
plt.gca().set(aspect="equal"); # fix the aspect ratio
```
Similarly, to plot the spectra from a specific interval (e.g. 9.2 to 9.3 m here) against wavelength, you can provide a slice to the `xarray.DataArray.sel` method. Note the `holedepth` coordinate used here, which is associated with spectral samples, as opposed to the `depth` coordinate associated with RGB imagery; these are thus far separate indexes:
```python
ds.Spectra.sel(holedepth=slice(9.2, 9.3)).plot.scatter(x='wavelength', add_legend=False, color='k', alpha=0.5, s=2)
```
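The distinction between the two indexes can be illustrated without any TSG data. The following is a minimal sketch on a synthetic `xarray.Dataset`; the array shapes, sampling intervals and the `x`/`band` dimension names are assumptions made for illustration only, and are not necessarily what `load_tsg` produces:

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a loaded dataset (all values here are made up).
holedepth = np.arange(0.0, 20.0, 0.5)       # one spectral sample every 0.5 m
depth = np.arange(0.0, 20.0, 0.25)          # one image row every 0.25 m
wavelength = np.linspace(400.0, 2500.0, 8)  # nm

rng = np.random.default_rng(0)
demo = xr.Dataset(
    {
        # Spectra are indexed by holedepth, imagery by depth.
        "Spectra": (("holedepth", "wavelength"), rng.random((holedepth.size, wavelength.size))),
        "Image": (("depth", "x", "band"), np.zeros((depth.size, 4, 3))),
    },
    coords={"holedepth": holedepth, "depth": depth, "wavelength": wavelength},
)

# The same interval has to be selected on each index separately.
spectra_subset = demo.Spectra.sel(holedepth=slice(9.0, 10.0))
image_subset = demo.Image.sel(depth=slice(9.0, 10.0))
```

Label-based slices in Xarray include both end points, so each subset contains every sample falling within the closed interval on its own index.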
---
Scalars and other spectral features are also available; spectral feature (centre, depth, width) data is grouped
for brevity:
```python
ds.Centres
ds.Depths
ds.Widths
ds["Grp1 sTSAS"]
...
ds["Min1 sTSAS"]
...
ds["Wt1 sTSAS"]
...
```
---
Configuration related to integer-encoding of sample data is also included in the dataset attributes:
```python
ds.attrs
```
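How such a configuration might be used to recover labels from integer codes is sketched below. The layout of the attributes (a mapping from variable name to a `{code: label}` dictionary) and the label names are hypothetical, chosen only to illustrate the idea; inspect `ds.attrs` on a real dataset to see the actual structure:

```python
import numpy as np
import xarray as xr

# Hypothetical layout: attrs maps a variable name to a {code: label} lookup.
demo = xr.Dataset(
    {"Min1 sTSAS": ("holedepth", np.array([0, 2, 1, 2]))},
    coords={"holedepth": [0.0, 0.5, 1.0, 1.5]},
    attrs={"Min1 sTSAS": {0: "label_a", 1: "label_b", 2: "label_c"}},
)

# Translate each integer code back to its label via the lookup table.
lookup = demo.attrs["Min1 sTSAS"]
labels = [lookup[int(code)] for code in demo["Min1 sTSAS"].values]
```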
## Installation
The `tsg-xr` package can be installed standalone into your local environment using `pip`, or you can create an
environment with related dependencies using Anaconda (useful for a development scenario, or if you're only using
the tool for a single project).
**Option 1: Standalone Installation**
The package is directly installable from GitHub using `pip` with:
```bash
pip install git+https://github.com/CSIRO-GeoscienceAnalytics/tsg-xr
```
**Option 2: Set up an Environment**
An `environment.yml` file is included in this repository, allowing the creation of a `conda` environment
where an Anaconda distribution of some form is used. After cloning this repository and navigating to this
directory, the environment can be created as follows:
```bash
conda env create -f environment.yml
```
Alternatively, if you have `mamba` installed locally (encouraged), you can get there faster with:
```bash
mamba env create -f environment.yml
```
## Command Line Interface
### Converting TSG files to Zarr
A minimal command line interface exists for conversion of TSG files to Zarr archives. A selection of configuration options is available from the command line; these are listed in the help menu:
```bash
tsgxr tsg2zarr --help
```
Basic usage is as follows, where `<Path>` refers to either i) an individual TSG scalars file (`.tsg`), ii) a Hylogger TSG directory, or iii) a directory containing multiple Hylogger TSG directories (multiple datasets can be converted simultaneously):
```bash
tsgxr tsg2zarr <Path>
```
Outputs are by default added to the Hylogger TSG directories themselves, but can be optionally collated into a separate directory; outputs will use the hole name extracted from the TSG dataset and be specific to the spectra specified (NIR or TIR):
```bash
tsgxr tsg2zarr <Path> --output_dir "./collated_zarr_archives/"
```
Note that by default, this will create zipped Zarr archives. These can be directly opened in e.g. Xarray.
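If the conversion needs to be driven from a Python script rather than a shell, the command shown above can be assembled and run with `subprocess`. This is a minimal sketch: the helper names `build_tsg2zarr_command` and `convert` are hypothetical and not part of `tsg-xr`, and only the `tsg2zarr` subcommand and `--output_dir` option described above are used:

```python
import subprocess


def build_tsg2zarr_command(path, output_dir=None):
    """Assemble the argument list for `tsgxr tsg2zarr` (hypothetical helper)."""
    cmd = ["tsgxr", "tsg2zarr", str(path)]
    if output_dir is not None:
        # Collate outputs into a separate directory instead of the TSG directory.
        cmd += ["--output_dir", str(output_dir)]
    return cmd


def convert(path, output_dir=None):
    """Run the conversion; raises CalledProcessError if the command fails."""
    subprocess.run(build_tsg2zarr_command(path, output_dir), check=True)
```

Passing the arguments as a list avoids shell quoting issues for paths containing spaces.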