MetaArray
=========
MetaArray is a class that extends ndarray, adding support for per-axis metadata storage. This class is useful for
storing data arrays along with units, axis names, column names, axis values, etc. MetaArray objects can be indexed and
sliced arbitrarily using named axes and columns.
Justification
-------------
Consider data in the following shape:
![3x5x3 cube. X: Signal(Voltage 0, Voltage 1, Current 0). Y: Time(0.0-0.5). Z: Trial(0-2)](https://raw.githubusercontent.com/outofculture/metaarray/main/example.png "3 signals across time and trial")
Notice that each axis has a name and can store different types of meta information:
* The Signal axis has named columns with different units for each column
* The Time axis associates a numerical value with each row
* The Trial axis uses normal integer indexes
Data from this array would best be accessed variously using those names:
```python
initial_v1s = data["Voltage 1", 0, :]
trial1_v0 = data["Trial": 1, "Signal": "Voltage 0"]
time3_to_7 = data["Time": slice(3, 7)]
```
Features
--------
* Per axis meta-information:
  * Named axes
  * Numerical values with units (e.g., "Time" axis above)
  * Column names/units (e.g., "Signal" axis above)
* Indexing by name:
  * Index each axis by name, so there is no need to remember order of axes
  * Within an axis, index each column by name, so there is no need to remember the order of columns
* Read/write files easily (in HDF5 format)
* Append, extend, and sort convenience functions
Documentation
-------------
### Installation
`pip install MetaArray`
### Instantiation
Accepted Syntaxes:
```python
# Constructs MetaArray from a preexisting ndarray with the provided info
MetaArray(ndarray, info)
# Constructs MetaArray from file written using MetaArray.write()
MetaArray(file='fileName')
```
`info` parameter: This parameter specifies the entire set of metadata for this MetaArray and must follow a specific
format. First, `info` is a list of axis descriptions:
```python
info = [axis1, axis2, axis3, ...]
```
Each axis description is a dict which may contain:
* "name": the name of the axis
* "values": a list or 1D ndarray of values, one per index in the axis
* "cols": a list of column descriptions `[col1, col2, col3, ...]`
* "units": the units associated with the numbers listed in "values"
All of these parameters are optional. A column description, likewise, is a dict which may contain:
* "name": the name of the column
* "units": the units for all values under this column
If meta information applies to the entire array (for example, if the entire array uses the same
units), simply add an extra axis description to the end of the info list. All dicts may contain any extra information you
want.
For example, the data set depicted above would look like:
```python
MetaArray((3, 6, 3), dtype=float, info=[
    {"name": "Signal", "cols": [
        {"name": "Voltage 0", "units": "V"},
        {"name": "Voltage 1", "units": "V"},
        {"name": "Current 0", "units": "A"}
    ]},
    {"name": "Time", "units": "msec", "values": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]},
    {"name": "Trial"},
    {"note": "Just some extra info"}
])
```
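The shape of an `info` list can be sanity-checked in plain Python before it is handed to the constructor. The following is a minimal sketch and not part of MetaArray: the helper name `split_info` is hypothetical, and it assumes only the rules described above (one dict per axis, an optional trailing dict for array-wide metadata, and `"values"`/`"cols"` lists with one entry per index). Treating missing trailing entries as empty dicts is an assumption of this sketch.

```python
def split_info(info, shape):
    """Split an info list into per-axis dicts and one array-wide dict.

    Hypothetical helper for illustration only; MetaArray performs its own checks.
    """
    ndim = len(shape)
    if len(info) > ndim + 1:
        raise ValueError(f"at most {ndim + 1} entries expected, got {len(info)}")
    # Assumption: entries that are not supplied are treated as empty dicts.
    padded = list(info) + [{} for _ in range(ndim + 1 - len(info))]
    axes, extra = padded[:ndim], padded[ndim]
    for axis, (desc, length) in enumerate(zip(axes, shape)):
        for key in ("values", "cols"):
            if key in desc and len(desc[key]) != length:
                raise ValueError(
                    f"axis {axis}: {key!r} has {len(desc[key])} entries, "
                    f"but the axis length is {length}"
                )
    return axes, extra


info = [
    {"name": "Signal", "cols": [
        {"name": "Voltage 0", "units": "V"},
        {"name": "Voltage 1", "units": "V"},
        {"name": "Current 0", "units": "A"},
    ]},
    {"name": "Time", "units": "msec", "values": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]},
    {"name": "Trial"},
    {"note": "Just some extra info"},
]

axes, extra = split_info(info, (3, 6, 3))
axis_names = [desc.get("name") for desc in axes]
```

Here `axis_names` holds the three axis names in order, and `extra` holds the array-wide dict. Passing a shape such as `(3, 5, 3)` raises `ValueError`, because the "Time" axis lists six values.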
### Accessing Data
Data can be accessed through a variety of methods:
* Standard indexing -- You may always just index the array exactly as you would any ndarray
* Named axes -- If you don't remember the order of axes, you may specify the axis to be indexed or sliced like this:
```python
data["AxisName": index]
data["AxisName": slice(...)]
```
Note that since this syntax hijacks the original slice mechanism, you must specify a slice using slice() if you want to
use named axes.
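The reason is that Python itself parses `a: b` inside square brackets as a slice, so `data["Time": 3]` reaches `__getitem__` as `slice("Time", 3, None)`. The sketch below makes this visible with a hypothetical `KeyEcho` class (not part of MetaArray) that simply returns whatever key it receives:

```python
class KeyEcho:
    """Hypothetical stand-in that returns the key passed to __getitem__."""

    def __getitem__(self, key):
        return key


echo = KeyEcho()

# The axis name lands in .start and the index in .stop.
named_index = echo["Time": 3]

# A range within a named axis is carried as a slice nested inside .stop.
named_slice = echo["Time": slice(3, 7)]

# Several named axes arrive as a tuple of slice objects.
combined = echo["Trial": 1, "Signal": "Voltage 0"]
```

Because the first colon is already used to separate the axis name from its index, the range itself is passed as an explicit `slice(...)` object.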
* Column selection -- If you don't remember the index of a column you wish to select, you may substitute the column's name
  for the index number. Lists of column names are also acceptable. For example:
```python
data["AxisName": "ColumnName"]
data["ColumnName"] # Works only if the named column exists for this axis
data[["ColumnName1", "ColumnName2"]]
```
* Boolean selection -- Works as you might normally expect. For example:
```python
sel = data["ColumnName", 0, 0] > 0.2
data[sel]
```
* Access axis values using `MetaArray.axisValues()`, or `.xvals()` for short.
* Access axis units using `.axisUnits()` and column units using `.columnUnits()`.
* Access any other parameter directly through the info list with `.infoCopy()`.
### File I/O
```python
data.write('fileName')
newData = MetaArray(file='fileName')
```
### Performance Tips
MetaArray is a subclass of ndarray that overrides the `__getitem__` and `__setitem__` methods. Because these methods must
update the structure of the meta information on every access, they are quite slow compared to the native methods. As a
result, many built-in functions run very slowly when operating on a MetaArray. It is therefore recommended that
you view your arrays as plain ndarrays before performing these operations, like this:
```python
from numpy import ndarray

data = MetaArray(...)
data.mean()  # very slow
data.view(ndarray).mean()  # native speed
```
### More Examples
A 2D array of altitude values for a topographical map might look like:
```python
info = [
{'name': 'lat', 'title': 'Latitude'},
{'name': 'lon', 'title': 'Longitude'},
{'title': 'Altitude', 'units': 'm'}
]
```
In this case, every value in the array represents the altitude in meters at the lat, lon position represented by the array
index. All of the following return the value at lat=10, lon=5:
```python
array[10, 5]
array['lon': 5, 'lat': 10]
array['lat': 10][5]
```
Now suppose we want to combine this data with another array of equal dimensions that represents the average rainfall for
each location. We could easily store these as two separate arrays or combine them into a 3D array with this description:
```python
info = [
{
'name': 'vals',
'cols': [
{'name': 'altitude', 'units': 'm'},
{'name': 'rainfall', 'units': 'cm/year'},
],
},
{'name': 'lat', 'title': 'Latitude'},
{'name': 'lon', 'title': 'Longitude'},
]
```
We can now access the altitude values with `array[0]` or `array['altitude']`, and the rainfall values with `array[1]`
or `array['rainfall']`. All of the following return the rainfall value at lat=10, lon=5:
```python
array[1, 10, 5]
array['lon': 5, 'lat': 10, 'vals': 'rainfall']
array['rainfall', 'lon': 5, 'lat': 10]
```
Notice that in the second example, there is no need for an extra (4th) axis description since the actual values are
described (name and units) in the column info for the first axis.
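To see why these spellings are equivalent, it can help to translate each key into plain positional indices by hand. The following is a simplified, hypothetical sketch (the names `resolve_key` and `KeyEcho` are not part of MetaArray, and this is not the library's implementation). It assumes `info` has exactly one entry per axis and handles only integers, positional column names, and `"axis": value` pairs:

```python
def resolve_key(info, key):
    """Translate a name-based key into a tuple of positional indices."""
    if not isinstance(key, tuple):
        key = (key,)
    axis_names = [desc.get("name") for desc in info]

    def column_index(axis, name):
        # Position of the named column within the axis description.
        cols = [col.get("name") for col in info[axis].get("cols", [])]
        return cols.index(name)

    resolved = [slice(None)] * len(info)
    for position, item in enumerate(key):
        if isinstance(item, slice) and isinstance(item.start, str):
            # An "axis": value pair, delivered by Python as slice(axis, value).
            axis, value = axis_names.index(item.start), item.stop
        else:
            # A plain positional entry.
            axis, value = position, item
        if isinstance(value, str):
            value = column_index(axis, value)
        resolved[axis] = value
    return tuple(resolved)


class KeyEcho:
    """Hypothetical helper that returns the key passed to __getitem__."""

    def __getitem__(self, key):
        return key


info = [
    {"name": "vals", "cols": [
        {"name": "altitude", "units": "m"},
        {"name": "rainfall", "units": "cm/year"},
    ]},
    {"name": "lat", "title": "Latitude"},
    {"name": "lon", "title": "Longitude"},
]

K = KeyEcho()
by_position = resolve_key(info, K[1, 10, 5])
by_axis_name = resolve_key(info, K["lon": 5, "lat": 10, "vals": "rainfall"])
by_column_name = resolve_key(info, K["rainfall", "lon": 5, "lat": 10])
```

All three results are the same tuple of positional indices, which is what makes the access forms interchangeable.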
Contact
---------
Luke Campagnola - `[firstname][lastname]@gmail.com`
Changelog
---------
### 2.1.1
Fix writeable HDF5.
### 2.1.0
Force HDF5 format when writing unless USE_HDF5 is explicitly set to False.
### 2.0.3
Fixes install dependency (thanks @spahlimi).
### 2.0.0
Initial independent release.
Raw data
{
"_id": null,
"home_page": "https://github.com/outofculture/metaarray",
"name": "MetaArray",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": null,
"author": "Luke Campagnola",
"author_email": "luke.campagnola@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/e0/5a/286a9dbbe546e8bc0db8edbc940bd277edab24d3138100051fef2c4341ae/MetaArray-2.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "N-dimensional array with metadata such as axis titles, units, and column names.",
"version": "2.1.1",
"project_urls": {
"Homepage": "https://github.com/outofculture/metaarray"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "e05a286a9dbbe546e8bc0db8edbc940bd277edab24d3138100051fef2c4341ae",
"md5": "66e1af793e74894ba6c69b260e258d09",
"sha256": "4d8b06902625afcd7bdda558a41d47f9b9df6bfb8b1e37ba84c897769050537a"
},
"downloads": -1,
"filename": "MetaArray-2.1.1.tar.gz",
"has_sig": false,
"md5_digest": "66e1af793e74894ba6c69b260e258d09",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 51903,
"upload_time": "2024-09-10T15:50:48",
"upload_time_iso_8601": "2024-09-10T15:50:48.049246Z",
"url": "https://files.pythonhosted.org/packages/e0/5a/286a9dbbe546e8bc0db8edbc940bd277edab24d3138100051fef2c4341ae/MetaArray-2.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-10 15:50:48",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "outofculture",
"github_project": "metaarray",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "metaarray"
}