katdal
======

This package serves as a data access library to interact with the chunk stores
and HDF5 files produced by the MeerKAT radio telescope and its predecessors
(KAT-7 and Fringe Finder), which are collectively known as *MeerKAT Visibility
Format (MVF)* data sets. It uses memory carefully, allowing data sets to be
inspected and partially loaded into memory. Data sets may be concatenated and
split via a flexible selection mechanism. In addition, it provides a script to
convert these data sets to CASA MeasurementSets.

Quick Tutorial
--------------

Open any data set through a single function to obtain a data set object:

.. code:: python

    import katdal
    d = katdal.open('1234567890.h5')

The ``open`` function automatically determines the version and storage location
of the data set. The versions roughly map to the various instruments::

    - v1 : Fringe Finder (HDF5 file)
    - v2 : KAT-7 (HDF5 file)
    - v3 : MeerKAT (HDF5 file)
    - v4 : MeerKAT (RDB file + chunk store based on objects in Ceph)

Each MVFv4 data set is split into a Redis dump (aka *RDB*) file containing the
metadata in the form of a *telescope state* database, and a *chunk store*
containing the visibility data split into many small blocks or chunks (typically
served by a Ceph object store over the network). The RDB file is the main entry
point to the data set and it can be accessed directly from the MeerKAT SDP
archive if you have the appropriate permissions:

.. code:: python

    # This is just for illustration - the real URL looks a bit different
    d = katdal.open('https://archive/1234567890/1234567890_sdp_l0.rdb?token=AsD3')

Multiple data sets (even of different versions) may also be concatenated
together (as long as they have the same dump rate):

.. code:: python

    d = katdal.open(['1234567890.h5', '1234567891.h5'])

Inspect the contents of the data set by printing the object:

.. code:: python

    print(d)

Here is a typical output::

  ===============================================================================
  Name: 1313067732.h5 (version 2.0)
  ===============================================================================
  Observer: someone  Experiment ID: 2118d346-c41a-11e0-b2df-a4badb44fe9f
  Description: 'Track on Hyd A,Vir A, 3C 286 and 3C 273'
  Observed from 2011-08-11 15:02:14.072 SAST to 2011-08-11 15:19:47.810 SAST
  Dump rate: 1.00025 Hz
  Subarrays: 1
    ID  Antennas                            Inputs  Corrprods
     0  ant1,ant2,ant3,ant4,ant5,ant6,ant7  14      112
  Spectral Windows: 1
    ID  CentreFreq(MHz)  Bandwidth(MHz)  Channels  ChannelWidth(kHz)
     0  1822.000         400.000         1024      390.625
  -------------------------------------------------------------------------------
  Data selected according to the following criteria:
    subarray=0
    ants=['ant1', 'ant2', 'ant3', 'ant4', 'ant5', 'ant6', 'ant7']
    spw=0
  -------------------------------------------------------------------------------
  Shape: (1054 dumps, 1024 channels, 112 correlation products) => Size: 967.049 MB
  Antennas: *ant1,ant2,ant3,ant4,ant5,ant6,ant7  Inputs: 14  Autocorr: yes  Crosscorr: yes
  Channels: 1024 (index 0 - 1023, 2021.805 MHz - 1622.195 MHz), each 390.625 kHz wide
  Targets: 4 selected out of 4 in catalogue
    ID  Name    Type   RA(J2000)    DEC(J2000)   Tags  Dumps  ModelFlux(Jy)
     0  Hyd A   radec   9:18:05.28  -12:05:48.9          333          33.63
     1  Vir A   radec  12:30:49.42   12:23:28.0          251         166.50
     2  3C 286  radec  13:31:08.29   30:30:33.0          230          12.97
     3  3C 273  radec  12:29:06.70    2:03:08.6          240          39.96
  Scans: 8 selected out of 8 total      Compscans: 1 selected out of 1 total
    Date        Timerange(UTC)       ScanState  CompScanLabel  Dumps  Target
    11-Aug-2011/13:02:14 - 13:04:26    0:slew     0:             133   0:Hyd A
                13:04:27 - 13:07:46    1:track    0:             200   0:Hyd A
                13:07:47 - 13:08:37    2:slew     0:              51   1:Vir A
                13:08:38 - 13:11:57    3:track    0:             200   1:Vir A
                13:11:58 - 13:12:27    4:slew     0:              30   2:3C 286
                13:12:28 - 13:15:47    5:track    0:             200   2:3C 286
                13:15:48 - 13:16:27    6:slew     0:              40   3:3C 273
                13:16:28 - 13:19:47    7:track    0:             200   3:3C 273

The first segment of the printout displays the static information of the data
set, including observer, dump rate and all the available subarrays and spectral
windows in the data set. The second segment (between the dashed lines) highlights
the active selection criteria. The last segment displays dynamic information
that is influenced by the selection, including the overall visibility array
shape, antennas, channel frequencies, targets and scan info.

The data set is built around the concept of a three-dimensional visibility array
with dimensions of time, frequency and correlation product. This is reflected in
the *shape* of the data set:

.. code:: python

    d.shape

which returns ``(1054, 1024, 112)``, meaning 1054 dumps by 1024 channels by 112
correlation products.

Let's select a subset of the data set:

.. code:: python

    d.select(scans='track', channels=slice(200, 300), ants='ant4')
    print(d)

This results in the following printout::

  ===============================================================================
  Name: /Users/schwardt/Downloads/1313067732.h5 (version 2.0)
  ===============================================================================
  Observer: siphelele  Experiment ID: 2118d346-c41a-11e0-b2df-a4badb44fe9f
  Description: 'track on Hyd A,Vir A, 3C 286 and 3C 273 for Lud'
  Observed from 2011-08-11 15:02:14.072 SAST to 2011-08-11 15:19:47.810 SAST
  Dump rate: 1.00025 Hz
  Subarrays: 1
    ID  Antennas                            Inputs  Corrprods
     0  ant1,ant2,ant3,ant4,ant5,ant6,ant7  14      112
  Spectral Windows: 1
    ID  CentreFreq(MHz)  Bandwidth(MHz)  Channels  ChannelWidth(kHz)
     0  1822.000         400.000         1024      390.625
  -------------------------------------------------------------------------------
  Data selected according to the following criteria:
    channels=slice(200, 300, None)
    subarray=0
    scans='track'
    ants='ant4'
    spw=0
  -------------------------------------------------------------------------------
  Shape: (800 dumps, 100 channels, 4 correlation products) => Size: 2.560 MB
  Antennas: ant4  Inputs: 2  Autocorr: yes  Crosscorr: no
  Channels: 100 (index 200 - 299, 1943.680 MHz - 1905.008 MHz), each 390.625 kHz wide
  Targets: 4 selected out of 4 in catalogue
    ID  Name    Type   RA(J2000)    DEC(J2000)   Tags  Dumps  ModelFlux(Jy)
     0  Hyd A   radec   9:18:05.28  -12:05:48.9          200          31.83
     1  Vir A   radec  12:30:49.42   12:23:28.0          200         159.06
     2  3C 286  radec  13:31:08.29   30:30:33.0          200          12.61
     3  3C 273  radec  12:29:06.70    2:03:08.6          200          39.32
  Scans: 4 selected out of 8 total      Compscans: 1 selected out of 1 total
    Date        Timerange(UTC)       ScanState  CompScanLabel  Dumps  Target
    11-Aug-2011/13:04:27 - 13:07:46    1:track    0:             200   0:Hyd A
                13:08:38 - 13:11:57    3:track    0:             200   1:Vir A
                13:12:28 - 13:15:47    5:track    0:             200   2:3C 286
                13:16:28 - 13:19:47    7:track    0:             200   3:3C 273

Compared to the first printout, the static information has remained the same
while the dynamic information now reflects the selected subset. There are many
possible selection criteria, as illustrated below:

.. code:: python

    d.select(timerange=('2011-08-11 13:10:00', '2011-08-11 13:15:00'), targets=[1, 2])
    d.select(spw=0, subarray=0)
    d.select(ants='ant1,ant2', pol='H', scans=(0, 1, 2), freqrange=(1700e6, 1800e6))

See the docstring of ``DataSet.select`` for more detailed information (e.g.
run ``d.select?`` in IPython). Note that exactly one subarray and one spectral
window must be selected at a time.
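
Calling ``select`` with no arguments resets the selection to the entire data
set. A minimal sketch, assuming ``d`` is an open data set:

.. code:: python

    d.select()      # discard all criteria, making the full data set visible
    print(d.shape)  # back to the original (1054, 1024, 112)
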
Once a subset of the data has been selected, you can access the data and
timestamps on the data set object:

.. code:: python

    vis = d.vis[:]
    timestamps = d.timestamps[:]

Note the ``[:]`` indexing: the ``vis`` and ``timestamps`` properties are
special ``LazyIndexer`` objects that only return actual data once they are
indexed, so that the entire array is not inadvertently loaded into memory.
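
The same indexing mechanism can load just a slab of the array instead of the
whole thing, which is the memory-friendly way to work with large data sets.
A minimal sketch (the slice bounds are arbitrary):

.. code:: python

    # Load only the first 10 dumps of the first channel, for all correlation
    # products, without materialising the full visibility array in memory
    small_vis = d.vis[:10, 0, :]
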
For the example data set with no selection active, the ``vis`` array will have
a shape of ``(1054, 1024, 112)``. The time dimension is labelled by
``d.timestamps``, the frequency dimension by ``d.channel_freqs`` and the
correlation product dimension by ``d.corr_products``.
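
For example, a quick look at the first few labels along each axis (the exact
values depend on the data set):

.. code:: python

    print(d.timestamps[:3])     # UTC timestamps, in seconds since the Unix epoch
    print(d.channel_freqs[:3])  # channel centre frequencies, in Hz
    print(d.corr_products[:3])  # pairs of correlator input labels per product
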
Another key concept in the data set object is that of *sensors*. These are named
time series of arbitrary data that are either loaded from the data set
(*actual* sensors) or calculated on the fly (*virtual* sensors). Both variants
are accessed through the *sensor cache* (available as ``d.sensor``) and cached
there after the first access. The data set object also provides convenient
properties to expose commonly-used sensors, as shown in the plot example below:

.. code:: python

    import matplotlib.pyplot as plt
    plt.plot(d.az, d.el, 'o')
    plt.xlabel('Azimuth (degrees)')
    plt.ylabel('Elevation (degrees)')

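
Sensors may also be looked up by name in the cache itself. The sensor name
below is purely illustrative, as the available names differ between data set
versions; list them with ``d.sensor.keys()``:

.. code:: python

    # Inspect which sensors are available, then fetch one by name
    print(sorted(d.sensor.keys())[:10])
    # Hypothetical sensor name - substitute one reported by the line above
    temperature = d.sensor['Enviro/air_temperature']
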
Other useful attributes include ``ra``, ``dec``, ``lst``, ``mjd``, ``u``,
``v``, ``w``, ``target_x`` and ``target_y``. These are all one-dimensional
NumPy arrays that dynamically change length depending on the active selection.
As in katdal's predecessor (scape) there is a ``DataSet.scans`` generator
that allows you to step through the scans in the data set. It returns the
scan index, scan state and target object on each iteration, and updates
the active selection on the data set to include only the current scan.
It is also possible to iterate through the compound scans with the
``DataSet.compscans`` generator, which yields the compound scan index, label
and first target on each iteration for convenience. These two iterators may also
be used together to traverse the data set structure:

.. code:: python

    for compscan, label, target in d.compscans():
        plt.figure()
        for scan, state, target in d.scans():
            if state in ('scan', 'track'):
                plt.plot(d.ra, d.dec, 'o')
        plt.xlabel('Right ascension (J2000 degrees)')
        plt.ylabel('Declination (J2000 degrees)')
        plt.title(target.name)

Finally, all the targets (or fields) in the data set are stored in a catalogue
available at ``d.catalogue``, and the original HDF5 file is still accessible via
a back door installed at ``d.file`` in the case of a single-file data set (v3
or older). On a v4 data set, ``d.source`` provides access to the underlying
telstate for metadata and the chunk store for data.
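
For example, the catalogue is a katpoint ``Catalogue``, so the targets can be
listed directly (a minimal sketch):

.. code:: python

    # Print the name and tags of each target (field) in the data set
    for target in d.catalogue.targets:
        print(target.name, target.tags)
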
History
=======

0.23 (2024-06-28)
-----------------
* New `mvf_download` script (also promote `mvf_copy` and remove junk) (#380)
* Select targets by their tags (#377)
* Rename `np.product` to support numpy >= 2.0 and make unit tests more robust (#372)

0.22 (2023-11-28)
-----------------
* Restore np.bool in Numba averaging function to prevent mvftoms crash (#370)
* Replace underscores with dashes when loading old buckets from RDBs (#370)
* Select multiple targets with same name to avoid dropped scans in MS (#369)
* Support on-the-fly (OTF) scans in mvftoms (#366)

0.21 (2023-05-12)
-----------------
* Fix support for numpy >= 1.24 and move unit tests from nose to pytest (#361)
* Complete rewrite of S3ChunkStore retries for more robust archive downloads (#363)
* Remove IMAGING_WEIGHT column full of zeroes from MS (#356)
* Improve tests with ES256-encoded JWT tokens and more robust MinIO health check (#360)

0.20.1 (2022-04-29)
-------------------
* Fix broken `dataset.vis[n]` due to DaskLazyIndexer / ChunkStore interaction (#355)

0.20 (2022-04-14)
-----------------
* Fix support for dask >= 2022.01.1 in ChunkStore (#351)
* Allow mvftoms to continue with partial MS after an interruption (#348)
* New mvf_copy.py script that can be used to extract autocorrelations only (#349)
* Treat Ceph 403 errors properly in S3ChunkStore (#352)

0.19 (2021-11-23)
-----------------
* Support scans and non-radec targets like planets in mvftoms (#333)
* Expose the raw flags of MVF4 datasets (#335)
* Expose CBF F-engine sensors: applied delays, phases and gains (#338)
* Verify that S3 bucket is not empty to detect datasets archived to tape (#344)
* Populate SIGMA_SPECTRUM and redo SIGMA and WEIGHT in mvftoms (#347)
* Have a sensible DataSet.name and also add a separate DataSet.url (#337)
* Allow deselection of antennas using '~m0XX' (#340)
* Allow nested DaskLazyIndexers (#336)
* Fix mvftoms on macOS and Python 3.8+ (#339)

0.18 (2021-04-20)
-----------------
* Switch to PyJWT 2 and Python 3.6, cleaning up Python 2 relics (#321 - #323)
* Allow preselection of channels and dumps upon katdal.open() to save time and memory (#324)
* Allow user to select fields, scans and antennas in mvftoms (#269)
* Support h5py 3.0 string handling in MVF3 (#331)
* Refactor requirement files to remove recursive dependencies (#329)

0.17 (2021-01-27)
-----------------
* This is the last release that will support Python 3.5
* Pin PyJWT version to 1.x to avoid breaking API changes (#320)
* Van Vleck correction! (autocorrelations only, though) (#316)
* Expose excision, aka raw weights (#308)
* Better unit testing of DataSource and S3ChunkStore in general (#319)
* Support indexed telstate keys (the 1000th cut that killed Python 2) (#304)
* Split out separate utility classes for Minio (#310)
* Fix filtering of sensor events with invalid status (#306)

0.16 (2020-08-28)
-----------------
* This is the last release that will support Python 2 (python2 maintenance branch)
* New 'time_offset' sensor property that adjusts timestamps of any sensor (#307)
* Fix calculation of cbf_dump_period for 'wide' / 'narrowN' instruments (#301)
* Increase katstore search window by 600 seconds to find infrequent updates (#302)
* Refactor SensorData to become a lazy abstract interface without caching (#292)
* Refactor SensorCache to use MutableMapping (#300)
* Fix rx_serial sensor use and file mode warning in MVFv3 files (#298, #299)

0.15 (2020-03-13)
-----------------
* Improve S3 chunk store: check tokens, improve timeouts and retries (#272 - #277)
* Retry truncated reads and 50x errors due to S3 server overload (#274)
* Apply flux calibration if available (#278, #279)
* Improve mvf_rechunk and mvf_read_benchmark scripts (#280, #281, #284)
* Fix selection by target description (#271)
* Mark Python 2 support as deprecated (#282)

0.14 (2019-10-02)
-----------------
* Make L2 product by applying self-calibration corrections (#253 - #256)
* Speed up uvw calculations (#252, #262)
* Produce documentation on readthedocs.org (#244, #245, #247, #250, #261)
* Clean up mvftoms and fix REST_FREQUENCY in SOURCE sub-table (#258)
* Support katstore64 API (#265)
* Improve chunk store: detect short reads, speed up handling of lost data (#259, #260)
* Use katpoint 0.9 and dask 1.2.1 features (#262, #243)

0.13 (2019-05-09)
-----------------
* Load RDB files straight from archive (#233, #241)
* Retrieve raw sensor data from CAM katstore (#234)
* Work around one-CBF-dump offset issue (#238)
* Improved MS output: fixed RECEPTOR_ANGLE (#230), added WEIGHT_SPECTRUM (#231)
* Various optimisations to applycal (#224), weights (#226), S3 reads (#229)
* Use katsdptelstate 0.8 and dask 1.1 features (#228, #233, #240)

0.12 (2019-02-12)
-----------------
* Optionally make L1 product by applying calibration corrections (#194 - #198)
* Let default reference antenna in v4 datasets be "array" antenna (#202, #220)
* Use katsdptelstate v0.7: generic encodings, memory backend (#196, #201, #212)
* Prepare for multi-dump chunks (#213, #214, #216, #217, #219)
* Allow L1 flags to be ignored (#209, #210)
* Deal with deprecated dask features (#204, #215)
* Remove RADOS chunk store (it's all via S3 from here on)

0.11 (2018-10-15)
-----------------
* Python 3 support via python-future (finally!)
* Load L1 flags if available (#164)
* Reduced memory usage (#165) and speedups (#155, #169, #170, #171, #182)
* S3 chunk store now uses requests directly instead of via botocore (#166)
* Let lazy indexer use oindex semantics like in the past (#180)
* Fix concatenated data sets (#161)
* Fix IPython / Jupyter tab completion for sensor cache (#176)

0.10.1 (2018-05-18)
-------------------
* Restore NumPy 1.14 support (all data flagged otherwise)

0.10 (2018-05-17)
-----------------
* Rally around the MeerKAT Visibility Format (MVF)
* First optimised converter from MVF v4 to MS: mvftoms
* Latest v4 fixes (synthetic timestamps, autodetection, NPY files in Ceph)
* Flag and zero missing chunks
* Now requires katsdptelstate (released), dask, h5py 2.3 and Python 2.7
* Restore S3 unit tests and NumPy 1.11 (on Ubuntu 16.04) support

0.9.5 (2018-02-22)
------------------
* New HDF5 v3.9 file format in anticipation of v4 (affects obs_params)
* Fix receiver serial numbers in recent MeerKAT data sets
* Add dask support to ChunkStore
* katdal.open() works on v4 RDB files

0.9 (2018-01-16)
----------------
* New ChunkStore and telstate-based parser for future v4 format
* Use python-casacore (>=2.2.1) to create Measurement Sets instead of blank.ms
* Read new-style noise diode sensor names, serial numbers and L0 stream metadata
* Select multiple polarisations (useful for cross-pol)
* Relax the "expected number of dumps" check to avoid spurious warnings
* Fix NumPy 1.14 warnings

0.8 (2017-08-08)
----------------
* Fix upside-down MeerKAT images
* SensorData rework to load gain solutions and access telstate efficiently
* Improve mapping of sensor events onto dumps, especially for long (8 s) dumps
* Fix NumPy 1.13 warnings and errors
* Support UHF receivers

0.7.1 (2017-01-19)
------------------
* Fix MODEL_DATA / CORRECTED_DATA shapes in h5toms
* Produce calibration solution tables in h5toms and improve error messages
* Autodetect receiver band on older RTS files

0.7 (2016-12-14)
----------------
* Support weights in file and improve vis / weights / flags API
* Support multiple receivers and improve centre frequency extraction
* Speed up h5toms by ordering visibilities by time
* Fix band selection and corr products for latest SDP (cam2telstate)
* Allow explicit MS names in h5toms

0.6 (2016-09-16)
----------------
* Initial release of katdal