:Name: dl1-data-handler
:Version: 0.10.10
:Home page: http://github.com/cta-observatory/dl1-data-handler
:Summary: DL1 HDF5 data writer + reader + processor
:Author: DL1DH Team
:License: MIT
:Upload time: 2023-05-08 14:41:18
DL1 Data Handler
================


.. image:: https://zenodo.org/badge/72042185.svg
   :target: https://zenodo.org/badge/latestdoi/72042185
   :alt: DOI


.. image:: https://travis-ci.org/cta-observatory/dl1-data-handler.svg?branch=master
   :target: https://travis-ci.org/cta-observatory/dl1-data-handler.svg?branch=master
   :alt: build status


.. image:: https://anaconda.org/ctlearn-project/dl1_data_handler/badges/installer/conda.svg
   :target: https://anaconda.org/ctlearn-project/dl1_data_handler/
   :alt: Anaconda-Server Badge


.. image:: https://img.shields.io/pypi/v/dl1-data-handler
    :target: https://pypi.org/project/dl1-data-handler/
    :alt: Latest Release


.. image:: https://coveralls.io/repos/github/cta-observatory/dl1-data-handler/badge.svg?branch=master
   :target: https://coveralls.io/github/cta-observatory/dl1-data-handler?branch=master
   :alt: Coverage Status


A package of utilities for writing (deprecated), reading, and applying image processing to `Cherenkov Telescope Array (CTA) <https://www.cta-observatory.org/>`_ DL1 data (calibrated images) in a standardized format. Created primarily for testing machine learning image analysis techniques on IACT data.

Currently supports data in the CTA pyhessio sim_telarray format, with the possibility of supporting other IACT data formats in the future. Built using ctapipe and PyTables.

Previously named image-extractor (v0.1.0 - v0.6.0). Currently under development, intended for internal use only.

Data Format
-----------

[Deprecated] DL1DataWriter implements a standardized format for storing simulated CTA DL1 event data in PyTables files. CTAMLDataDumper is the class that implements the conversion from ctapipe containers to the CTA ML data format. See the wiki page `here <https://github.com/cta-observatory/dl1-data-handler/wiki/CTA-ML-Data-Format>`_ for a full description of this data format and an FAQ.

The ``ctapipe-process`` tool should be used instead.
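
A minimal sketch of such an invocation (illustrative only; option names can differ between ctapipe versions, so check ``ctapipe-process --help`` for yours):

.. code-block:: bash

   # Illustrative; verify the exact options for your ctapipe version.
   ctapipe-process --input gamma_20deg_0deg_run1.simtel.gz --output events.dl1.h5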

Installation
------------

The following installation method (for Linux) is recommended:

Installing as a conda package
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To install dl1-data-handler as a conda package, first install Anaconda by following the instructions here: https://www.anaconda.com/distribution/.

The following commands will set up a conda virtual environment, add the
necessary package channels, and install the specified version of dl1-data-handler and its dependencies:

.. code-block:: bash

   DL1DH_VER=0.10.10
   wget https://raw.githubusercontent.com/cta-observatory/dl1-data-handler/v$DL1DH_VER/environment.yml
   conda env create -n [ENVIRONMENT_NAME] -f environment.yml
   conda activate [ENVIRONMENT_NAME]
   conda install -c ctlearn-project dl1_data_handler=$DL1DH_VER

This should automatically install all dependencies (NOTE: this may take some time, as by default MKL is included as a dependency of NumPy and it is very large).

If you want to import any functionality from dl1-data-handler into your own Python scripts, then you are all set. However, if you wish to use any of the scripts in dl1-data-handler/scripts (like write_data.py), you should also clone the repository locally and check out the corresponding tag (e.g. for version v0.10.10):

.. code-block:: bash

   git clone https://github.com/cta-observatory/dl1-data-handler.git
   cd dl1-data-handler
   git checkout v0.10.10

dl1-data-handler should already have been installed in your environment by conda, so no further installation steps (e.g. with setuptools or pip) are necessary, and you should be able to run scripts/write_data.py directly.

Dependencies
------------

The main dependencies are:


* PyTables >= 3.7
* NumPy >= 1.16.0
* ctapipe == 0.19.0

Also see setup.py.

Usage
-----

[Deprecated] DL1DataWriter
^^^^^^^^^^^^^^^^^^^^^^^^^^
The DL1DataWriter is not supported by the default installation. Please follow the custom installation instructions:

.. code-block:: bash

   git clone https://github.com/cta-observatory/dl1-data-handler.git
   cd dl1-data-handler
   git checkout magic  # for MAGIC data
   conda env create -n [ENVIRONMENT_NAME] -f environment-magic.yml
   conda activate [ENVIRONMENT_NAME]
   python setup_magic.py install

From the Command Line:
~~~~~~~~~~~~~~~~~~~~~~

To process data files into a desired format, run:

.. code-block:: bash

   dl1dh-write_data [runlist] [--config_file,-c CONFIG_FILE_PATH] [--output_dir,-o OUTPUT_DIR] [--debug]


For example:

.. code-block:: bash

   dl1dh-write_data runlist.yml -c example_config.yml --debug


* runlist - A YAML file containing groups of input files to load data from and output files to write to. See example runlist for format.
* config_file - The path to a YAML configuration file specifying all of the settings for data loading and writing. See example config file and documentation for details on each setting. If none is provided, default settings are used for everything.
* output_dir - Path to directory to write all output files to. If not provided, defaults to the current directory.
* debug - Optional flag to print additional debug information from the logger.
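
For orientation, the runlist structure implied by the Python example later in this README (a list of runs, each mapping ``inputs`` files to a ``target`` output) might look like the hypothetical sketch below; the example runlist in the repository is the authoritative reference:

.. code-block:: yaml

   # Hypothetical sketch; see the repository's example runlist for the real format.
   - inputs:
       - file1.simtel.gz
       - file2.simtel.gz
     target: output.h5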

In a Python script:
~~~~~~~~~~~~~~~~~~~

If the package was installed as described above, you can import and use it in a Python script. For example:

.. code-block:: python

   from dl1_data_handler import dl1_data_writer

   event_source_class = MyEventSourceClass
   event_source_settings = {'setting1': 'value1'}

   data_dumper_class = MyDataDumperClass
   data_dumper_settings = {'setting2': 'value2'}

   def my_cut_function(event):
       # custom cut logic here
       return True

   data_writer = dl1_data_writer.DL1DataWriter(
       event_source_class=event_source_class,
       event_source_settings=event_source_settings,
       data_dumper_class=data_dumper_class,
       data_dumper_settings=data_dumper_settings,
       preselection_cut_function=my_cut_function,
       output_file_size=10737418240,  # maximum output file size in bytes (10 GiB)
       events_per_file=500)

   run_list = [
    {'inputs': ['file1.simtel.gz', 'file2.simtel.gz'],
     'target': 'output.h5'}
   ]

   data_writer.process_data(run_list)

Generating a run list
~~~~~~~~~~~~~~~~~~~~~

If processing data from simtel.gz files, the ``dl1dh-generate_runlist`` script can be used to automatically generate a runlist in the correct format, as long as the filenames follow the format ``[particle_type]_[ze]deg_[az]deg_run[run_number]___[production info].simtel.gz`` or ``[particle_type]_[ze]deg_[az]deg_run[run_number]___[production info]_cone[cone_num].simtel.gz``. The script can also generate a run list from MAGIC-MARS superstar files.

It can be called as:

.. code-block:: bash

   dl1dh-generate_runlist [file_dir] [--num_inputs_per_run,-n NUM_INPUTS_PER_RUN] [--output_file_name,-f OUTPUT_FILE_NAME] [--output_dir,-o OUTPUT_DIR]


* file_dir - Path to a directory containing simtel.gz files with the filename format specified above.
* num_inputs_per_run - Number of input files with the same particle type, ze, az, and production info to group together into each run (defaults to 10).
* output_file_name - Path/filename of the output runlist file, without an extension. Defaults to ``./runlist``.
* output_dir - Path where to save generated files. By default, the input directory is used.

It will automatically sort the simtel files in the file_dir directory into groups with matching particle_type, zenith, azimuth, and production parameters. Within each of these groups, it will group together input files in sequential order into runs of size NUM_INPUTS_PER_RUN. The output filename for each run will be automatically generated as ``[particle_type]_[ze]deg_[az]deg_runs[run_number_range]___[production info].h5``. The output YAML file will be written to output_file.
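
The grouping described above can be sketched in plain Python. This is an illustrative reimplementation, not the actual ``dl1dh-generate_runlist`` code; the regular expression and function name are assumptions:

.. code-block:: python

   import re
   from collections import defaultdict

   # Illustrative pattern for names like
   # gamma_20deg_0deg_run10___prod3.simtel.gz (optionally ..._cone10.simtel.gz)
   FILENAME_RE = re.compile(
       r"(?P<particle_type>\w+)_(?P<ze>\d+)deg_(?P<az>\d+)deg_"
       r"run(?P<run_number>\d+)___(?P<prod_info>.+?)(?:_cone\d+)?\.simtel\.gz$"
   )

   def group_simtel_files(filenames, num_inputs_per_run=10):
       """Group files sharing particle type, zenith, azimuth, and production
       info, then split each group into runs of num_inputs_per_run files."""
       groups = defaultdict(list)
       for name in filenames:
           m = FILENAME_RE.match(name)
           if m is None:
               continue  # skip files that don't follow the naming convention
           key = (m["particle_type"], m["ze"], m["az"], m["prod_info"])
           groups[key].append((int(m["run_number"]), name))
       runs = []
       for key, members in sorted(groups.items()):
           members.sort()  # sequential order by run number
           for i in range(0, len(members), num_inputs_per_run):
               runs.append({"key": key,
                            "inputs": [n for _, n in members[i:i + num_inputs_per_run]]})
       return runs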

ImageMapper
^^^^^^^^^^^

The ImageMapper class transforms the hexagonal input pixels into a 2D Cartesian output image. The basic usage is demonstrated in the `ImageMapper tutorial <https://github.com/cta-observatory/dl1-data-handler/blob/master/notebooks/test_image_mapper.ipynb>`_. It requires `ctapipe-extra <https://github.com/cta-observatory/ctapipe-extra>`_ in addition to dl1-data-handler. See this publication for a detailed description: `arXiv:1912.09898 <https://arxiv.org/abs/1912.09898>`_.
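
To illustrate the general idea only (this is not the actual ImageMapper API, and none of its mapping methods are shown), a crude nearest-pixel resampling from irregular pixel positions onto a Cartesian grid can be written as:

.. code-block:: python

   import math

   def nearest_pixel_image(pix_x, pix_y, values, n_bins=4):
       """Resample irregularly placed (e.g. hexagonal) pixels onto an
       n_bins x n_bins Cartesian grid; each cell takes the value of the
       nearest input pixel. Purely illustrative: real mappers provide
       more sophisticated methods (e.g. oversampling, rebinning,
       interpolation)."""
       xmin, xmax = min(pix_x), max(pix_x)
       ymin, ymax = min(pix_y), max(pix_y)
       image = [[0.0] * n_bins for _ in range(n_bins)]
       for iy in range(n_bins):
           for ix in range(n_bins):
               # centre of this Cartesian cell
               cx = xmin + (ix + 0.5) * (xmax - xmin) / n_bins
               cy = ymin + (iy + 0.5) * (ymax - ymin) / n_bins
               # nearest input pixel wins
               k = min(range(len(values)),
                       key=lambda i: math.hypot(pix_x[i] - cx, pix_y[i] - cy))
               image[iy][ix] = values[k]
       return image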

Other scripts
^^^^^^^^^^^^^

The scripts located in the scripts/deprecated directory have not been updated for compatibility with dl1-data-handler >= 0.7.0 and should not be used.

Examples/Tips
-------------

* ViTables is very helpful for viewing and debugging PyTables-style HDF5 files. Installation/download instructions can be found via the link in the Links section below.

Known Issues/Troubleshooting
----------------------------

Links
-----


* `Cherenkov Telescope Array (CTA) <https://www.cta-observatory.org/>`_ - Homepage of the CTA collaboration 
* `CTLearn <https://github.com/ctlearn-project/ctlearn/>`_ and `GammaLearn <https://gitlab.lapp.in2p3.fr/GammaLearn/GammaLearn>`_ - Repositories of code for studies on applying deep learning to IACT analysis tasks. Maintained by groups at Columbia University, Universidad Complutense de Madrid, and Barnard College (CTLearn) and at LAPP (GammaLearn).
* `ctapipe <https://cta-observatory.github.io/ctapipe/>`_ - Official documentation for the ctapipe analysis package (in development)
* `ViTables <http://vitables.org/>`_ - Homepage for ViTables application for Pytables HDF5 file visualization


            
