scitrack


:Name: scitrack
:Version: 2024.10.8
:Home page: https://github.com/HuttleyLab/scitrack
:Summary: SciTrack provides basic logging capabilities to track scientific computations.
:Upload time: 2024-10-07 23:58:47
:Author: Gavin Huttley
:Requires Python: >=3.9
:License: BSD-3
:Keywords: science, logging

|CI| |coverall| |Using Ruff| |Python 3.9+|

.. |CI| image:: https://github.com/HuttleyLab/scitrack/actions/workflows/testing_develop.yml/badge.svg
   :target: https://github.com/HuttleyLab/scitrack/actions/workflows/testing_develop.yml

.. |coverall| image:: https://coveralls.io/repos/github/GavinHuttley/scitrack/badge.svg?branch=develop
    :target: https://coveralls.io/github/GavinHuttley/scitrack?branch=develop

.. |Using Ruff| image:: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json
    :target: https://github.com/astral-sh/ruff

.. |Python 3.9+| image:: https://img.shields.io/badge/python-3.9+-blue.svg
    :target: https://www.python.org/downloads/release/python-390/


##################
About ``scitrack``
##################

One of the critical challenges in scientific analysis is to track all the elements involved. These include the arguments provided to a specific application (including default values), the input data files referenced by those arguments and the output data generated by the application. A minimal set of system-specific information should also be tracked.

``scitrack`` is a simple package aimed at researchers writing scripts, or more substantial scientific software, to support the tracking of scientific computation.  The package provides elementary functionality to support logging. The primary capabilities concern generating checksums on input and output files and facilitating logging of the computational environment.

To see some projects using ``scitrack``, see the "Used by" link at the top of the `project GitHub page <https://github.com/HuttleyLab/scitrack>`_.

**********
Installing
**********

For the released version::

    $ pip install scitrack

For the very latest version::

    $ pip install git+https://github.com/HuttleyLab/scitrack

Or clone it::

    $ git clone git@github.com:HuttleyLab/scitrack.git

And then install::

    $ pip install ~/path/to/scitrack

*****************
``CachingLogger``
*****************

``scitrack`` provides a single class, ``CachingLogger``. It is essentially a wrapper around the Python standard library ``logging`` module. On instantiation, ``CachingLogger`` captures basic information about the system and the command line call that was made to invoke the application.

In addition, the class provides convenience methods for logging both the path and the MD5 hexdigest checksum [1]_ of input/output files. A method is also provided for producing checksums of text data, which is useful when data come from a stream or a database, for instance.

All logging calls are cached until a path for a logfile is provided. The logger can also, optionally, create directories.
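
The caching behaviour can be pictured with a minimal sketch that uses only the standard library. This is not the ``scitrack`` implementation: the class ``BufferedLog`` and its attributes are hypothetical, and they only illustrate holding messages in memory until a destination is known.

.. code:: python

    import os
    import tempfile


    class BufferedLog:
        """Hypothetical illustration: hold messages until a path is set."""

        def __init__(self, create_dir=False):
            self.create_dir = create_dir
            self._path = None
            self._pending = []

        def log_message(self, msg, label="misc"):
            line = f"{label} : {msg}"
            if self._path is None:
                self._pending.append(line)  # no destination yet, so cache
            else:
                self._append([line])

        @property
        def path(self):
            return self._path

        @path.setter
        def path(self, value):
            # this sketch raises on a second assignment; scitrack may differ
            if self._path is not None:
                raise AttributeError("path already set")
            dirname = os.path.dirname(value)
            if dirname and self.create_dir:
                os.makedirs(dirname, exist_ok=True)
            self._path = value
            self._append(self._pending)  # flush everything cached so far
            self._pending = []

        def _append(self, lines):
            with open(self._path, "a") as out:
                out.writelines(f"{line}\n" for line in lines)


    log = BufferedLog(create_dir=True)
    log.log_message("cached before any file exists", label="note")
    log_path = os.path.join(tempfile.mkdtemp(), "somedir", "demo.log")
    log.path = log_path  # creates somedir/ and writes the cached line
    log.log_message("written directly", label="note")

Messages logged before the path is assigned end up in the file in their original order, followed by any later messages.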

**********************************
Simple instantiation of the logger
**********************************

Create the logger as shown below. Setting ``create_dir=True`` means that the directory path is also created when the logfile is created.

.. code:: python

    from scitrack import CachingLogger
    LOGGER = CachingLogger(create_dir=True)
    LOGGER.log_file_path = "somedir/some_path.log"

The last assignment triggers creation of ``somedir/some_path.log``.

.. warning::

    Once set, a logger's ``.log_file_path`` cannot be changed.

*******************************************
Capturing a program's arguments and options
*******************************************

``scitrack`` will write the contents of ``sys.argv`` to the log file, prefixed by ``command_string``. However, this only captures arguments specified on the command line. Tracking the value of optional arguments not specified, which may have default values, is critical to tracking the full command set. Doing this is now easy with the simple statement ``LOGGER.log_args()``. The logger can also record the versions of named dependencies.

Here's one approach to incorporating ``scitrack`` into a command line application built using the ``click`` `command line interface library <http://click.pocoo.org/>`_. Below we create a simple ``click`` app and capture the required and optional argument values.

.. note::

    ``LOGGER.log_args()`` should be called as the first statement in the function body, or after "true" default values have been configured.

.. code:: python

    import click

    from scitrack import CachingLogger

    LOGGER = CachingLogger()


    @click.command()
    @click.option("-i", "--infile", type=click.Path(exists=True))
    @click.option("-t", "--test", is_flag=True, help="Run test.")
    def main(infile, test):
        # capture the local variables, at this point just provided arguments
        LOGGER.log_args()
        LOGGER.log_versions("numpy")
        LOGGER.input_file(infile)
        LOGGER.log_file_path = "some_path.log"


    if __name__ == "__main__":
        main()


The ``CachingLogger.log_message()`` method takes a message and a label. All other logging methods wrap ``log_message()``, providing a specific label. For instance, the method ``input_file()`` writes two lines to the log:

- ``input_file_path``, the absolute path to the input file
- ``input_file_path md5sum``, the hex digest of the file

``output_file()`` behaves analogously. An additional method ``text_data()`` is useful for other data input/output sources (e.g. records from a database). For this to be valuable with arbitrary data types, the conversion to text must be done systematically so that it gives the same result across platforms.
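
One possible way to make the text conversion repeatable, sketched with the standard library only, is to serialise each record to a canonical form before computing a checksum. The helper ``canonical_text`` and the choice of JSON with sorted keys are assumptions made for illustration; they are not part of ``scitrack``.

.. code:: python

    import hashlib
    import json


    def canonical_text(record):
        """Hypothetical helper: deterministic text for a JSON-compatible record."""
        # sorted keys and fixed separators make the output independent of
        # dict insertion order; ensure_ascii keeps the text pure ASCII
        return json.dumps(
            record, sort_keys=True, separators=(",", ":"), ensure_ascii=True
        )


    a = canonical_text({"id": 1, "name": "sample"})
    b = canonical_text({"name": "sample", "id": 1})
    same_digest = (
        hashlib.md5(a.encode("utf-8")).hexdigest()
        == hashlib.md5(b.encode("utf-8")).hexdigest()
    )

Because both records serialise to the same string, their checksums agree regardless of the order in which the fields were supplied.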

The ``log_args()`` method captures all local variables within a scope.
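
To make "all local variables within a scope" concrete, the sketch below reads the caller's local variables with the standard ``inspect`` module. It shows the general technique only: ``capture_caller_locals`` is a hypothetical name, and the ``scitrack`` implementation may differ.

.. code:: python

    import inspect


    def capture_caller_locals():
        """Hypothetical helper: return a copy of the calling function's locals."""
        frame = inspect.currentframe().f_back
        try:
            return dict(frame.f_locals)
        finally:
            del frame  # avoid reference cycles, as the inspect docs advise


    def analysis(infile, test=False, threshold=0.05):
        # called first, so only the arguments (including defaults) are present
        return capture_caller_locals()


    params = analysis("data.fasta")

Calling the helper before any other local variable is assigned is what limits the captured values to the function's arguments and their defaults.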

The ``log_versions()`` method records the version of the current file and of each named package, e.g. ``LOGGER.log_versions(['numpy', 'sklearn'])``.
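
For readers curious how installed versions can be looked up in general, here is a small standard-library sketch. The helper ``package_versions`` is hypothetical and is not how ``scitrack`` is known to work. Note that ``importlib.metadata`` expects distribution names, which can differ from import names.

.. code:: python

    from importlib.metadata import PackageNotFoundError, version


    def package_versions(names):
        """Hypothetical helper: map distribution names to installed versions."""
        if isinstance(names, str):  # accept a single name as well as a list
            names = [names]
        found = {}
        for name in names:
            try:
                found[name] = version(name)
            except PackageNotFoundError:
                found[name] = None  # not installed in this environment
        return found


    versions = package_versions(["pip", "surely-not-an-installed-package"])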


Some sample output
==================

::

    2020-05-25 13:32:07	Eratosthenes:98447	INFO	system_details : system=Darwin Kernel Version 19.4.0: Wed Mar  4 22:28:40 PST 2020; root:xnu-6153.101.6~15/RELEASE_X86_64
    2020-05-25 13:32:07	Eratosthenes:98447	INFO	python : 3.8.2
    2020-05-25 13:32:07	Eratosthenes:98447	INFO	user : gavin
    2020-05-25 13:32:07	Eratosthenes:98447	INFO	command_string : ./demo.py -i /Users/gavin/repos/SciTrack/tests/sample-lf.fasta
    2020-05-25 13:32:07	Eratosthenes:98447	INFO	params : {'infile': '/Users/gavin/repos/SciTrack/tests/sample-lf.fasta', 'test': False}
    2020-05-25 13:32:07	Eratosthenes:98447	INFO	version : __main__==None
    2020-05-25 13:32:07	Eratosthenes:98447	INFO	version : numpy==1.18.4
    2020-05-25 13:32:07	Eratosthenes:98447	INFO	input_file_path : /Users/gavin/repos/SciTrack/tests/sample-lf.fasta
    2020-05-25 13:32:07	Eratosthenes:98447	INFO	input_file_path md5sum : 96eb2c2632bae19eb65ea9224aaafdad
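
Log lines in this layout are straightforward to parse afterwards. The sketch below assumes the fields are tab-separated, as in the sample above, and that the label and value are joined by `` : ``. The function ``parse_log_line`` is a hypothetical helper, not part of ``scitrack``.

.. code:: python

    def parse_log_line(line):
        """Hypothetical helper: split one log line into its fields."""
        timestamp, host_pid, level, message = line.rstrip("\n").split("\t", 3)
        host, _, pid = host_pid.rpartition(":")
        label, _, value = message.partition(" : ")
        return {
            "timestamp": timestamp,
            "host": host,
            "pid": int(pid),
            "level": level,
            "label": label,
            "value": value,
        }


    sample = "2020-05-25 13:32:07\tEratosthenes:98447\tINFO\tversion : numpy==1.18.4"
    record = parse_log_line(sample)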

**********************
Other useful functions
**********************

Two other useful functions are ``get_file_hexdigest`` and ``get_text_hexdigest``.
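
Their signatures are not documented here. As a rough picture of what computing an MD5 hexdigest for a file or for text involves, the following standard-library sketch does both. The helpers ``file_md5`` and ``text_md5`` are hypothetical stand-ins, not the ``scitrack`` functions themselves.

.. code:: python

    import hashlib
    import os
    import tempfile


    def file_md5(path, chunk_size=65536):
        """Hypothetical helper: MD5 hexdigest of a file, read in chunks."""
        md5 = hashlib.md5()
        with open(path, "rb") as infile:
            for chunk in iter(lambda: infile.read(chunk_size), b""):
                md5.update(chunk)
        return md5.hexdigest()


    def text_md5(text):
        """Hypothetical helper: MD5 hexdigest of UTF-8 encoded text."""
        return hashlib.md5(text.encode("utf-8")).hexdigest()


    # write bytes so the file content is identical on every platform
    with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
        tmp.write(b"ACGT\n")
    digest_from_file = file_md5(tmp.name)
    digest_from_text = text_md5("ACGT\n")
    os.remove(tmp.name)

Reading in chunks keeps memory use constant for large files, and identical content yields the same digest whether it is read from disk or supplied as text.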

****************
Reporting issues
****************

Use the project `issue tracker <https://github.com/HuttleyLab/scitrack/issues>`_.

**************
For Developers
**************

We use flit_ for package building. Having cloned the repository onto your machine, install ``flit``::

    $ python3 -m pip install flit

Do a developer install of ``scitrack`` using ``flit``::

    $ cd path/to/cloned/repo
    $ flit install -s --python `which python`

.. note:: This installs a symlink into ``site-packages`` of the python identified by ``which python``.

.. [1] The hexdigest serves as a signature of a file's contents that is, for practical purposes, unique.
.. _flit: https://flit.readthedocs.io/en/latest/


            
